Time Series Classification with Deep Learning
Marco Del Pra
May 5, 2020

Outline: Introduction, Time Series Classification, Convolutional Neural Networks, Inception Time, Echo State Networks, Conclusions, Bibliography
Motivation
In recent years, Time Series Classification (TSC) has become one of the
most challenging problems in data mining
Many classification problems can be treated as Time Series Classification
problems
Time series are present in many real-world applications:
health care,
human activity recognition,
cyber-security,
finance.
Many areas are strongly increasing their interest in applications based on time
series
Non-Deep-Learning algorithms require some kind of feature engineering before
the classification
Deep Learning algorithms already incorporate this kind of feature engineering
internally
Examples of Time Series Classification Problems
Electrocardiogram analysis
Electrocardiogram records are saved in time series form
Distinguishing a disease is a TSC problem
Gesture recognition
Many devices record series of images to interpret the user’s gestures
Identifying the correct gesture is a TSC problem
Anomaly detection
Anomaly detection is the identification of unusual events
Often the data in anomaly detection are time series
Detecting and recognizing an anomaly is a TSC problem
Problem definition
Given a set of objects with the same structure and a fixed set of different classes,
a dataset is a collection of pairs (object, class)
Given a dataset, the goal of a Classification algorithm is to build a model that
assigns to an object the probability of belonging to each of the possible classes,
according to the features of the objects associated with each class
Univariate time series: ordered set of real values
M-dimensional multivariate time series: M different univariate time series with
the same length
Time Series Classification problem: Classification problem where the objects of
the dataset are univariate or multivariate time series
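To make these objects concrete, a TSC dataset can be pictured as an array of series plus a vector of class labels. A minimal NumPy sketch, with purely illustrative shapes:

```python
import numpy as np

N, T, M = 100, 128, 3   # illustrative: 100 objects, length 128, M = 3 dimensions

# Univariate case: each object is an ordered set of T real values
X_uni = np.random.randn(N, T)

# Multivariate case: each object is M univariate time series of the same length
X_multi = np.random.randn(N, T, M)

# One class label per object, drawn from a fixed set of 4 classes
y = np.random.randint(0, 4, size=N)
```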
Perceptron (Neuron)
The Perceptron (Neuron) is the basic element of many machine learning
algorithms
The goal of a Perceptron is to compute the weighted sum of the input values and
then apply an activation function to the result
Most common activation functions: sigmoid, hyperbolic tangent, rectifier (ReLU)
The result of the activation function is referred to as the activation of the
Perceptron and represents its output value
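In code, a Perceptron is a one-liner. A minimal NumPy sketch (the input values, weights, and the tanh choice are illustrative):

```python
import numpy as np

def perceptron(x, w, b, activation=np.tanh):
    """Weighted sum of the inputs followed by an activation function."""
    return activation(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # input values
w = np.array([0.3, 0.8, -0.5])   # trainable weights
b = 0.1                          # bias term
print(perceptron(x, w, b))       # the activation (output value)
```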
Multi Layer Perceptron Architecture
A Multi Layer Perceptron (MLP) is a class of feedforward neural networks, with
one Input Layer, one or more Hidden Layers, and one Output Layer
A Multi Layer Perceptron is fully connected: every neuron of a layer is connected to every neuron of the next layer
Each node of the hidden layers and of the output layer is a Perceptron
The output of the Multi Layer Perceptron is obtained by computing in sequence the
activations of its Perceptrons
The function that connects the input to the output depends on the values of the
weights.
Classification with Multi Layer Perceptron
The Multi Layer Perceptron is commonly used for Classification problems
It’s necessary to represent the pairs (object, class) in the dataset in a more
suitable way:
Every object must be represented with a vector, called input vector
Every class must be represented with its one-hot label vector, called target
For training, the MLP uses the Backpropagation technique, which iterates over the input
vectors
Iteration steps:
Computation of the output for the current input vector
Computation of the prediction error with a cost function
Update of the weights with gradient descent
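These three steps can be written out for the simplest possible case. A minimal NumPy sketch of one training iteration for a single-layer softmax classifier (a stand-in for one Backpropagation step; all values are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = np.array([0.2, -0.4, 1.5])         # input vector
t = np.array([0.0, 1.0, 0.0])          # one-hot target
W = np.zeros((3, 3)); b = np.zeros(3)  # trainable weights
lr = 0.1                               # learning rate

y = softmax(W @ x + b)                 # 1. output for the current input vector
loss = -np.sum(t * np.log(y))          # 2. prediction error (cross-entropy cost)
grad_z = y - t                         # gradient of the loss w.r.t. the pre-activations
W -= lr * np.outer(grad_z, x)          # 3. update of the weights ...
b -= lr * grad_z                       #    ... with gradient descent
```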
Classification with Multi Layer Perceptron
Backpropagation minimizes the loss on the training data
After the training, the model is able to predict the estimated probabilities of an
object belonging to each class
Why not use the MLP for TSC, taking the whole multivariate time series as
input?
MLPs don’t work well for TSC problems, because the length of the time series severely
hurts the computational speed
It’s necessary to extract the relevant features of the input time series
The big advantage of Deep Learning algorithms is that these relevant features are
learned during the training
After many layers used for the extraction of the relevant features, Deep Learning
architectures use algorithms like the MLP to obtain the classification
Deep Learning for Time Series Classification
A Deep Learning algorithm is a composition of several layers that implement
non-linear functions
Every layer takes as input the output of the previous layer and applies its
non-linear transformation to compute its own output
The behavior of the non-linear transformations is controlled by trainable
parameters
Often, the last layer is a Multi Layer Perceptron or a Ridge regressor
We consider 3 different Deep Learning Architectures:
Convolutional Neural Network
Inception Time
Echo State Network
Convolutional Neural Networks Architecture
A Convolutional Neural Network (CNN) is able to successfully capture spatial
and temporal patterns through the application of trainable filters
The pre-processing required by a Convolutional Neural Network is much lower than
that of other classification algorithms
A Convolutional Neural Network is composed of three different layers:
1. Convolutional Layer
2. Pooling Layer
3. Fully-Connected Layer
Several Convolutional Layers and Pooling Layers are alternated before the
Fully-Connected Layer
Convolutional Layer
The Convolutional Layer performs a convolution of an input series of feature maps with a
filter matrix to obtain as output a different series of feature maps
The convolution is defined by a set of filters, which are fixed-size matrices.
(Figures: a single convolution step, and the convolution between one input feature
map and a filter)
Convolutional Layer executes the convolution between every filter and every input
feature map
The values of the filters are treated as trainable weights and are learned
during training.
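To make the operation concrete, a minimal NumPy sketch of a 1-D convolution (strictly speaking a cross-correlation, as in most deep learning libraries; input and filter values are illustrative):

```python
import numpy as np

def conv1d(x, f, stride=1, padding=0):
    """Slide the filter f over the input x and take a dot product at each step."""
    x = np.pad(x, padding)
    out_len = (len(x) - len(f)) // stride + 1
    return np.array([np.dot(x[i * stride : i * stride + len(f)], f)
                     for i in range(out_len)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # one input feature map
f = np.array([1.0, 0.0, -1.0])           # one filter (trainable in a real CNN)
print(conv1d(x, f))                      # -> [-2. -2. -2.]
```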
Stride
Stride controls how the filter convolves around one input feature map.
The value of stride indicates by how many units the filter is shifted at a time.
Padding
Padding indicates how many extra columns and rows to add outside an input
feature map, before applying a convolution filter
All the cells of the new columns and rows have a dummy value, usually 0.
Padding is used to preserve the original size of the input feature map after the
Convolutional Layer, or to make it decrease more slowly
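The interaction of stride and padding with the output size follows a standard formula (stated here for the 1-D case; it holds per axis in 2-D). For input length $n_{in}$, filter size $k$, stride $s$ and padding $p$:

$$n_{out} = \left\lfloor \frac{n_{in} + 2p - k}{s} \right\rfloor + 1$$

For example, with $n_{in} = 100$, $k = 3$, $s = 1$ and $p = 1$: $n_{out} = \lfloor (100 + 2 - 3)/1 \rfloor + 1 = 100$, so the original size is preserved.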
Pooling Layer
The purpose of Pooling is to achieve a dimension reduction of feature maps
Pooling is applied to sliding windows of fixed size across the width and height of
every input feature map
There are two types of pooling: Max Pooling and Average Pooling.
For every sliding window the result of the pooling is the maximum or the average
value
Max Pooling works as a noise suppressant, discarding noisy activations.
Stride and padding must also be specified for the Pooling Layer.
The advantage of the pooling operation is that it down-samples the convolutional output
bands, thus reducing variability in the hidden activations.
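A minimal NumPy sketch of both pooling types over sliding windows (window size, stride, and input values are illustrative):

```python
import numpy as np

def pool1d(x, size, stride, reduce=np.max):
    """Apply max or average pooling over sliding windows of a 1-D feature map."""
    out_len = (len(x) - size) // stride + 1
    return np.array([reduce(x[i * stride : i * stride + size])
                     for i in range(out_len)])

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
print(pool1d(x, size=2, stride=2))                  # Max Pooling:     [3. 5. 6.]
print(pool1d(x, size=2, stride=2, reduce=np.mean))  # Average Pooling: [2. 3.5 5.]
```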
Fully-Connected Layer
The goal of the Fully-Connected Layer is to learn non-linear combinations of the
high-level features
Usually the Fully Connected Layer is implemented with a Multi Layer Perceptron.
After several convolution and pooling operations, the output series of feature
maps are flattened into a vector
The flattened vector is the input of the Multi Layer Perceptron
The output layer has a number of neurons equal to the number of possible classes
Backpropagation is applied at every iteration of training, and finally the model is
able to classify the time series
Hyperparameters
Number of convolution filters
Too few filters cannot extract enough features to achieve classification
Too many filters add little benefit and are computationally expensive
Convolution filter size and initial values
Smaller filters collect as much local information as possible
Bigger filters represent more global, high-level and representative information
The filters are usually initialized with random values.
Pooling method and size
Method: Max or Average
Size: as it increases, the dimension reduction is greater, but more information is lost
Weight initialization
The weights are usually initialized with small random numbers
Activation function
Rectifier, sigmoid or hyperbolic tangent are usually chosen
Number of epochs
Number of times the entire training set passes through the model
Implementation
Building a Convolutional Neural Network is very easy using the Python library Keras
To build a CNN in Keras, it is sufficient to:
declare a Sequential model
add the desired Convolutional, MaxPooling and Dense Keras Layers to the Sequential
model
specify number of filters and filter size for Convolutional Layer
specify pooling size for Pooling Layer
To build and compile the model, Keras requires:
the input shape (passed to the first layer, not to compile())
the optimizer
the loss function
a list of metrics
To train a model in Keras it’s sufficient to call the function fit() specifying the
needed parameters:
the training data (input data and targets),
the number of epochs
the validation data
To use the model, pass an array of inputs to the function predict(); it returns
the array of outputs
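Putting the pieces together, a minimal end-to-end sketch (shapes, layer sizes, and the synthetic data are illustrative assumptions, not settings from a real experiment):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_timesteps, n_features, n_classes = 100, 1, 4   # illustrative shapes

model = keras.Sequential([
    layers.Conv1D(32, kernel_size=7, activation="relu",
                  input_shape=(n_timesteps, n_features)),  # input shape on the first layer
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Synthetic data, just to make the sketch runnable
x_train = np.random.randn(64, n_timesteps, n_features)
y_train = keras.utils.to_categorical(np.random.randint(n_classes, size=64))

model.fit(x_train, y_train, epochs=2, validation_split=0.2, verbose=0)
probs = model.predict(x_train[:5])   # array of class probabilities
```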
Inception Time Architecture
Recently, a deep Convolutional Neural Network called Inception
Time was introduced.
This kind of network shows high accuracy and very good scalability.
The Inception Network consists of a series of Inception Modules followed by a
Global Average Pooling Layer and a Fully Connected Layer
A residual connection is added every third Inception Module
Inception Module
The Inception Module consists of 4 layers:
Bottleneck Layer
A set of parallel Convolutional Layers with different filter sizes
MaxPooling Layer
Depth Concatenation Layer
The network is able to extract relevant features at multiple resolutions thanks to
the use of filters with different sizes
The internal layers choose which filter size is relevant for learning the relevant features
This is very helpful to identify a high-level feature that can have different sizes on
different input feature maps.
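For reference, a sketch of such a module written with the Keras functional API. The bottleneck width of 32 and the filter sizes 10/20/40 are in the spirit of the defaults reported in the InceptionTime paper; the input shape is an illustrative assumption:

```python
from tensorflow import keras
from tensorflow.keras import layers

def inception_module(inputs, n_filters=32, kernel_sizes=(10, 20, 40)):
    # Bottleneck Layer: reduce the depth before the expensive convolutions
    bottleneck = layers.Conv1D(n_filters, 1, padding="same", use_bias=False)(inputs)
    # Parallel Convolutional Layers with different filter sizes
    branches = [layers.Conv1D(n_filters, k, padding="same", use_bias=False)(bottleneck)
                for k in kernel_sizes]
    # MaxPooling Layer branch, followed by a 1x1 convolution
    pool = layers.MaxPooling1D(pool_size=3, strides=1, padding="same")(inputs)
    branches.append(layers.Conv1D(n_filters, 1, padding="same", use_bias=False)(pool))
    # Depth Concatenation Layer
    x = layers.Concatenate(axis=-1)(branches)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

inputs = keras.Input(shape=(100, 1))       # illustrative input shape
model = keras.Model(inputs, inception_module(inputs))
```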
Receptive Field and results
A neuron in an Inception Network depends only on a region of the input feature
maps, which is called the Receptive Field of the neuron
For time series data, the total Receptive Field of an Inception Network of depth $d$,
with filter length $k_i$ at layer $i$, is given by

$$RF = 1 + \sum_{i=1}^{d} (k_i - 1)$$
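As a quick illustrative computation (not a result from the paper): a network of depth $d = 6$ whose layers all use filters of length $k_i = 40$ has a Receptive Field of $1 + 6 \cdot 39 = 235$ time steps, so it can only combine information from at most 235 consecutive points of the input series.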
It’s very interesting to investigate how the accuracy of an Inception Network
changes as the Receptive Field varies
The Figure shows Inception Network’s accuracy over a simulation dataset, with
respect to the filter length as well as the input time series length
It is evident that a longer filter is required to produce more accurate results
Receptive Field and results
The Figure shows Inception Network’s accuracy over a simulation dataset, with
respect to the network’s depth as well as the length of the input time series.
It turns out that adding more layers does not necessarily improve
the network’s performance, particularly for datasets with a small training set
A single Inception Network sometimes exhibits high variance in accuracy
For this reason Inception Time is implemented as an ensemble of many Inception
Networks
In this way the algorithm improves its stability, and shows high accuracy and very
good scalability
Different experiments have shown that its time complexity grows linearly with
both the training set size and the time series length
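The ensembling step itself is simple: the class probabilities predicted by the individual networks are averaged. A minimal sketch (the networks are assumed to be already trained Keras models):

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the class probabilities predicted by several Inception Networks."""
    probs = np.stack([m.predict(x) for m in models])  # (n_models, n_samples, n_classes)
    return probs.mean(axis=0)

# predicted_probs = ensemble_predict(trained_networks, x_test)
# predicted_labels = predicted_probs.argmax(axis=1)
```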
Implementation
On GitHub you can find a full implementation of Inception Time, written in
Python using the Keras library, at this link:
https://github.com/hfawaz/InceptionTime
This implementation is based on 3 main files:
File main.py contains the necessary code to run an experiment
File inception.py contains the Inception Network implementation
File nne.py contains the code that ensembles a set of Inception Networks
The implementation uses the Keras Model class, since some layers of
InceptionTime work in parallel
The code that implements the Inception Module building block is very similar to
that described for CNNs, and can be easily included in Keras-based code in
order to implement customized architectures
The structure of the code that implements compilation, training and use of the
model is very similar to that described for Convolutional Neural Networks
Recurrent Neural Networks
Echo State Networks are a type of Recurrent Neural Networks
Recurrent Neural Networks are networks of neuron-like nodes organized into
successive layers
Like in standard Neural Networks, neurons are divided into an Input Layer, a Hidden
Layer, and an Output Layer
Each connection between neurons has a corresponding trainable weight
Every neuron is assigned to a fixed time step
The neurons in the hidden layer are also connected forward along the time direction
The input and output neurons are connected only to the hidden neurons with the
same assigned time step
The activation of the neurons is computed in time order
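In formulas, the standard RNN update at time step $t$ is

$$h_t = f(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h), \qquad y_t = g(W_{hy}\, h_t + b_y)$$

where $x_t$, $h_t$ and $y_t$ are the input, hidden activation and output, $f$ and $g$ are activation functions, and the $W$ matrices and biases are the trainable weights (textbook notation, not taken from the slides).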
Motivation of Echo State Networks
Recurrent Neural Networks (RNNs) are rarely applied for Time Series
Classification mainly due to three factors:
1. This type of architecture is designed mainly to predict an output for each element of
the time series
2. Recurrent Neural Networks typically suffer from the vanishing gradient problem
3. The training of an RNN is hard to parallelize and computationally expensive
Echo State Networks were designed to mitigate the problems of Recurrent Neural
Networks by eliminating the need to compute the gradient for the hidden layers
This reduces the training time and avoids the vanishing gradient problem
Many results show that Echo State Networks are really helpful to handle chaotic
time series
Echo State Networks Architecture
The Architecture of an Echo State Network consists of an Input Layer, a
Reservoir, a Dimension Reduction Layer, a Readout, and an Output Layer
The Reservoir is organized like a sparsely connected random RNN
The Dimension Reduction algorithm is usually implemented with PCA
The Readout is usually implemented as an MLP or a Ridge regressor
The weights between the Input layer and the Reservoir and those in the Reservoir
are randomly assigned and not trainable
The weights in the Readout are trainable
Reservoir
The Reservoir is connected to the Input Layer, and consists of a set of internal
sparsely-connected neurons and of its own output neurons.
In the Reservoir there are 4 types of weights:
the input weights
the internal weights
the output weights
the feedback weights (from the Reservoir output back to the internal neurons)
All these weights are randomly initialized, time independent, and not trainable
The output of the Reservoir is computed separately for every time step
At every time step, the activation of every internal and output neuron is computed
This output is added to the total Reservoir output, but also acts as input for the
next time step through the feedback weights
The Reservoir creates a recurrent non-linear embedding of the input into a
higher-dimensional representation
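A minimal NumPy sketch of the Reservoir state update (sizes are illustrative; the sparse connectivity and the spectral-radius rescaling follow the usual reservoir-computing conventions, and the feedback term is omitted for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, sparsity, spectral_radius = 1, 100, 0.1, 0.9

# Random, untrained weights
W_in = rng.uniform(-1, 1, size=(n_res, n_in))                # input weights
W = rng.uniform(-1, 1, size=(n_res, n_res))                  # internal weights
W[rng.random((n_res, n_res)) > sparsity] = 0.0               # sparse connectivity
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale the largest eigenvalue

def run_reservoir(series):
    """Embed a univariate series into the sequence of reservoir states."""
    h = np.zeros(n_res)
    states = []
    for x_t in series:
        h = np.tanh(W_in @ np.atleast_1d(x_t) + W @ h)       # state update at one time step
        states.append(h)
    return np.array(states)                                  # shape: (time steps, n_res)

states = run_reservoir(np.sin(np.linspace(0, 10, 200)))
```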
Dimension Reduction
By choosing the correct dimension reduction, it is possible to reduce the execution
time without lowering the accuracy
The Figure shows how training time and average classification accuracy vary with
respect to the subspace dimension D after dimension reduction, for a particular
experiment
Training time increases approximately linearly with D
Accuracy stops growing when D = 75
In this case the best value for the subspace dimension is therefore 75
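Since the reduction is usually PCA, this layer can be sketched with scikit-learn (the value D = 75 mirrors the experiment above; states is assumed to be the matrix of reservoir states, e.g. from the previous sketch):

```python
from sklearn.decomposition import PCA

# Project the reservoir states onto a D-dimensional subspace
pca = PCA(n_components=75)
reduced_states = pca.fit_transform(states)   # shape: (time steps, 75)
```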
Implementation and Hyperparameters
A full implementation in Python of Echo State Networks is available on GitHub at
this link:
https://github.com/FilippoMB/Reservoir-Computing-framework-for-multivariate-time-series-classification/blob/master/README.md
The code uses the libraries Scikit-learn and SciPy.
The main class RC_classifier, contained in the file modules.py, makes it possible to build,
train, and test an Echo State Network classifier
The most important hyperparameters in the Reservoir are:
the number of neurons in the Reservoir
the percentage of nonzero connection weights
the largest eigenvalue of the reservoir matrix of connection weights
The most important hyperparameters in other layers are:
the algorithm for the Dimension Reduction Layer
the subspace dimension after the Dimension Reduction Layer
the type of Readout used for classification
the number of epochs
The structure of the code that implements training and use of the model is very
similar to that described for Convolutional Neural Networks
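A purely hypothetical usage sketch follows; the argument and method names are illustrative placeholders for the hyperparameters listed above, not the actual signature of RC_classifier (check the repository's README for that):

```python
# Hypothetical usage sketch; names are illustrative, not the real API.
from modules import RC_classifier   # from the repository above

clf = RC_classifier(
    n_internal_units=500,   # number of neurons in the Reservoir
    connectivity=0.3,       # percentage of nonzero connection weights
    spectral_radius=0.9,    # largest eigenvalue of the reservoir weight matrix
    dimred_method="pca",    # algorithm for the Dimension Reduction Layer
    n_dim=75,               # subspace dimension after the reduction
    readout_type="mlp",     # type of Readout used for classification
)
clf.train(X_train, y_train)
predictions = clf.test(X_test, y_test)
```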
Conclusions
Convolutional Neural Networks are the most popular Deep Learning technique for
Time Series Classification
The main difficulties in using Convolutional Neural Networks:
The length of the time series can slow down training
Results can be less accurate than expected with chaotic input time series
Results can be less accurate than expected with input time series in which the same
relevant feature can have different sizes
To solve these problems, InceptionTime and Echo State Networks perform better
than the other proposed architectures
InceptionTime:
speeds up the training process using an efficient dimension reduction (the Bottleneck Layer)
performs really well in handling input time series in which the same relevant feature can
have different sizes
Echo State Networks:
Speed up the training process since they are very sparsely connected with most of their
weights fixed a priori
Really helpful to handle chaotic input time series
In conclusion, high accuracy and high scalability make these new architectures
perfect candidates for product development
Time Series Classification with Deep Learning
Introduction Time Series Classification Convolutional Neural Networks Inception Time Echo State Networks Conclusions Bibliography
Filippo Maria Bianchi, Simone Scardapane, Sigurd Løkse, Robert Jenssen.
Reservoir computing approaches for representation and classification of
multivariate time series.
Hassan Ismail Fawaz, Benjamin Lucas, Germain Forestier, Charlotte Pelletier,
Daniel F. Schmidt, Jonathan Weber, Geoffrey I. Webb, Lhassane Idoumghar,
Pierre-Alain Muller, François Petitjean.
InceptionTime: Finding AlexNet for Time Series Classification.