Hand-Gesture Recognition Based On EMG and Event-Based Camera Sensor Fusion: A Benchmark in Neuromorphic Computing
with respect to the GPU system but have a significantly smaller energy-delay product (EDP), which makes them between 30× and 600× more efficient. The proposed work represents a new benchmark that moves neuromorphic computing toward a real-world scenario.
Keywords: hand-gesture classification, spiking neural networks (SNNs), electromyography (EMG) signal
processing, event-based camera, sensor fusion, neuromorphic engineering
of the ODIN and MorphIC Spiking Neural Network (SNN) processors (Frenkel et al., 2019a,b). The two neuromorphic systems have different features; in particular, depending on the number of neurons available and on the input data, we implemented different SNN architectures. For example, for visual data processing, a spiking CNN is implemented on Loihi, while a spiking Multi-Layer Perceptron (MLP) is chosen for ODIN + MorphIC (see section 2.3). For EMG, the data was collected using the Myo armband, which senses electrical activity in the forearm muscles, and was later converted into spikes to be fed into the neuromorphic systems. Here, we propose a feasible application to show the neuromorphic performance in terms of accuracy, energy consumption, and latency (stimulus duration + inference time). The performance metric for energy consumption is the Energy-Delay Product (EDP), a metric suitable for most modern processor platforms, defined as the average energy consumption multiplied by the average inference time. The inference time is defined as the time elapsed between the end of the stimulus and the classification. To validate the neuromorphic results, we compare them to a baseline consisting of the same networks implemented with a standard machine learning approach, where the inputs are fed as continuous EMG signals and video frames. We propose this comparison on a real-case scenario as a benchmark, in order for the neuromorphic research field to advance into mainstream computing (Davies, 2019).

2. MATERIALS AND METHODS

In the following section, we describe the overall system components. We start from the description of the sensors used to collect the hand-gesture data, namely the event-based camera, Dynamic Vision Sensor (DVS), and the EMG armband sensor, Myo. We then describe the procedure with which we collected the dataset used for the validation experiments presented here, which is publicly available. Afterwards, the two neuromorphic systems under consideration, namely Loihi and ODIN + MorphIC, are described, focusing on their system specifics, characteristics, and the model architectures implemented on them. Finally, we describe the system that we call baseline, which represents the point of comparison between a traditional von Neumann approach and the two neuromorphic systems.

2.1. DVS and EMG Sensors
2.1.1. DVS Sensor
The DVS (Lichtsteiner et al., 2006) is a neuromorphic camera inspired by the visual processing in the biological retina. Each pixel in the sensor array responds asynchronously to logarithmic changes in light. Whenever the incoming illumination increases or decreases by more than a certain threshold, the pixel generates a polarity spike event. The polarity corresponds to the sign of the change: ON polarity for an increase in light and OFF polarity for a decrease in light. The output is a continuous and sparse train of events, interchangeably called spikes throughout this paper, that carries the information of the active pixels in the scene (represented in Figure 1). The static information is directly removed on the hardware side and only the dynamic information, corresponding to the movements in the scene, is actually transmitted. In this way, the DVS can reach a latency as low as 10 µs, while reducing the power consumption needed for computation and the amount of transmitted data. Each spike is encoded using the Address Event Representation (AER) communication protocol (Deiss et al., 1999) and is represented by the address of the pixel (in x-y coordinates), the polarity (1 bit for the sign), and the timestamp (in microsecond resolution).

2.1.2. EMG Sensor
In the proposed work, we collected the EMG corresponding to the hand gestures using the Myo armband by Thalmic Labs Inc. The Myo armband is a wearable device equipped with eight equally spaced non-invasive EMG electrodes and a Bluetooth transmission module. The EMG electrodes detect the activity of the forearm muscles, and the acquired data is then sent to an external electronic device. The sampling rate of the Myo data is fixed at 200 Hz, and the data is returned for each sensor as a unitless 8-bit unsigned integer representing “activation,” which does not translate to millivolts (mV).

2.2. DVS-EMG Dataset
The dataset is a collection of five hand gestures recorded with the two sensor modalities: muscle activity from the Myo and visual input in the form of DVS events. Moreover, the dataset also provides the video recording from a traditional frame-based camera, referred to as Active Pixel Sensor (APS) in this paper. The frames from the APS are used as ground truth and as input to the baseline models. The APS frames provided in the dataset are gray-scale, with 240 × 180 resolution. The dataset contains recordings from 21 subjects: 12 males and nine females aged from 25 to 35 (see Data Availability Statement for full access to the dataset). The structure is the following: each subject repeats three sessions; in each session, the subject performs five hand gestures (pinky, elle, yo, index, and thumb; see Figure 2), each repeated 5 times. Each single gesture recording lasts 2 s. The gestures are separated by a relaxing time of 1 s, to remove any residual activity from the previous gesture. Every recording is cut into 10 chunks of 200 ms each; this duration was selected to match the requirements of a real-case scenario of low-latency prosthesis control, where the classification and the creation of the motor command are needed within 250 ms (Smith et al., 2011). Therefore, the final number of samples is 21 (subjects) × 3 (sessions) × 5 (repetitions) × 5 (gestures) × 10 (chunks), for a total of 15,750. The Myo records the superficial muscle activity at the middle forearm from eight electrodes with a sampling rate of 200 Hz. During the recordings, the DVS was mounted on a randomly moving system to generate relative movement between the sensor and the subject’s hand. The hand remains static during the recording to avoid noise in the Myo sensor, and the gestures are performed in front of a static white background (see Figure 2 for the full setup).

2.2.1. Implementation on Neuromorphic Devices
SNNs, in general, and their implementation on neuromorphic devices require inputs as spike trains. In the case of the DVS, the
[Figure 1 plots. Left: DVS ON/OFF events vs. Address x, Address y, and Time [s]. Right: raw EMG Amplitude [a.u.] and EMG spikes vs. Time [s] for each channel.]
FIGURE 1 | Example, for a gesture “elle,” of spike streams for DVS (left) and EMG (right). In the EMG figure the spikes are represented by dots while the continuous
line is the raw EMG. Different channels have different colors.
FIGURE 2 | System overview. From left to right: (A) data collection setup featuring the DVS, the traditional camera and the subject wearing the EMG armband sensor,
(B) data streams of (b1) DVS and (b2) EMG transformed into spikes via the Delta modulation approach, (C) the two neuromorphic systems namely (c1) Loihi and (c2)
ODIN + MorphIC, (D) the hand gestures that the system is able to recognize in real time.
sensor output is already in the form of spikes and polarity. The only requirement that we need to take into account is the limited number of neurons in the available neuromorphic processors. For this reason, we decided to crop the 128 × 128 input of the DVS to 40 × 40, centered on the hand gesture. In contrast, for the EMG, a conversion into the event-based domain is required.
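The delta-modulator conversion used for the EMG channels, described in the following paragraphs, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: over-sampling is assumed to be linear interpolation with `interp_factor` points per original sample interval, the signal is assumed to be already scaled to roughly [-1, 1] so that a threshold of 0.05 is meaningful, and the function name `delta_modulate` is hypothetical.

```python
import numpy as np


def delta_modulate(signal, fs, threshold=0.05, interp_factor=3500):
    """Convert one EMG channel into UP and DOWN spike times (in seconds).

    Hypothetical helper illustrating delta modulation: the signal is first
    over-sampled by linear interpolation, then an UP (DOWN) spike is emitted
    each time the signal rises (falls) by more than `threshold` relative to
    the reference level set by the previous spike.
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    t = np.arange(n) / fs
    # Assumption: over-sampling = linear interpolation, `interp_factor` steps per sample interval
    t_fine = np.linspace(t[0], t[-1], (n - 1) * interp_factor + 1)
    x_fine = np.interp(t_fine, t, signal)

    up, down = [], []
    ref = x_fine[0]  # reference level tracked by the modulator
    for ti, xi in zip(t_fine[1:], x_fine[1:]):
        # A fast change can cross several thresholds at once: one spike per crossing
        while xi - ref > threshold:
            ref += threshold
            up.append(ti)
        while ref - xi > threshold:
            ref -= threshold
            down.append(ti)
    return np.array(up), np.array(down)


# Usage on a synthetic 200 ms window (40 samples at 200 Hz). A real run would
# loop over the eight Myo channels, giving 8 x 2 = 16 spike trains, which
# presumably correspond to the 16 inputs of the EMG networks.
fs = 200
x = 0.3 * np.sin(2 * np.pi * 10 * np.arange(40) / fs)
up_times, down_times = delta_modulate(x, fs)
print(len(up_times), len(down_times))
```

Tracking a reference level that moves by exactly one threshold per spike (rather than resetting it to the current sample) keeps the spike count proportional to the total signal excursion, which is what allows the signal to be approximately reconstructed from the UP/DOWN trains.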
The solution used here is the delta-modulator ADC algorithm, based on a sigma-delta modulator circuit (Corradi and Indiveri, 2015). This mechanism is commonly used in low-frequency, high-performance, and low-power applications (Lee et al., 2005), such as biomedical circuits. Moreover, this modulator represents a good interface for neuromorphic devices because it has lower circuit complexity and lower power consumption than multi-bit ADCs.
The delta-modulator algorithm transforms a continuous signal into two digital pulse outputs, UP or DOWN, according to the signal derivative. An UP (DOWN) spike is generated every time the signal increases (decreases) by more than a threshold since the last spike, similar to the ON (OFF) events of the DVS. As described before, the signal is sampled at 200 Hz, which means that a new sample is acquired every 5 ms. To increase the time resolution of the generated spike train, which would otherwise contain too few spikes, the EMG signals are over-sampled to a higher frequency before undergoing the transformation into spikes (Donati et al., 2019).
For our specific EMG acquisition features, we set the threshold at 0.05 and an interpolation factor of 3500; these values were selected from previous studies that assessed the quality of signal reconstruction (Donati et al., 2018, 2019).

2.3. Neuromorphic Processors
2.3.1. ODIN + MorphIC
The ODIN (Online-learning DIgital spiking Neuromorphic) processor occupies an area of only 0.086 mm² in 28 nm FDSOI CMOS (Frenkel et al., 2019a)¹. It consists of a single neurosynaptic core with 256 neurons and 256² synapses. Each neuron can be configured to phenomenologically reproduce the 20 Izhikevich behaviors of spiking neurons (Izhikevich, 2004). The synapses embed a 3-bit weight and a mapping table bit that allows enabling or disabling Spike-Dependent Synaptic Plasticity (SDSP) locally (Brader et al., 2007), thus allowing for the exploration of both off-chip training and on-chip online learning setups.

¹ The HDL source code and documentation of ODIN are publicly available at https://github.com/ChFrenkel/ODIN.

MorphIC is a quad-core digital neuromorphic processor with 2k LIF neurons and more than 2M synapses in 65 nm CMOS (Frenkel et al., 2019b). MorphIC was designed for high-density large-scale integration of multi-chip setups. The four 512-neuron crossbar cores are connected with a hierarchical routing infrastructure that enables neuron fan-in and fan-out values of 1k and 2k, respectively. The synapses are binary and can either be programmed with offline-trained weights or trained online with a stochastic version of SDSP.
Both ODIN and MorphIC follow a standard synchronous digital implementation, which allows their operation to be predicted with one-to-one accuracy by custom Python-based chip simulators. As both chips rely on crossbar connectivity, CNN topologies can be explored but are limited to small networks due to inefficient resource usage in the absence of a weight reuse mechanism (Frenkel et al., 2019b). The selected SNN architectures are thus based on fully-connected MLP topologies. Training is carried out in Keras with quantization-aware stochastic gradient descent following a standard ANN-to-SNN mapping approach (Hubara et al., 2017; Moons et al., 2017; Rueckauer et al., 2017); the resulting SNNs process the EMG and DVS spikes without further preprocessing.
In order to process the spike-based EMG gesture data, we selected ODIN so as to benefit from its 3-bit weights. Indeed, due to the low input dimensionality of EMG data, satisfactory performance could not be reached with the binary weight resolution of MorphIC. A 3-bit-weight 16-230-5 SNN is thus implemented in ODIN; this setup will be referred to as the EMG-ODIN network.
For the DVS gesture data classification, we selected MorphIC to benefit from its higher neuron and synapse resources. ON/OFF DVS events are treated equally and their connections to the network are learned, so that any of them can be either excitatory or inhibitory. Similarly to a setup previously proposed for MNIST benchmarking (Frenkel et al., 2019b), the input 40 × 40-pixel DVS event streams can be subsampled into four 20 × 20-pixel event streams and processed independently in the four cores of MorphIC, thus leading to an accuracy boost when combining the outputs of all subnetworks, subsequently denoted as subMLPs. The four subMLPs have a 400-210-5 topology with binary weights; this setup will thus be referred to as the DVS-MorphIC network.
To ease sensor fusion, the hidden layer sizes of the EMG-ODIN and DVS-MorphIC networks and the associated firing thresholds were optimized by parameter search so as to balance their activities. These hidden layers were first flattened into a 1,070-neuron layer (230 + 4 × 210), then a 5-neuron output layer was retrained with 3-bit weights and implemented in ODIN. This setup will be referred to as the Fusion-ODIN network, which thus encapsulates EMG processing in ODIN, DVS processing in MorphIC, and sensor fusion in ODIN. From an implementation point of view, mapping the MorphIC hidden layer output spikes back to ODIN for sensor fusion requires an external mapping table. Its overhead is excluded from the results provided in section 3.

2.3.2. Loihi and Its Training Framework SLAYER
Intel’s Loihi (Davies et al., 2018) is an asynchronous neuromorphic research processor. Each Loihi chip consists of 128 neurocores, with each neurocore capable of implementing up to 1,024 current-based (CUBA) Leaky Integrate-and-Fire (LIF) neurons. The network state and configuration are stored entirely in on-chip SRAMs local to each core; this allows each core to access its local memories independently of other cores without needing to share a global memory bus (and in fact removes the need for off-chip memory). Loihi supports a number of different encodings for representing network connectivity, thus allowing the user to choose the most efficient encoding for their task. Each Loihi chip also contains three small synchronous x86 processors, which help monitor and configure the network, as well as assist with the injection of spikes and the recording of output spikes.
SLAYER (Shrestha and Orchard, 2018) is a backpropagation framework for evaluating the gradient of any kind of SNN
[i.e., spiking MLP and spiking CNN] directly in the spiking domain. It is a dt-based SNN backpropagation algorithm that keeps track of the internal membrane potential of the spiking neuron and uses it during gradient propagation. There are two main guiding principles of SLAYER: the temporal credit assignment policy and probabilistic spiking neuron behavior during error backpropagation. The temporal credit assignment policy acknowledges the temporal nature of a spiking neuron, where a spike event at a particular time has an effect on future events. Therefore, the credit for an error at a particular time needs to be distributed back in time. SLAYER is one of the few methods that consider temporal effects during backpropagation. The use of probabilistic neurons during backpropagation helps estimate the derivative of the spike function, which is a major challenge for SNN backpropagation: the spike escape rate function of a probabilistic neuron is used to estimate the spike function derivative, similar to the surrogate gradient concept (Zenke and Ganguli, 2018; Neftci et al., 2019). With SLAYER, we can train synaptic weights as well as axonal delays and achieve state-of-the-art performance (Shrestha and Orchard, 2018) on neuromorphic datasets.
SLAYER uses the versatile Spike Response Model (SRM) (Gerstner, 1995), which can be customized to represent a wide variety of spiking neurons with a simple change of spike response kernels. It is implemented² atop the PyTorch framework with automatic differentiation support (Paszke et al., 2017), with the flexibility of feedforward dense, convolutional, pooling, and skip connections in the network.

² SLAYER-PyTorch is publicly available at https://github.com/bamsumit/slayerPytorch.

SLAYER-PyTorch also supports training with the exact CUBA Leaky Integrate-and-Fire neuron model of Loihi (Davies et al., 2018). To account for the fixed-precision constraints on the weights and delays of the Loihi hardware, it trains the network under the quantization constraints using the strategy of shadow variables (Courbariaux et al., 2015; Hubara et al., 2016), where the constrained network is used in the forward propagation phase and the full-precision shadow variables are used during backpropagation.
We used SLAYER-PyTorch to train a Loihi-compatible network for the hand-gesture recognition task. The networks were trained offline on a GPU, and the trained weights and delays were used to configure the network on the Loihi hardware for inference. All the figures reported here are for inference using Loihi, with one algorithmic time tick in Loihi of 1 ms.
A spiking MLP of architecture 16-128d-128d-5 was trained for EMG gestures converted into spikes (section 2.2.1). Here, 128d means that the fully connected layer has 128 neurons with trained axonal delays. The Loihi neuron with current and voltage decay constants of 1,024 (32 ms) was used for this network.
For the gesture classification using DVS data, we used both a spiking MLP, with the same architecture as the one deployed on MorphIC and described in section 2.3.1, and a spiking CNN with architecture 40x40x2-8c3-2p-16c3-2p-32c3-512-5. Here, XcY denotes a convolution layer with X kernels of shape Y-by-Y, while 2p denotes a 2-by-2 max pooling layer. Zero padding was applied for all convolution layers. No preprocessing of the spike events was performed; the ON/OFF events are treated as different input channels, hence the input shape 40x40x2. For this network, the current and voltage decay constants of the Loihi neurons were set to 1,024 (32 ms) and 128 (4 ms), respectively.
Finally, a third network, where the penultimate layer neurons of the DVS and EMG networks were fused together, was trained. Only the last fully connected weights (640-5, i.e., 128 EMG + 512 DVS neurons to 5 outputs) were trained, while the parameters of the network before fusion were preserved. Current and voltage decay constants of 1,024 (32 ms) and 128 (4 ms), respectively, were used for the final fusion layer neurons. From now on, we will refer to these three networks as EMG-Loihi, DVS-Loihi, and Fusion-Loihi whenever there is ambiguity.

2.4. Traditional Machine Learning Baselines
Machine Learning (ML) methods, and data-driven approaches in general, are currently the dominant tools used to solve complex classification tasks, since they give the best performance compared to other approaches. We compare the performance of the two fully neuromorphic systems described in the above sections against a traditional machine learning pipeline that uses frame-based inputs, i.e., traditionally sampled EMG signals and video frames. For the comparison to be fair, in the traditional approach we maintain the same constraints imposed by the neuromorphic hardware. In particular, we used the same neural network architectures as those used in the neuromorphic systems. Note that two different networks were implemented, a spiking MLP and a spiking CNN (see Figure 3 for more details on the architectures). For this reason, we have two different baseline models, paired with the two considered neuromorphic systems.

2.4.1. EMG Feature Extraction
Traditional EMG signal processing consists of several steps. First, signal pre-processing is used to extract useful information by applying filters and transformations. Then, feature extraction is used to highlight meaningful structures and patterns. Finally, a classifier maps the selected features to output classes.
In this section, we describe the EMG feature extraction phase; in particular, we consider the time-domain features used for the classification of gestures with the baseline models. We extracted two time-domain features commonly used in the literature (Phinyomark et al., 2018), namely the Mean Absolute Value (MAV) and the Root Mean Square (RMS), shown in Equation (1). The MAV is the average of the absolute muscle activation values and is calculated over a sliding window. The RMS reflects the signal amplitude, which relates to gestural force and muscular contraction. The two features are calculated across a window of 40 samples, corresponding to 200 ms:

$$\mathrm{MAV}(x_c) = \frac{1}{T}\sum_{t=1}^{T} |x_c(t)|, \qquad \mathrm{RMS}(x_c) = \sqrt{\frac{1}{T}\sum_{t=1}^{T} x_c^2(t)} \tag{1}$$
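Equations (1) to (3) translate directly into array code. The sketch below is illustrative only: it assumes a recording stored as a (samples × channels) array sampled at 200 Hz and uses non-overlapping 40-sample windows, matching the 10 chunks of 200 ms per 2 s recording described in section 2.2 (the stride of the moving window is an assumption, as it is not stated). The function names `mav`, `rms`, and `feature_vectors` are hypothetical.

```python
import numpy as np

FS = 200  # Myo sampling rate (Hz)
T = 40    # samples per window: 40 / 200 Hz = 200 ms


def mav(window):
    """Mean Absolute Value of each channel over one window, Equation (1)."""
    return np.mean(np.abs(window), axis=0)


def rms(window):
    """Root Mean Square of each channel over one window, Equation (1)."""
    return np.sqrt(np.mean(window ** 2, axis=0))


def feature_vectors(recording, window_len=T):
    """Return one E(n) = [MAV(n), RMS(n)] row per window, Equations (2)-(3).

    `recording` has shape (samples, channels); windows do not overlap
    (an assumption) and trailing samples that do not fill a window are dropped.
    """
    recording = np.asarray(recording, dtype=float)
    n_windows = recording.shape[0] // window_len
    rows = []
    for n in range(n_windows):
        w = recording[n * window_len:(n + 1) * window_len]
        # F(n) for MAV and RMS, concatenated channel-wise into E(n)
        rows.append(np.concatenate([mav(w), rms(w)]))
    return np.stack(rows)


# A 2 s, 8-channel recording yields 10 windows of 2 x 8 = 16 features each.
rng = np.random.default_rng(0)
E = feature_vectors(rng.standard_normal((2 * FS, 8)))
print(E.shape)  # (10, 16)
```

The 16-element feature vector per window is consistent with the 16 inputs of the 16-230-5 baseline EMG MLP described in section 2.4.2.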
[Figure 3 input panels: DVS ON/OFF events vs. Address x, Address y, and Time [s]; EMG Amplitude [a.u.] vs. Time [s] per channel.]
FIGURE 3 | Architectures of the neural networks implemented on the neuromorphic systems and used in the baselines. (A) CNN architecture implemented on Loihi; the corresponding baseline CNN receives APS frames instead of DVS events. (B) subMLP architectures implemented on MorphIC; the corresponding baseline subMLPs receive APS frames instead of DVS events. (C) MLP architecture for the EMG data implemented on Loihi (c1) and on ODIN (c2); the corresponding baseline MLPs receive EMG features instead of EMG events. The shading indicates the layers that are concatenated during the fusion of the networks.
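The CNN layer notation used for Figure 3A (XcY and 2p, as defined in section 2.3.2) can be traced with a small parser to check the intermediate shapes and the number of weights. This is a hedged sketch: it assumes stride-1 convolutions whose zero padding preserves height and width, counts weights only (no biases), and the function name `trace_spec` is hypothetical. It also illustrates why the baseline CNN, with a single gray-scale input channel, has slightly fewer parameters than the two-channel DVS network.

```python
def trace_spec(spec):
    """Trace output shapes and weight count for a spec such as
    '40x40x2-8c3-2p-16c3-2p-32c3-512-5'.

    XcY: X kernels of size YxY, stride 1, zero ('same') padding (assumption)
    Np:  NxN max pooling
    N:   fully connected layer with N neurons (input is flattened)
    """
    tokens = spec.split("-")
    shape = tuple(int(v) for v in tokens[0].split("x"))
    shapes, n_weights = [shape], 0
    for tok in tokens[1:]:
        if "c" in tok:                       # convolution layer
            kernels, size = (int(v) for v in tok.split("c"))
            n_weights += kernels * size * size * shape[2]
            shape = (shape[0], shape[1], kernels)
        elif tok.endswith("p"):              # max pooling layer
            p = int(tok[:-1])
            shape = (shape[0] // p, shape[1] // p, shape[2])
        else:                                # dense layer
            fan_in = 1
            for d in shape:
                fan_in *= d
            n_weights += fan_in * int(tok)
            shape = (int(tok),)
        shapes.append(shape)
    return shapes, n_weights


shapes, w_dvs = trace_spec("40x40x2-8c3-2p-16c3-2p-32c3-512-5")
_, w_aps = trace_spec("40x40x1-8c3-2p-16c3-2p-32c3-512-5")
print(shapes[-3])     # (10, 10, 32) before flattening
print(w_dvs - w_aps)  # 72: extra weights of the second (OFF) input channel
```

Under these assumptions, the difference is confined to the first convolution layer (8 kernels × 3 × 3 × 1 extra input channel), which is consistent with the statement that the parameter counts of the baseline and Loihi networks differ only slightly.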
In Equation (1), $x_c(t)$ is the signal in the time domain for the EMG channel with index $c$, and $T$ is the number of samples in the considered window, which was set to $T = 40$ (i.e., 200 ms) throughout this work. The features were calculated for each channel separately, and the resulting values were concatenated in a vector $F(n)$, described in Equation (2):

$$F(n) = \left[F(x_1), \ldots, F(x_C)\right]^{\top} \tag{2}$$

where $F$ is MAV or RMS, $n$ is the index of the window, and $C$ is the number of EMG channels. The final feature vector $E(n)$ for window $n$, shown in Equation (3), is used for the classification and is obtained by concatenating the two single-feature vectors:

$$E(n) = \left[\mathrm{MAV}(n)^{\top}, \mathrm{RMS}(n)^{\top}\right]^{\top} \tag{3}$$

2.4.2. Baseline ODIN + MorphIC
As described in section 2.3.1, a CNN cannot be efficiently implemented on crossbar cores, which is the architecture ODIN and MorphIC rely on. We therefore rely solely on fully-connected MLP networks for both visual and EMG data processing. For the visual input, we used the same subMLP-based network structure as the one described in section 2.3.1, but with gray-scale APS frames. The 40 × 40 cropped APS frames are sub-sampled and fed into four 2-layer subMLPs of architecture 400-210-5, as shown in Figure 3B. The outputs of the four subMLPs are then summed when classifying with a single sensor and are concatenated for the fusion network. The EMG neural network is a 2-layer MLP of architecture 16-230-5. The fusion network is obtained as described above for the Loihi baseline.

2.4.3. Baseline Loihi
As described in section 2.3.2, we used a spiking MLP and a spiking CNN to process and classify DVS events. For the Loihi baseline, we kept the exact same architectures, except for the axonal delays. Moreover, both baseline architectures receive the corresponding gray-scale APS frames instead of the DVS events. The baseline CNN and MLP architectures are shown in Figures 3A,B, respectively. Note that the number of parameters of the baseline networks and of the spiking networks implemented on Loihi is slightly different, since the input has one channel (gray-scale) in the case of the baseline
TABLE 1 | Comparison of traditional and neuromorphic systems on the task of gesture recognition for both single sensor and sensor fusion.

System      Modality   Accuracy (%)   Energy (µJ)           Inference time (ms)   EDP (µJ·s)
CNN (GPU)   EMG        68.1 ± 2.8     (25.5 ± 8.4) · 10³    3.8 ± 0.1             97.3 ± 4.4
CNN (GPU)   APS        92.4 ± 1.6     (31.7 ± 7.4) · 10³    5.9 ± 0.1             186.9 ± 3.9
CNN (GPU)   EMG+APS    95.4 ± 1.7     (32.1 ± 7.9) · 10³    6.9 ± 0.05            221.1 ± 4.1
MLP (GPU)   EMG        67.2 ± 3.6     (23.9 ± 5.6) · 10³    2.8 ± 0.08            67.2 ± 2.9
MLP (GPU)   APS        84.2 ± 4.3     (30.2 ± 7.5) · 10³    6.9 ± 0.1             211.3 ± 6.1
MLP (GPU)   EMG+APS    88.1 ± 4.1     (32.0 ± 8.9) · 10³    7.9 ± 0.05            253.0 ± 3.9
Accuracy results are reported as mean and standard deviation obtained over a 3-fold cross-validation.
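As a quick consistency check of the EDP definition given in the introduction (average energy consumption multiplied by average inference time), the product of the mean energy and mean inference time in Table 1 can be compared with the reported EDP. In this sketch, the small residual differences (all below 2%) are presumably due to the EDP being averaged per sample rather than computed from the means; that interpretation is an assumption, and the helper name `edp` is hypothetical.

```python
# Mean values from Table 1: (energy in µJ, inference time in ms, reported EDP in µJ·s)
TABLE1 = {
    ("CNN (GPU)", "EMG"): (25.5e3, 3.8, 97.3),
    ("CNN (GPU)", "APS"): (31.7e3, 5.9, 186.9),
    ("CNN (GPU)", "EMG+APS"): (32.1e3, 6.9, 221.1),
    ("MLP (GPU)", "EMG"): (23.9e3, 2.8, 67.2),
    ("MLP (GPU)", "APS"): (30.2e3, 6.9, 211.3),
    ("MLP (GPU)", "EMG+APS"): (32.0e3, 7.9, 253.0),
}


def edp(energy_uj, time_ms):
    """Energy-delay product in µJ·s: energy (µJ) times inference time (s)."""
    return energy_uj * time_ms * 1e-3


for (system, modality), (energy, time_ms, reported) in TABLE1.items():
    estimate = edp(energy, time_ms)
    deviation = abs(estimate - reported) / reported
    print(f"{system:10s} {modality:8s} {estimate:7.1f} vs {reported:6.1f} ({deviation:.1%})")
```

Note the unit conversion: because energy is tabulated in µJ and time in ms, the product must be scaled by 10⁻³ to obtain µJ·s.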
[Figure 4 plot; y-axis: Accuracy [%].]
FIGURE 4 | Accuracy vs. stimulus duration for the Loihi system and its software baseline counterpart. In green the results for the CNN (GPU), in purple the results for
the spiking CNN (Loihi). No classification is present for APS frames before 25 ms since the frame rate is 20 fps.
Loihi networks are restricted to 8-bit fixed-precision weights and 6-bit fixed-precision delays.
To evaluate the performance of the Loihi networks over time, stimulus duration vs. testing accuracy is plotted in Figure 4. We can see that the EMG-Loihi network continues to improve with longer stimulus duration. Table 1 and Figure 4 show the results of the Loihi baseline. From an accuracy point of view, the baseline reaches a higher classification accuracy only in the EMG classification, while both the visual classification and fusion are on par with the Loihi networks, showing only non-significant differences. In terms of inference time, the baseline running on the GPU system is systematically faster than Loihi, but never more than 40% faster. As expected, the energy consumption of the GPU system is significantly higher than that of the Loihi system. Loihi is around 30× more efficient than the baseline for the fusion network, and more than 150× and 40× more efficient for the EMG and DVS processing, respectively. Figure 4 shows in more detail the effect of stimulus duration on the classification accuracy. As expected, EMG is the modality that suffers most from classification based on short segments (Smith et al., 2011), reaching the best accuracy only after 200 ms for both the neuromorphic system and the baseline, while the accuracy for the vision and fusion modalities saturates much more quickly, in around 100 ms for the neuromorphic system and 50 ms for the baseline. The traditional system reaches its best performance after 50 ms, while the neuromorphic system reaches its best performance after 200 ms. One should, however, also note that the DVS sensor contains only the edge information of the scene, whereas the baseline network uses the full image frame. Therefore, the spiking CNN requires some time to integrate the input information from the DVS. Despite the inherent delays in a spiking CNN, the Loihi CNN can respond to the input within a few ms of its onset. However, for the vision modality, notice that, because the frame rate of the camera is 20 fps, there is no classification before 25 ms. Therefore, for short stimulus durations, the neuromorphic system has higher accuracy than the traditional system.

3.2. ODIN + MorphIC Results
Inference statistics for a 200 ms sample duration are reported in Table 3 for the EMG-ODIN, DVS-MorphIC, and Fusion-ODIN networks. Chip utilization is computed as the percentage of neuron resources taken by the hidden and output layers in ODIN and MorphIC, while the power consumption $P$ of the crossbar cores of both chips can be decomposed as

$$P = P_{\mathrm{leak}} + P_{\mathrm{idle}}\, f_{\mathrm{clk}} + E_{\mathrm{SOP}}\, r_{\mathrm{SOP}}, \tag{4}$$

where $P_{\mathrm{leak}}$ is the chip leakage power and $P_{\mathrm{leak}} + P_{\mathrm{idle}} f_{\mathrm{clk}}$ represents the static power consumption when a clock of frequency $f_{\mathrm{clk}}$ is connected, without network activity. The term $E_{\mathrm{SOP}} r_{\mathrm{SOP}}$ thus represents the dynamic power consumption, where $E_{\mathrm{SOP}}$ is the energy per synaptic operation (SOP) and $r_{\mathrm{SOP}}$ is the SOP processing rate, each SOP taking two clock cycles. Detailed power models extracted from chip measurements of ODIN and MorphIC are provided in Frenkel et al. (2019a,b), respectively. The results reported in Tables 1, 3 are obtained with ODIN and MorphIC optimized for power, under the conditions summarized in Table 4. The dynamic power consumption reported in Table 4 reflects the regime in which ODIN and the four cores of MorphIC run at the maximum SOP processing rate $r_{\mathrm{SOP}} = f_{\mathrm{clk}}/2$.
A limitation of the crossbar-based architecture of ODIN and MorphIC is that each neuron spike leads to a systematic
TABLE 3 | Inference statistics of ODIN and MorphIC models on 200 ms-long samples.
TABLE 4 | Low-power operating conditions of ODIN and MorphIC at minimum alleviated by reducing the neuron firing thresholds for shorter
supply voltage. stimulus durations.
Chip Supply voltage (V) ESOP (pJ) Max. fclk (MHz)
3.3. EDP and Computational Complexity
ODIN 0.55 8.4 75 Figure 6 shows a comparison between the Loihi system and the
MorphIC 0.8 30 55 ODIN + MorphIC system in terms of EDP, number of operations
per classification and a ratio between these two quantities. While
panel (a) reports the same numbers as in Table 1, panels (b)
and (c) allow for a more fair comparison of energy consumption
processing of all neurons in the core, thus potentially leading
between the two neuromorphic systems. From panel (b), we
to a significant amount of dummy operations (Frenkel et al.,
can see how the number of operations is similar for the EMG
2019b). Taking the example of the DVS-MorphIC network with
networks, both being MLPs for the two neuromorphic systems.
a crossbar core of 512 neurons (Figure 3B), each input spike
Differently, the number of operations for the visual input and the
leads to 512 SOPs, of which only 210 are useful for hidden
fusion differ substantially between the two systems due to the use
layer processing. Similarly, each spike from a hidden layer
of a CNN in the Loihi system. Taking this into account, we can
neuron leads to 512 SOPs, of which only five are actually
see in panel (c) that the normalized energy consumption tends to
used for output layer processing. The induced overhead is thus particularly critical for output layer processing, which degrades both the energy per inference and the inference time⁴. However, this problem is partly mitigated in the Fusion-ODIN network for output layer processing. Indeed, when resorting to an external mapping table (section 2.3.1), hidden layer spikes can be remapped back to the sensor fusion output layer of ODIN with specific single-SOP AER events (Frenkel et al., 2019a), thus avoiding the dummy SOP overhead and leading to a lower energy and inference time compared to the standalone EMG-ODIN and DVS-MorphIC networks (Tables 1, 3). As described in section 2.3.1, the fusion results exclude the mapping table overhead.

The comparison of the results obtained with ODIN + MorphIC to those obtained with its GPU baseline counterpart (Table 1 and Figure 5) leads to conclusions similar to those already drawn with Loihi in section 3.1, with the difference that, while the GPU system is significantly faster (between 2× and 10×), the ODIN + MorphIC neuromorphic system is between 500× and 3,200× more energy-efficient. Moreover, Figure 5 shows that the EMG-ODIN, DVS-MorphIC, and Fusion-ODIN networks perform essentially at chance level for a 10-ms stimulus duration. This is because the firing thresholds of the networks were selected for a 200-ms stimulus duration: with insufficient input spike data, the output neurons remain silent and never cross their firing threshold. This problem could be

be similar for both systems, more than the EDP in panel (a) is.

4. DISCUSSIONS

As discussed in Davies (2019), the neuromorphic engineering field has a real need for a benchmark that compares accuracy, energy, and latency. Machine learning (ML) benchmarks, such as ImageNet for image classification (Deng et al., 2009), the CHiME challenges for speech recognition (Barker et al., 2015), and the Ninapro dataset containing kinematic and surface EMG data for prosthetic applications (Atzori et al., 2014), are not ideal for neuromorphic chips. They are designed for high-performance computing and assume floating-point resolution, large amounts of data, and large power consumption. There have been some efforts to create relevant event-based datasets. Examples include N-MNIST (Orchard et al., 2015), the spiking version of the widespread MNIST digit recognition dataset; N-TIDIGITS18 (Anumula et al., 2018), the spiking version of the LDC TIDIGITS spoken digit recognition dataset; and the DVS gesture recognition dataset from IBM (Amir et al., 2017). These datasets are either toy examples or are not meant for real-world applications. Here, we introduce a hand gesture benchmark in English sign language (e.g., ILY) using the DVS and Myo sensors. This kind of benchmark can be used directly as a preliminary test for Brain-Machine Interface (BMI) and personalized medicine applications.
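The accuracy, energy, and latency metrics that such a benchmark compares can be partly combined into a single figure of merit: the energy-delay product (EDP) used throughout this paper (section 1, Figure 6). The minimal Python sketch below shows the arithmetic behind comparisons like the GPU results reported above. A system that uses k× less energy but is s× slower has an EDP that is k/s× smaller. Figure 6C additionally divides the EDP by the number of SOPs. The names `Measurement`, `edp_gain`, `edp_gain_from_ratios`, and `edp_per_sop` are hypothetical. Energy is assumed to be per inference in µJ, and latency is assumed to be in seconds. The numbers in the usage block are illustrative placeholders, not measurements from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Per-inference cost of one network on one system (hypothetical container).

    energy_uj: energy per inference in microjoules (assumed unit).
    latency_s: inference time in seconds (assumed unit).
    """

    energy_uj: float
    latency_s: float

    @property
    def edp(self) -> float:
        """Energy-delay product in uJ*s; lower is better."""
        return self.energy_uj * self.latency_s


def edp_gain(baseline: Measurement, candidate: Measurement) -> float:
    """Factor by which the candidate's EDP is smaller than the baseline's."""
    return baseline.edp / candidate.edp


def edp_gain_from_ratios(energy_ratio: float, slowdown: float) -> float:
    """EDP gain of a candidate that uses `energy_ratio` times less energy
    but is `slowdown` times slower than the baseline:
    (E_b * D_b) / (E_c * D_c) = (E_b / E_c) / (D_c / D_b)."""
    return energy_ratio / slowdown


def edp_per_sop(m: Measurement, sops: int) -> float:
    """EDP divided by the number of synaptic operations (uJ*s per SOP),
    the kind of normalization shown in Figure 6C."""
    if sops <= 0:
        raise ValueError("sops must be a positive count")
    return m.edp / sops


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper:
    # the candidate uses 500x less energy but is 5x slower than the GPU.
    gpu = Measurement(energy_uj=5000.0, latency_s=0.001)
    chip = Measurement(energy_uj=10.0, latency_s=0.005)
    print(f"EDP gain: {edp_gain(gpu, chip):.1f}x")              # EDP gain: 100.0x
    print(f"From ratios: {edp_gain_from_ratios(500, 5):.1f}x")  # From ratios: 100.0x
```

The quoted speed and energy ranges come from different networks, so their extremes need not occur together. Per-network EDPs are therefore best computed from each network's measured energy and latency rather than from the ends of the two ranges.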
We collected this dataset from 21 people and, in this paper, benchmarked it on three digital neuromorphic chips, measuring the accuracy, energy, and inference time. We believe this work is an important first step toward real use-cases (e.g., rehabilitation, sports applications, and sign language interpretation), and we encourage the community to use this benchmark.

⁴As discussed in Frenkel et al. (2019b), a simple extension providing post-synaptic start and end addresses would avoid these dummy SOPs and allow fully-connected layers to be processed efficiently. This extension will be included in future generations of the chips.

[Figure 5: plot of accuracy (%) vs. stimulus duration.]
FIGURE 5 | Accuracy vs. stimulus duration for the ODIN + MorphIC system and its software baseline counterpart. Blue: results for the MLP (GPU). Red: results for the spiking MLP (ODIN + MorphIC). No classification is present for APS frames before 25 ms since the frame rate is 20 fps.

[Figure 6: plots comparing Loihi and ODIN + MorphIC on the EMG, DVS, and fusion (FUS) networks; panel axes: (A) EDP [µJ·s], (B) SOPs, (C) EDP/SOPs [µJ·s/spike].]
FIGURE 6 | Comparison between the two neuromorphic systems with respect to (A) energy-delay product (EDP) (see section 1), (B) number of synaptic operations (SOPs) (see section 2.3.1), and (C) EDP normalized by the number of SOPs.

Although the dataset we provide consists of static gestures, the DVS and the spiking EMG signals enable low-power processing on event-based neuromorphic chips. They also enable embedded systems with online, on-site processing, without having to send the data to remote servers. This work is therefore an important first step toward edge-computing applications. The static dataset also helps reduce the noise in the EMG signals, as mentioned in section 2.2. However, this does not move away from the real application, as we have shown in a live demo (Ceolini et al., 2019a).

The choice of multi-sensor data fusion, combining vision and EMG sensors, stems from the need for multiple sources to support classification in real-world scenarios. Although the results show only a small improvement due to the EMG sensors, they still provide some classification when lighting conditions are poor or the camera is occluded. In addition,
for specific applications such as neuroprosthetic control, the EMG is integrated in the prosthetic device. The camera can eventually act as a supporting input during calibration or for more advanced tasks, such as sensory-motor closed-loop control (Jiang et al., 2012).

Since event-based neuromorphic chips require inputs in the form of events, the continuous sensory signals have to be encoded into spikes for event-driven processing. Compared with analog information processing, this quantization loses information (and hence accuracy). This is the price paid for the low power consumption of event-based systems that edge computing requires. To compensate for the loss of information and accuracy, it is important to merge information from multiple sensors in a sensor fusion setup. In this setting, the information lost by quantizing one sensor's signal can be compensated by another sensor. This is similar to how humans and animals perceive their environment through diverse sensory channels: vision, audition, touch, smell, proprioception, etc. From a biological perspective, the fundamental reason lies in the concept of degeneracy in neural structures (Edelman, 1987). Degeneracy means that any single function can be carried out by more than one configuration of neural signals, so the biological system still functions when one component is lost. It also means that sensory systems can educate each other without an external teacher (Smith and Gasser, 2005). The same principles can be applied to artificial systems, since information about the same phenomenon in the environment can be acquired from various types of sensors: cameras, microphones, accelerometers, etc. Each type of sensory information can be considered a modality. Due to the rich characteristics of natural phenomena, a single modality rarely provides a complete representation of the phenomenon of interest (Lahat et al., 2015).

The literature describes two main strategies for multi-modal fusion (Cholet et al., 2019):
(1) data-level fusion (early fusion), where modalities are concatenated and then learned by a single model;
(2) score-level fusion (late fusion), where modalities are learned by distinct models, and their predictions are then fused by another model that provides the final decision.
Early fusion, including feature-level fusion, suffers from a compatibility problem (Peng et al., 2016) and does not generalize well. Additionally, neural-based early fusion increases the memory footprint and the computational cost of the process by inducing full connectivity at the first classification stages. This is an important factor to consider when choosing a fusion strategy (Castanedo, 2013), especially for embedded systems. We therefore follow a late fusion approach with classifier-level fusion, which has been shown to outperform feature-level fusion on classification tasks (Guo et al., 2014; Peng et al., 2016; Biagetti et al., 2018). It is close to score-level fusion. It combines the penultimate layers of the base (unimodal) classifiers in a meta-level (multimodal) classifier, which exploits the natural complementarity of the modalities to improve the overall classification accuracy.

In this context, the central question for a fair comparison is how the fully traditional approaches, such as the CNN and MLP baselines, differ from the event-based neuromorphic one. In the baseline, the EMG features are manually extracted, and the classification is done on the extracted features. This pipeline is completely different from the event-based neuromorphic approach, which extracts the features directly from the events. Moreover, although we encoded the signals separately, this sensory information can be encoded into events directly at the front-end. This has already been established for audio and visual sensors (Lichtsteiner et al., 2006; Chan et al., 2007). There have also been recent design efforts for other signals, such as biomedical ones (Corradi and Indiveri, 2015).

To have a reference point for comparison, we trained the same network architectures used for the two neuromorphic setups as a baseline. As Table 1 shows, the baseline fusion accuracy is on par with both Loihi and ODIN + MorphIC. This holds despite the lower bit resolution of the neuromorphic chips compared with the 32-bit floating-point resolution of the GPU baseline. We speculate that this is because the SLAYER training model already accounts for the low bit precision when calculating the gradients. Similarly, ODIN and MorphIC use a quantization-aware training approach, which calculates the weights based on the available on-chip precision. As all the experiments in Table 1 show, the classification accuracy using only the EMG sensor is relatively low. However, this is the result of a model trained across subjects, and there are multiple sources of variability across subjects:
(i) the EMG sensor is not necessarily placed in the same position (with respect to the forearm muscles) for every subject;
(ii) every subject performs the gestures in a unique manner;
(iii) muscle strength differs from subject to subject.
In addition, since the EMG is measured directly from surface electrodes, it picks up noise while traveling through the skin, as well as background noise from the electronics, ambient noise, and so forth. In a real-world application, the network model can be trained on a single subject's data, yielding much higher accuracy. Moreover, the online learning abilities of the neuromorphic chips can help adapt these models to each subject. Such online learning modules already exist in Loihi as well as in ODIN and MorphIC, and they could be exploited in the future to boost the classification accuracy of EMG signals. Furthermore, the fusion accuracy is close to, if not higher than (by about 4%), the accuracy achieved with the DVS sensor alone. The importance of the EMG signal lies instead in wearable applications, since it is a natural way to control prostheses and a direct measure of muscle activity and movement. Given the noisy nature of the EMG signal, it is critical to combine it with the visual input to boost the accuracy. Even so, the EMG signal still carries relevant information that helps boost the accuracy of the fusion.

While the accuracies of the spiking MLP on Loihi and on ODIN + MorphIC are directly comparable, the results for the spiking CNN on Loihi and the spiking MLP on ODIN + MorphIC are not. This is because the two architectures use different features and resources on their respective neuromorphic systems (as described in section 2.3). Based on this, there are different constraints
present in the two chips. A CNN architecture is traditionally used for image classification, and this is the network we used on the Loihi chip, given the large number of neurons (128k) available on this general-purpose platform. However, ODIN and MorphIC are small-scale devices compared to Loihi, so the number of neurons is much more constrained (256 neurons for ODIN, 2k for MorphIC). We therefore used a fully-connected MLP topology instead of a CNN for image classification on MorphIC.

Regarding latency, the latency budget for real-world prosthetic applications is below 250 ms (Smith et al., 2011). If the processing happens within this budget, the patient will not perceive the lag of the system. Hence, optimizing the system for a latency lower than 200 ms brings no benefit, as the patient will not perceive latencies below 200 ms. Within this budget, other parameters can therefore be optimized. The neuromorphic approach is very advantageous in this case, since it trades latency for power while staying within the required latency budget. In contrast, the GPU system has a faster overall inference time but uses much more energy. Note also that our results are reported in accelerated time. The EMG and DVS signals, however, change slowly, so even though the classification itself is very fast, the system has to wait for the inputs to arrive. In practice, the system therefore behaves as if it were running in real time. This creates a trade-off between the memory that stores the streaming data for processing and the dynamic energy consumption. Accelerated time lowers the energy consumption because the system is on for a shorter time. The caveat is that the input has to be buffered for at least 200 ms in off-chip memory, which induces a power and resource overhead.

The final comparison in Figure 6 shows that the two systems have a similar energy consumption once it is normalized by the number of operations needed to run the network and obtain one classification output. ODIN + MorphIC consumes less per classification in absolute terms, but it performs comparably to Loihi when the number of operations is taken into account. All these aspects have to be considered when deploying a neuromorphic system. There is a trade-off between speed and energy consumption, and also one between accuracy and energy consumption, since a more complex network architecture may have more predictive power but a higher energy demand. Overall, one has to look for the best trade-off in the context of a particular application. The malleability of neuromorphic hardware enables this adaptation to task-dependent constraints while delivering state-of-the-art system performance.

DATA AVAILABILITY STATEMENT

The datasets analyzed for this study can be found in the Zenodo open access repository, http://doi.org/10.5281/zenodo.3663616. All the code used for the reported experiments can be found at https://github.com/Enny1991/dvs_emg_fusion.

AUTHOR CONTRIBUTIONS

EC, CF, and SS contributed equally to the work. EC, GT, MP, and ED participated equally in the development of the work idea and collected the dataset. EC and LK were responsible for the baseline experiments. CF and SS implemented the ODIN + MorphIC and Loihi pipelines, respectively. SS implemented the SLAYER framework and adapted it for the specific application. All authors contributed to the writing of the paper.

FUNDING

This work was supported by the EU's H2020 MSC-IF grant NEPSpiNN (Grant No. 753470), the Swiss Forschungskredit grants FK-18-103 and FK-19-106, the Toshiba Corporation, the SNSF grant No. 200021_172553, the fonds Européen de Développement Régional FEDER, the Wallonia within the Wallonie-2020.EU program, the Plan Marshall, the FRS-FNRS of Belgium, the EU's H2020 project NEUROTECH (Grant No. 824103), and the H2020 MC SWITCHBOARD ETN (Grant No. 674901). The authors declare that this study received funding from Toshiba Corporation. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.

ACKNOWLEDGMENTS

The authors would like to acknowledge the 2019 Capocaccia and Telluride Neuromorphic Workshops and all their participants for the fruitful discussions, and Intel Corporation for access to the Loihi neuromorphic platform. We thank Prof. B. Miramond, Prof. D. Bol, Prof. S. Liu, Prof. T. Delbruck, and Prof. G. Indiveri. Finally, we thank Garrick Orchard for supporting us with the use of the Loihi platform and for his useful comments on the paper.
REFERENCES

Amir, A., Taba, B., Berg, D., Melano, T., McKinstry, J., Nolfo, C. D., et al. (2017). “A low power, fully event-based gesture recognition system,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Honolulu, HI), 7388–7397. doi: 10.1109/CVPR.2017.781
Anumula, J., Neil, D., Delbruck, T., and Liu, S.-C. (2018). Feature representations for neuromorphic audio spike streams. Front. Neurosci. 12:23. doi: 10.3389/fnins.2018.00023
Atzori, M., Gijsberts, A., Castellini, C., Caputo, B., Hager, A.-G. M., Elsig, S., et al. (2014). Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Sci. Data 1:140053. doi: 10.1038/sdata.2014.53
Barker, J., Marxer, R., Vincent, E., and Watanabe, S. (2015). “The third ‘CHiME’ speech separation and recognition challenge: dataset, task and baselines,” in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) (Scottsdale, AZ), 504–511. doi: 10.1109/ASRU.2015.7404837
Benatti, S., Casamassima, F., Milosevic, B., Farella, E., Schönle, P., Fateh, S., et al. (2015). A versatile embedded platform for EMG acquisition and gesture recognition. IEEE Trans. Biomed. Circuits Syst. 9, 620–630. doi: 10.1109/TBCAS.2015.2476555
Benjamin, B. V., Gao, P., McQuinn, E., Choudhary, S., Chandrasekaran, A. R., Bussat, J.-M., et al. (2014). Neurogrid: a mixed-analog-digital multichip system for large-scale neural simulations. Proc. IEEE 102, 699–716. doi: 10.1109/JPROC.2014.2313565
Biagetti, G., Crippa, P., and Falaschetti, L. (2018). Classifier level fusion of accelerometer and sEMG signals for automatic fitness activity diarization. Sensors 18:2850. doi: 10.3390/s18092850
Brader, J. M., Senn, W., and Fusi, S. (2007). Learning real-world stimuli in a neural network with spike-driven synaptic dynamics. Neural Comput. 19, 2881–2912. doi: 10.1162/neco.2007.19.11.2881
Braun, S., Neil, D., Anumula, J., Ceolini, E., and Liu, S. (2019). “Attention-driven multi-sensor selection,” in 2019 International Joint Conference on Neural Networks (IJCNN) (Budapest), 1–8. doi: 10.1109/IJCNN.2019.8852396
Castanedo, F. (2013). A review of data fusion techniques. Sci. World J. 2013:704504. doi: 10.1155/2013/704504
Ceolini, E., Taverni, G., Khacef, L., Payvand, M., and Donati, E. (2019a). “Live demostration: sensor fusion using EMG and vision for hand gesture classification in mobile applications,” in 2019 IEEE Biomedical Circuits and Systems Conference (BioCAS) (Nara), 1. doi: 10.1109/BIOCAS.2019.8919163
Ceolini, E., Taverni, G., Khacef, L., Payvand, M., and Donati, E. (2019b). Sensor fusion using EMG and vision for hand gesture classification in mobile applications. arXiv 1910.11126. doi: 10.1109/BIOCAS.2019.8919210
Chan, V., Liu, S.-C., and van Schaik, A. (2007). AER EAR: a matched silicon cochlea pair with address event representation interface. IEEE Trans. Circuits Syst. I Reg. Pap. 54, 48–59. doi: 10.1109/TCSI.2006.887979
Chen, C., Yu, Y., Ma, S., Sheng, X., Lin, C., Farina, D., et al. (2020). Hand gesture recognition based on motor unit spike trains decoded from high-density electromyography. Biomed. Signal Process. Control 55:101637. doi: 10.1016/j.bspc.2019.101637
Cheok, M. J., Omar, Z., and Jaward, M. H. (2019). A review of hand gesture and sign language recognition techniques. Int. J. Mach. Learn. Cybern. 10, 131–153. doi: 10.1007/s13042-017-0705-5
Cholet, S., Paugam-Moisy, H., and Regis, S. (2019). “Bidirectional associative memory for multimodal fusion: a depression evaluation case study,” in 2019 International Joint Conference on Neural Networks (IJCNN) (Budapest), 1–6. doi: 10.1109/IJCNN.2019.8852089
Cicirelli, G., Attolico, C., Guaragnella, C., and D’Orazio, T. (2015). A Kinect-based gesture recognition approach for a natural human robot interface. Int. J. Adv. Robot. Syst. 12:22. doi: 10.5772/59974
Corradi, F., and Indiveri, G. (2015). A neuromorphic event-based neural recording system for smart brain-machine-interfaces. IEEE Trans. Biomed. Circuits Syst. 9, 699–709. doi: 10.1109/TBCAS.2015.2479256
Courbariaux, M., Bengio, Y., and David, J.-P. (2015). “BinaryConnect: training deep neural networks with binary weights during propagations,” in Advances in Neural Information Processing Systems, eds C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Montreal, QC: Curran Associates, Inc.), 3123–3131.
Davies, M. (2019). Benchmarks for progress in neuromorphic computing. Nat. Mach. Intell. 1, 386–388. doi: 10.1038/s42256-019-0097-1
Davies, M., Srinivasa, N., Lin, T.-H., Chinya, G., Cao, Y., Choday, S. H., et al. (2018). Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro 38, 82–99. doi: 10.1109/MM.2018.112130359
Debes, C., Merentitis, A., Heremans, R., Hahn, J., Frangiadakis, N., van Kasteren, T., et al. (2014). Hyperspectral and LiDAR data fusion: outcome of the 2013 GRSS data fusion contest. IEEE J. Select. Top. Appl. Earth Observ. Rem. Sens. 7, 2405–2418. doi: 10.1109/JSTARS.2014.2305441
Deiss, S. R., Douglas, R. J., and Whatley, A. M. (1999). “A pulse-coded communications infrastructure for neuromorphic systems,” in Pulsed Neural Networks, eds W. Maass and C. M. Bishop (Cambridge, MA: MIT Press), 157–178.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). “ImageNet: a large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition (Miami, FL: IEEE), 248–255. doi: 10.1109/CVPR.2009.5206848
Donati, E., Payvand, M., Risi, N., Krause, R., Burelo, K., Indiveri, G., et al. (2018). “Processing EMG signals using reservoir computing on an event-based neuromorphic system,” in 2018 IEEE Biomedical Circuits and Systems Conference (BioCAS), 1–4. doi: 10.1109/BIOCAS.2018.8584674
Donati, E., Payvand, M., Risi, N., Krause, R. B., and Indiveri, G. (2019). Discrimination of EMG signals using a neuromorphic implementation of a spiking neural network. IEEE Trans. Biomed. Circuits Syst. 13, 795–803. doi: 10.1109/TBCAS.2019.2925454
Došen, S., Cipriani, C., Kostić, M., Controzzi, M., Carrozza, M. C., and Popović, D. B. (2010). Cognitive vision system for control of dexterous prosthetic hands: experimental evaluation. J. Neuroeng. Rehabil. 7:42. doi: 10.1186/1743-0003-7-42
Droniou, A., Ivaldi, S., and Sigaud, O. (2015). Deep unsupervised network for multimodal perception, representation and classification. Robot. Auton. Syst. 71, 83–98. doi: 10.1016/j.robot.2014.11.005
Edelman, G. M. (1987). Neural Darwinism: The Theory of Neuronal Group Selection. New York, NY: Basic Books.
Frenkel, C., Lefebvre, M., Legat, J.-D., and Bol, D. (2019a). A 0.086-mm² 12.7-pJ/SOP 64k-synapse 256-neuron online-learning digital spiking neuromorphic processor in 28-nm CMOS. IEEE Trans. Biomed. Circuits Syst. 13, 145–158. doi: 10.1109/TBCAS.2018.2880425
Frenkel, C., Legat, J.-D., and Bol, D. (2019b). MorphIC: a 65-nm 738k-synapse/mm² quad-core binary-weight digital neuromorphic processor with stochastic spike-driven online learning. IEEE Trans. Biomed. Circuits Syst. 13, 999–1010. doi: 10.1109/TBCAS.2019.2928793
Furber, S. B., Galluppi, F., Temple, S., and Plana, L. A. (2014). The SpiNNaker project. Proc. IEEE 102, 652–665. doi: 10.1109/JPROC.2014.2304638
Gerstner, W. (1995). Time structure of the activity in neural network models. Phys. Rev. E 51, 738–758. doi: 10.1103/PhysRevE.51.738
Ghazaei, G., Alameer, A., Degenaar, P., Morgan, G., and Nazarpour, K. (2017). Deep learning-based artificial vision for grasp classification in myoelectric hands. J. Neural Eng. 14:036025. doi: 10.1088/1741-2552/aa6802
Gigli, A., Gregori, V., Cognolato, M., Atzori, M., and Gijsberts, A. (2018). “Visual cues to improve myoelectric control of upper limb prostheses,” in 2018 7th IEEE International Conference on Biomedical Robotics and Biomechatronics (Biorob) (Enschede: IEEE), 783–788. doi: 10.1109/BIOROB.2018.8487923
Guo, H., Chen, L., Shen, Y., and Chen, G. (2014). “Activity recognition exploiting classifier level fusion of acceleration and physiological signals,” in UbiComp 2014–Adjunct Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Seattle, WA), 63–66. doi: 10.1145/2638728.2638777
Haria, A., Subramanian, A., Asokkumar, N., Poddar, S., and Nayak, J. S. (2017). Hand gesture recognition for human computer interaction. Proc. Comput. Sci. 115, 367–374. doi: 10.1016/j.procs.2017.09.092
Hoeks, C., Barentsz, J., Hambrock, T., Yakar, D., Somford, D., Heijmink, S., et al. (2011). Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology 261, 46–66. doi: 10.1148/radiol.11091822
Horwitz, B., and Poeppel, D. (2002). How can EEG/MEG and fMRI/PET data be combined? Hum. Brain Mapp. 17, 1–3. doi: 10.1002/hbm.10057
Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., and Bengio, Y. (2016). “Binarized neural networks,” in Advances in Neural Information Processing Systems, eds D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Barcelona: Curran Associates, Inc.), 4107–4115.
Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., and Bengio, Y. (2017). Quantized neural networks: training neural networks with low precision weights and activations. J. Mach. Learn. Res. 18, 6869–6898. doi: 10.5555/3122009.3242044
Izhikevich, E. M. (2004). Which model to use for cortical spiking neurons? IEEE Trans. Neural Netw. 15, 1063–1070. doi: 10.1109/TNN.2004.832719
Jiang, N., Dosen, S., Muller, K.-R., and Farina, D. (2012). Myoelectric control of artificial limbs—is there a need to change focus? IEEE Signal Process. Mag. 29, 152–150. doi: 10.1109/MSP.2012.2203480
Lahat, D., Adali, T., and Jutten, C. (2015). Multimodal data fusion: an overview of methods, challenges, and prospects. Proc. IEEE 103, 1449–1477. doi: 10.1109/JPROC.2015.2460697
Lee, H.-Y., Hsu, C.-M., Huang, S.-C., Shih, Y.-W., and Luo, C.-H. (2005). Designing low power of sigma delta modulator for biomedical application. Biomed. Eng. Appl. Basis Commun. 17, 181–185. doi: 10.4015/S1016237205000287
Lichtsteiner, P., Posch, C., and Delbruck, T. (2006). “A 128 × 128 120 dB 30 mW asynchronous vision sensor that responds to relative intensity change,” in 2006 IEEE International Solid State Circuits Conference-Digest of Technical Papers (San Francisco, CA: IEEE), 2060–2069. doi: 10.1109/ISSCC.2006.1696265
Liu, H., and Wang, L. (2018). Gesture recognition for human-robot collaboration: a review. Int. J. Ind. Ergon. 68, 355–367. doi: 10.1016/j.ergon.2017.02.004
Liu, S.-C., Delbruck, T., Indiveri, G., Whatley, A., and Douglas, R. (2014). Event-Based Neuromorphic Systems. Hoboken, NJ: John Wiley & Sons.
Loss, J. F., Cantergi, D., Krumholz, F. M., La Torre, M., and Candotti, C. T. (2012). “Evaluating the electromyographical signal during symmetrical load lifting,” in Applications of EMG in Clinical and Sports Medicine, ed C. Steele (Norderstedt: Books on Demand), 1.
Markovic, M., Dosen, S., Cipriani, C., Popovic, D., and Farina, D. (2014). Stereovision and augmented reality for closed-loop control of grasping in hand prostheses. J. Neural Eng. 11:046001. doi: 10.1088/1741-2560/11/4/046001
Markovic, M., Dosen, S., Popovic, D., Graimann, B., and Farina, D. (2015). Sensor fusion and computer vision for context-aware control of a multi degree-of-freedom prosthesis. J. Neural Eng. 12:066022. doi: 10.1088/1741-2560/12/6/066022
Meier, K. (2015). “A mixed-signal universal neuromorphic computing system,” in 2015 IEEE International Electron Devices Meeting (IEDM) (Washington, DC: IEEE), 4–6. doi: 10.1109/IEDM.2015.7409627
Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F., et al. (2014). A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 668–673. doi: 10.1126/science.1254642
Moons, B., Goetschalckx, K., Van Berckelaer, N., and Verhelst, M. (2017). “Minimum energy quantized neural networks,” in 2017 51st Asilomar Conference on Signals, Systems, and Computers (Pacific Grove, CA: IEEE), 1921–1925. doi: 10.1109/ACSSC.2017.8335699
Moradi, S., Qiao, N., Stefanini, F., and Indiveri, G. (2017). A scalable multicore architecture with heterogeneous memory structures for dynamic neuromorphic asynchronous processors (DYNAPs). IEEE Trans. Biomed. Circuits Syst. 12, 106–122. doi: 10.1109/TBCAS.2017.2759700
Neckar, A., Fok, S., Benjamin, B. V., Stewart, T. C., Oza, N. N., Voelker, A. R., et al. (2018). Braindrop: a mixed-signal neuromorphic architecture with a dynamical systems-based programming model. Proc. IEEE 107, 144–164. doi: 10.1109/JPROC.2018.2881432
Neftci, E., Mostafa, H., and Zenke, F. (2019). Surrogate gradient learning in spiking neural networks. arXiv abs/1901.09948.
Orchard, G., Jayawant, A., Cohen, G. K., and Thakor, N. (2015). Converting static image datasets to spiking neuromorphic datasets using saccades. Front. Neurosci. 9:437. doi: 10.3389/fnins.2015.00437
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., et al. (2017). “Automatic differentiation in PyTorch,” in NeurIPS Autodiff Workshop (Long Beach, CA).
Peng, L., Chen, L., Wu, X., Guo, H., and Chen, G. (2016). Hierarchical complex activity representation and recognition using topic model and classifier level fusion. IEEE Trans. Biomed. Eng. 64, 1369–1379. doi: 10.1109/TBME.2016.2604856
Phinyomark, A., Khushaba, R. N., and Scheme, E. (2018). Feature extraction and selection for myoelectric control based on wearable EMG sensors. Sensors 18:1615. doi: 10.3390/s18051615
Pitti, A., Blanchard, A., Cardinaux, M., and Gaussier, P. (2012). “Gain-field modulation mechanism in multimodal networks for spatial perception,” in 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012) (Osaka), 297–302. doi: 10.1109/HUMANOIDS.2012.6651535
Qiao, N., Mostafa, H., Corradi, F., Osswald, M., Stefanini, F., Sumislawska, D., et al. (2015). A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128k synapses. Front. Neurosci. 9:141. doi: 10.3389/fnins.2015.00141
Rivet, B., Wang, W., Naqvi, S. M., and Chambers, J. A. (2014). Audiovisual speech source separation: an overview of key methodologies. IEEE Signal Process. Mag. 31, 125–134. doi: 10.1109/MSP.2013.2296173
Rueckauer, B., Lungu, I.-A., Hu, Y., Pfeiffer, M., and Liu, S.-C. (2017). Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Front. Neurosci. 11:682. doi: 10.3389/fnins.2017.00682
Shivappa, S. T., Trivedi, M. M., and Rao, B. D. (2010). Audiovisual information fusion in human–computer interfaces and intelligent environments: a survey. Proc. IEEE 98, 1692–1715. doi: 10.1109/JPROC.2010.2057231
Shrestha, S. B., and Orchard, G. (2018). “SLAYER: spike layer error reassignment in time,” in Advances in Neural Information Processing Systems 31, eds S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Montreal, QC: Curran Associates, Inc.), 1419–1428.
Smith, L., and Gasser, M. (2005). The development of embodied cognition: six lessons from babies. Artif. Life 11, 13–29. doi: 10.1162/1064546053278973
Smith, L. H., Hargrove, L. J., Lock, B. A., and Kuiken, T. A. (2011). Determining the optimal window length for pattern recognition-based myoelectric control: balancing the competing effects of classification error and controller delay. IEEE Trans. Neural Syst. Rehabil. Eng. 19, 186–192. doi: 10.1109/TNSRE.2010.2100828
Tan, A.-H., Subagdja, B., Wang, D., and Meng, L. (2019). Self-organizing neural networks for universal learning and multimodal memory encoding. Neural Netw. 120, 58–73. doi: 10.1016/j.neunet.2019.08.020
Thakur, C. S., Molin, J. L., Cauwenberghs, G., Indiveri, G., Kumar, K., Qiao, N., et al. (2018). Large-scale neuromorphic spiking array processors: a quest to mimic the brain. Front. Neurosci. 12:891. doi: 10.3389/fnins.2018.00891
Turk, M. (2014). Multimodal interaction: a review. Pattern Recogn. Lett. 36, 189–195. doi: 10.1016/j.patrec.2013.07.003
Yasen, M., and Jusoh, S. (2019). A systematic review on hand gesture recognition techniques, challenges and applications. PeerJ Comput. Sci. 5:e218. doi: 10.7717/peerj-cs.218
Zahra, O., and Navarro-Alarcon, D. (2019). “A self-organizing network with varying density structure for characterizing sensorimotor transformations in robotic systems,” in Towards Autonomous Robotic Systems, eds K. Althoefer, J. Konstantinova, and K. Zhang (Cham: Springer International Publishing), 167–178. doi: 10.1007/978-3-030-25332-5_15
Zenke, F., and Ganguli, S. (2018). SuperSpike: supervised learning in multilayer spiking neural networks. Neural Comput. 30, 1514–1541. doi: 10.1162/neco_a_01086
Zhang, Y., Wang, Z., and Du, J. (2019). “Deep fusion: an attention guided factorized bilinear pooling for audio-video emotion recognition,” in 2019 International Joint Conference on Neural Networks (IJCNN) (Budapest), 1–8. doi: 10.1109/IJCNN.2019.8851942
Zhao, D., and Zeng, Y. (2019). “Dynamic fusion of convolutional features based on spatial and temporal attention for visual tracking,” in 2019 International Joint Conference on Neural Networks (IJCNN) (Budapest), 1–8. doi: 10.1109/IJCNN.2019.8852301

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Ceolini, Frenkel, Shrestha, Taverni, Khacef, Payvand and Donati. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.