Channel Equalization For Side Channel Attacks
1 Introduction
Which will maximize the output of the filter at time t = T when the
transmitted signal is s(t). Applying the filter to a received signal means
convolving this impulse response with the received signal, and sampling the
output of this convolution at time t = T. If we are attempting to select which
of the possible signals s_1(t), ..., s_N(t) was sent, we would simply perform N
convolutions, one for each candidate s_n(t). Selecting the most likely candidate
then becomes the arg max of the convolution over all candidates at t = T, which
can also be written as the correlation of the received signal r(t) with all
candidates s_n(t) at t = 0, which would be:
\[ \arg\max_n \left( r(t) * s_n(t) \big|_{t=T} \right) = \arg\max_n \int_0^T y(\tau)\, s_n(\tau)\, d\tau \qquad (5) \]
The forms given in equations (2) force both r(t) and s(t) to be zero-mean
and normalized by standard deviation. This is necessary for us as we do not have
proper scaling of the template s(t) used at the receiver.
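The candidate selection described above can be sketched in a few lines. This is
only an illustrative sketch, not code from the paper: the function names, the
sinusoidal candidate signals, and the noise level are all assumptions made for
the example. Both the received signal and each template are made zero-mean and
scaled by their standard deviation before correlating, so an unknown gain or
offset at the receiver does not affect the arg max.

```python
import numpy as np

def normalize(x):
    """Return x shifted to zero mean and scaled to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def select_candidate(received, candidates):
    """Pick the candidate whose normalized correlation with `received` is largest."""
    r = normalize(received)
    # Dividing by the length turns each dot product into a correlation coefficient.
    scores = [float(np.dot(r, normalize(s)) / len(r)) for s in candidates]
    return int(np.argmax(scores)), scores

# Hypothetical demo: three candidate signals, one sent with unknown gain and offset.
rng = np.random.default_rng(0)
t = np.arange(200)
candidates = [np.sin(2 * np.pi * f * t / 200) for f in (2, 5, 9)]
received = 0.3 * candidates[1] + 1.5 + rng.normal(0.0, 0.1, t.size)

best, scores = select_candidate(received, candidates)
print(best, [round(s, 2) for s in scores])
```

Because of the normalization, the gain of 0.3 and the offset of 1.5 in the demo
have no effect on which candidate is selected.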
One critical difference between communications systems and side-channel
power analysis is the definition of the argument of s(t). In communications we
are sending a known signal s_n(t), which may be drawn from a set of ‘allowed’
signals s_1(t), s_2(t), ..., s_N(t). Each of these signals is typically a
finite-length signal as a function of time (or samples in the discrete case). At
the receiver we can use the matched filter to determine which of the N possible
signals was transmitted.
For side-channel analysis, our function s(t) is actually defined over the
number of cryptographic operations we observed. In equation (1) this was the
‘trace index’ d, and thus will be referred to as s(d). Each of the possible
functions s_1(d), s_2(d), ..., s_N(d) reflects the hypothetical value for the
byte of the secret key we are attacking. Thus the matched filter comparison is
always done at the same sample (i.e. time point) in each power measurement
trace t_d.
[Figure 1: tapped delay line with taps h_0, h_1, h_2, ..., h_N whose weighted outputs are summed to give t[n].]
Fig. 1. Simple channel model, where noise can be added to the output if w[n] also
needs to be modelled.
In equation (4) the received signal is corrupted only with AWGN, and
possibly a fixed scaling factor. This does not account for realistic channels
between the transmitter and receiver, which may include the signal arriving over
multiple paths. Instead the model shown in Fig. 1 is used, which is described by
equation (6). The objective of ‘channel estimation’ is to discover the values of
the taps, given by the vector h_channel. Again, p is the original transmitted
data.
\[ t[n] = \sum_{j=0}^{J-1} h_j\, p[n-j] + w[n] \qquad (6) \]
Estimating the coefficients at the receiver requires that the transmitter sends
a known ‘training sequence’. In communications systems the channel estimation
has several complicating factors: it must be performed in real-time to be use-
ful, and the channel will change over time so one must track the channel. By
comparison in side-channel analysis the computation must simply be possible in
reasonable time, and the channel varies little over time since the measurement
setup is fixed.
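As a concrete illustration of equation (6) and of estimating the taps from a
known training sequence, consider the following minimal sketch. The function
names and the example tap values are hypothetical, and p[n] is assumed to be
zero for n < 0.

```python
import numpy as np

def fir_channel(p, h, noise_std=0.0, rng=None):
    """Apply t[n] = sum_j h[j] * p[n - j] + w[n], taking p[n] = 0 for n < 0."""
    p = np.asarray(p, dtype=float)
    h = np.asarray(h, dtype=float)
    t = np.convolve(p, h)[:len(p)]          # keep the first len(p) output samples
    if noise_std > 0:
        rng = rng or np.random.default_rng()
        t = t + rng.normal(0.0, noise_std, size=t.shape)
    return t

def estimate_channel(p, t, num_taps):
    """Least-squares estimate of the taps from a known training sequence p."""
    p = np.asarray(p, dtype=float)
    # Column j holds p delayed by j samples, with zeros shifted in at the start.
    P = np.column_stack(
        [np.concatenate([np.zeros(j), p[:len(p) - j]]) for j in range(num_taps)]
    )
    h_hat, *_ = np.linalg.lstsq(P, np.asarray(t, dtype=float), rcond=None)
    return h_hat

# Hypothetical demo: a 3-tap channel recovered from a random training sequence.
rng = np.random.default_rng(1)
training = rng.normal(size=64)
true_taps = np.array([0.5, 0.25, -0.1])
observed = fir_channel(training, true_taps, noise_std=0.01, rng=rng)
print(estimate_channel(training, observed, num_taps=3))
```

Since the measurement setup is fixed in the side-channel case, such an estimate
would only need to be computed once rather than tracked over time.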
Using a channel model for side-channel analysis means we assume that a single
piece of data generated the entire power trace, via the channel model. If we
use the inverse of this channel model at the receiver, we can thus generate a
single point from each trace, and this single point contains all of the relevant
information from the entire trace for a specific subkey. Note that each subkey
s therefore requires a different channel estimate to be formed. Rather than
forming the channel h_channel, we will instead directly estimate the inverse.
This inverse will be the required linear equalizer for our unknown channel. We
will generate a separate equalizer h_s for each subkey being attacked, but it is
trivial to instead use a matrix H formed by combining the h_s vectors, which
would generate information for all attacked subkeys at once. This is analogous
to Multiple Input Multiple Output (MIMO) systems, where the channel matrix is
used to generate several independent communications channels. In our example
the ‘independent channels’ are the different information about each subkey s.
Extensions to generate a channel estimate for a single bit within a subkey are
of course valid, and would just require additional channel estimates.
In communications systems a sequence at the transmitter is disrupted by
the physical channel. We will instead consider the ‘known sequence’ to be the
leaked information, typically the Hamming weight or Hamming distance of
sensitive data. The channel is considered to be everything between the leaked
information and the power measurement: thus we also group countermeasures and
other details of the implementation into this channel.
The model used is given in equation (7). The information being leaked
by the device is p_{d,s} (e.g. Hamming weight or Hamming distance) about
subkey s related to trace d. For example, using the classic HW CPA assumption,
p_{d,s} = HW(SubByte(b ⊕ k)), where b is the plaintext byte and k is a key value.
The equalization vector for subkey s is h_s, and t_d is the vector of power
measurements.
\[ p_{d,s} = \mathbf{t}_d \cdot \mathbf{h}_s \qquad (8) \]
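The leakage assumption p_{d,s} = HW(SubByte(b ⊕ k)) can be made concrete with a
short sketch. The helper names below are hypothetical and not from the paper;
the S-box is generated algorithmically (multiplicative inverse in GF(2^8)
followed by the AES affine map) only to keep the example self-contained.

```python
def aes_sbox():
    """Generate the AES S-box: inverse in GF(2^8) followed by the affine map."""
    def rotl8(x, s):
        return ((x << s) | (x >> (8 - s))) & 0xFF

    sbox = [0] * 256
    p = q = 1
    while True:
        # p walks through all non-zero field elements (multiply by 3) ...
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF
        # ... while q is divided by 3, so q stays the inverse of p.
        q ^= q << 1
        q ^= q << 2
        q ^= q << 4
        q &= 0xFF
        if q & 0x80:
            q ^= 0x09
        sbox[p] = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63  # zero has no inverse and is handled separately
    return sbox

SBOX = aes_sbox()

def hamming_weight(x):
    """Number of set bits in the byte x."""
    return bin(x).count("1")

def leakage(plaintext_byte, key_byte):
    """Hypothetical leakage p = HW(SubByte(b XOR k)) for one byte."""
    return hamming_weight(SBOX[plaintext_byte ^ key_byte])

print(leakage(0x53, 0x00))                          # HW of SBOX[0x53] = 0xED
print(sum(hamming_weight(v) for v in SBOX) / 256)   # mean leakage over all inputs
```

Because the S-box is a permutation of the 256 byte values, the mean of this
leakage over uniformly random inputs is half of the 8-bit maximum.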
The value of µ_{t_d} will not be known, and instead the estimate µ̂_{t_d} is
formed from the received data. The value of µ_p is known, and is simply 1/2 of
the maximum Hamming weight (e.g. µ_p = 4 on our 8-bit microcontroller).
The vector h_s holds the linear equalizer coefficients. Unlike in
communications systems, we have no control over the ‘transmitter’ and thus will
always use the form in equation (8). Attempting to solve the form of equation
(6) and then invert the matrix for use in equation (8) would be equivalent;
however, the form in (8) simplifies notation and computation in side-channel
analysis problems.
For notational simplicity this uses the form of (8); if the zero-mean
assumption is not made, the form in (7) should be substituted instead. Two
different options for minimizing this error are commonly used: the Least Square
(LS) and the Mean Square Error (MSE). Results of experiments using both
minimization options were almost identical, so only the LS will be described
here.
For the LS cost function, the objective is to minimize the sum of square errors
over all traces:
\[ \sum_{d=0}^{D-1} e^2(d) \qquad (10) \]
This can be accomplished with a least-squares (LS) error estimator (or ‘solver’),
with the solution ĥs . These solvers are frequently built into numeric packages
such as MATLAB, SciPy, etc. A faster method is to use the pseudoinverse to
solve the LS problem, which has a known solution given by (11.12) in [9]:
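\[ \hat{\mathbf{h}}_s = \mathbf{t}^{+} \cdot \mathbf{p}_s \qquad (11) \]
\[ A = U \cdot \Sigma \cdot V^{*} \]
\[ A^{+} = V \cdot \Sigma^{+} \cdot U^{*} \]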
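A minimal sketch of the pseudoinverse solution in (11) is given below, using
NumPy. The function names and the synthetic data are assumptions for the
example; in particular, the sample mean of the leakage is subtracted here,
whereas the known value µ_p could be used instead. The second function builds
the pseudoinverse from the SVD following the two identities above, and is
included only to show that it agrees with the library routine.

```python
import numpy as np

def fit_equalizer(traces, leakage):
    """Least-squares equalizer h_hat = pinv(T) @ p on mean-removed data."""
    T = np.asarray(traces, dtype=float)
    p = np.asarray(leakage, dtype=float)
    T = T - T.mean(axis=0)      # estimate of the trace mean, formed from the data
    p = p - p.mean()            # assumption: sample mean instead of the known mu_p
    return np.linalg.pinv(T) @ p

def pinv_via_svd(A, rcond=1e-12):
    """Pseudoinverse A+ = V Sigma+ U* built from the reduced SVD of A."""
    U, s, Vh = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    s_inv = np.zeros_like(s)
    keep = s > rcond * s.max()  # invert only the non-negligible singular values
    s_inv[keep] = 1.0 / s[keep]
    return (Vh.conj().T * s_inv) @ U.conj().T

# Synthetic demo (not measured data): 500 traces of 5 samples, two of which leak.
rng = np.random.default_rng(0)
hw = rng.integers(0, 9, size=500).astype(float)
gains = np.array([0.0, 0.8, 0.0, 0.5, 0.0])
traces = np.outer(hw, gains) + rng.normal(0.0, 0.5, size=(500, 5)) + 3.0

h_hat = fit_equalizer(traces, hw)
p_hat = (traces - traces.mean(axis=0)) @ h_hat   # one equalized point per trace
print(np.round(h_hat, 2), round(float(np.corrcoef(p_hat, hw)[0, 1]), 3))
```

The equalizer concentrates its weight on the two leaking samples, and each trace
collapses to the single value p_hat used by the later attacks.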
\[ \hat{i}_s = \arg\min_i \sum_{d=0}^{D-1} \left| \hat{p}_{d,s} - \mu_{\hat{p}_{d,s,i}} \right| \qquad (12) \]
In the case where the values of \mu_{\hat{p}_{s,i}} are equally spaced and the
\sigma^2_{\hat{p}_{s,i}} are all equal, these two approaches (minimum distance
and maximum likelihood) will produce the same results.
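The minimum-distance rule in (12) can be sketched as follows. The interpretation
used here is an assumption: for key hypothesis i and trace d, the template mean
is taken to be the mean equalizer output of the Hamming-weight class that
hypothesis i predicts for trace d. The function name and the toy numbers are
hypothetical.

```python
import numpy as np

def minimum_distance_key(p_hat, hyp_hw, class_means):
    """
    Return the key hypothesis minimizing the summed absolute distance.

    p_hat       : (D,) equalizer output for each trace
    hyp_hw      : (K, D) integer HW class predicted by hypothesis i for trace d
    class_means : (9,) template mean of the equalizer output per HW class
    """
    expected = np.asarray(class_means, dtype=float)[np.asarray(hyp_hw)]  # (K, D)
    dist = np.abs(np.asarray(p_hat, dtype=float)[None, :] - expected).sum(axis=1)
    return int(np.argmin(dist)), dist

# Toy example: 3 hypotheses, 4 traces, equally spaced zero-mean class means.
class_means = np.arange(9) - 4.0
hyp_hw = np.array([[4, 2, 6, 3],
                   [1, 5, 4, 7],
                   [3, 3, 5, 4]])
p_hat = np.array([-2.9, 1.2, -0.1, 2.8])   # close to what hypothesis 1 predicts

best, dist = minimum_distance_key(p_hat, hyp_hw, class_means)
print(best, dist)
```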
5 Implementation Performance
This section briefly mentions some practical considerations of implementing the
algorithms from this paper. Of particular importance is the use of the
pseudoinverse for the least-squares estimation of the channel.
6 Attack Results
6.1 Unprotected Software AES-128
An unprotected software AES implementation is used as the first example
device. The code is the AVR-Crypto-Lib AES code in C (see footnote 2),
programmed into an ATmega328P microcontroller. The device runs at 7.3728 MHz,
and power measurements are taken from a 50-ohm resistive shunt inserted into
the VCC lines. Measurements are perfectly synchronized with a trigger generated
by the device.
Two separate attacks are considered: the first is a profiled attack, which first
solves for the equalizer coefficients of equation (7) using power measurements
taken with a known plaintext and encryption key, both of which randomly change
for every trace. Once the equalizer is known, a different set of traces with a
fixed key and randomly changing plaintext is used to generate attack statistics.
Each trace measurement with the unknown key is multiplied by the estimated
ĥ_s to form a single datapoint, which is then processed by several different
algorithms. These results are compared to both a standard CPA attack and a
template attack in Fig. 2. Details of the various attacks are given in the
following subsections.
Correlation Power Analysis (CPA) For this attack the standard CPA is
used [7]. The most likely subkeys are ranked by the correlation coefficient
given by equation (1), where the intermediate value attacked is the Hamming
Weight at the output of the S-Boxes from the first round.
Template Attack For this attack a template [10] of the Hamming Weight (HW)
at the output of the S-Boxes from the first round of AES is generated using 2500
traces with a known (random) encryption key & plaintext. A total of 16 × 9
templates are generated, each template targeting 3 Points of Interest (POI) in
the trace for each subkey. The POI are selected based on the SOST criterion as
detailed in [11]. Template matching is done using a multivariate normal
distribution, as given in [10].
Equalized Attacks For this attack the equalizer coefficients are generated
from 2500 traces with a known (random) encryption key & plaintext. An attack
is then performed on a different set of traces from the same setup, having passed
those traces through a linear estimator ĥs . The output of the linear estimator,
a 1 × 1 random variable, is used in several different attacks.
Footnote 2: Available from: http://avrcryptolib.das-labor.org/trac/wiki/AES
The most basic attack uses that datapoint as the input to a CPA attack,
which is exactly the same as the CPA attack used in Section 6.1. As in [4], this
type of preprocessing should be more resilient to noise, since it is not
dependent on specific templates (i.e. value of mean).
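A sketch of CPA applied to the single equalized datapoint per trace is shown
below, together with a guessing-entropy style rank of the true key. It is not
the authors' implementation: the function names are hypothetical, the
‘equalizer output’ is synthetic (true Hamming weight plus Gaussian noise), and
ranking by the signed rather than absolute correlation is an assumption.

```python
import numpy as np

def aes_sbox():
    """AES S-box generated from the GF(2^8) inverse and the affine map."""
    rotl8 = lambda x, s: ((x << s) | (x >> (8 - s))) & 0xFF
    sbox, p, q = [0] * 256, 1, 1
    while True:
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF
        q ^= q << 1
        q ^= q << 2
        q ^= q << 4
        q &= 0xFF
        if q & 0x80:
            q ^= 0x09
        sbox[p] = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63
    return np.array(sbox)

SBOX = aes_sbox()
HW = np.array([bin(v).count("1") for v in range(256)])

def equalized_cpa(p_hat, plaintext_bytes):
    """Rank all 256 key guesses by correlation between p_hat and HW(SBOX[b ^ k])."""
    scores = np.empty(256)
    for k in range(256):
        hyp = HW[SBOX[plaintext_bytes ^ k]].astype(float)
        scores[k] = np.corrcoef(p_hat, hyp)[0, 1]
    return np.argsort(scores)[::-1], scores   # best guess first

# Synthetic demo: 200 traces already reduced to one noisy point each.
rng = np.random.default_rng(0)
true_key = 0x2B
plaintexts = rng.integers(0, 256, size=200)
p_hat = HW[SBOX[plaintexts ^ true_key]] - 4.0 + rng.normal(0.0, 0.5, size=200)

ranking, scores = equalized_cpa(p_hat, plaintexts)
rank_of_true_key = int(np.where(ranking == true_key)[0][0])   # 0 means ranked first
print(hex(int(ranking[0])), rank_of_true_key)
```

Averaging `rank_of_true_key` over many independent experiments would give a
PGE-style curve of the kind plotted in the attack results.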
The CPA attack does not use all available information: we can use the
existing training set to generate templates based on the output of the
equalizer. Two versions as described in Section 3.4 are tested: one using a
univariate normal distribution (‘Equalizer Template’) to generate probabilities
against each candidate HW template, and one using a simple absolute difference
between candidate HW template means.
The linear equalizer used here was found using the LS solver based on the
pseudoinverse. Equalizers were also built with a regressive LS solver and the
linear MSE solutions given in this paper – the resulting PGE was almost identical
to that given in Fig. 2. To avoid cluttering the graph these have not been shown.
Channel Estimation Analysis The final attack graphed, the Channel Estimation
Analysis (CEA), does not have a profiling phase. The specified number of traces
for each datapoint is split in half: one part becomes the fitting set and the
other the test set, as described in Section 4. The CEA appears to have poor
performance, in part due to the partitioning of traces. The CEA achieves an
average PGE < 10 in about 300 traces, although this is off-scale for the graph.
The advantage of CEA will be demonstrated when instead attacking a masking
scheme.
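The fitting-set and test-set split can be illustrated with the sketch below.
Section 4 is not reproduced here, so this is only one plausible reading and
should not be taken as the authors' algorithm: for every key guess an equalizer
is fitted by least squares on the fitting half, and the guess is scored by the
correlation between the equalized test half and the hypothetical leakage. The
scoring rule, the function names, and the synthetic traces are all assumptions.

```python
import numpy as np

def aes_sbox():
    """AES S-box generated from the GF(2^8) inverse and the affine map."""
    rotl8 = lambda x, s: ((x << s) | (x >> (8 - s))) & 0xFF
    sbox, p, q = [0] * 256, 1, 1
    while True:
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF
        q ^= q << 1
        q ^= q << 2
        q ^= q << 4
        q &= 0xFF
        if q & 0x80:
            q ^= 0x09
        sbox[p] = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63
    return np.array(sbox)

SBOX = aes_sbox()
HW = np.array([bin(v).count("1") for v in range(256)])

def cea_rank(traces, plaintext_bytes):
    """Score each key guess by how well an equalizer fitted on one half of the
    traces predicts the hypothetical leakage on the other half."""
    T = traces - traces.mean(axis=0)
    half = len(T) // 2
    fit_pinv = np.linalg.pinv(T[:half])       # computed once, reused for all guesses
    test_T = T[half:]
    scores = np.empty(256)
    for k in range(256):
        hyp = HW[SBOX[plaintext_bytes ^ k]] - 4.0
        h_k = fit_pinv @ hyp[:half]           # LS equalizer for this guess
        scores[k] = np.corrcoef(test_T @ h_k, hyp[half:])[0, 1]
    return np.argsort(scores)[::-1], scores   # best guess first

# Synthetic demo: 600 traces of 20 samples, leakage present at two samples.
rng = np.random.default_rng(0)
true_key = 0x2B
plaintexts = rng.integers(0, 256, size=600)
gains = np.zeros(20)
gains[5], gains[12] = 1.0, 0.6
traces = np.outer(HW[SBOX[plaintexts ^ true_key]], gains) + rng.normal(0, 1.0, (600, 20))

ranking, scores = cea_rank(traces, plaintexts)
print(hex(int(ranking[0])))
```

A wrong guess can still fit the noise in the fitting half, but that fit does not
carry over to the test half, which is why the traces are partitioned.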
[Fig. 2 plot omitted. Legend: CPA; Template (3 POI); Equalized CPA; Equalized Minimum Distance; Equalized Template; Channel Estimation Analysis. X-axis: Number of Traces Used in Attack (5 to 50).]
[Fig. 3 plot omitted. X-axis: Trace Number (0 to 2500).]
Fig. 3. CPA performed on AES-256 protected with RSM, where traces have been
preprocessed by Channel Equalization. Three different training key setups are
compared: a randomly varying key, a fixed key that is the same as the secret key
(i.e. as someone testing the system), and a fixed key that differs from the
secret key (i.e. as an attacker that cannot control the key).
are generated for the Hamming Weight (HW) of this value (i.e. the same leakage
assumption being used for the channel estimation attack and CPA attack). These
results are shown in Fig. 4; note that after 10 000 traces no changes in the
average PGE are found.
Note that it is possible for first-order attacks to succeed, as given in [12].
This required changes to the leakage model, and it was also noted that a
standard HW leakage model of the S-Box input or output failed to recover the
secret key.
Next, we will consider the CEA attack. For this attack no prior knowledge is
assumed beyond the assumption that the device leaks the Hamming weight,
i.e., the same assumption made in the unprotected case. The specified number
of traces is again split into two groups: a training set and a test set. The
results of this attack are shown in Fig. 4. Whilst a complete attack was
unsuccessful, the PGE of subkeys is considerably better compared to the other
attacks, and a progression of the PGE towards zero is present.
7 Future Work
7.1 Classifying Countermeasure Effectiveness
Equalization can also be used to quantify the effectiveness of
countermeasures. This is particularly useful for the simulated environment: it
was previously reported, for example, how a physical measurement setup which
‘blended’ several measurements together resulted in an attack on a physical
device being far more successful than the simulation predicted [1]. If instead
the channel estimation and equalization process is used for the simulated
environment, the equalized attack algorithm will combine all linearly related
leakage points.
[Fig. 4 plot omitted. Title: Attacks on AES-256 RSM. Y-axis: Average PGE. X-axis: Trace Number (4000 to 10000). Legend: CEA; Template (First-Order, 5 POI); CPA (First-Order).]
Note that whilst the channel equalizer is linear, the channel itself may be
non-linear. Thus the real system may be able to combine non-linear effects to
break the countermeasures, even though in the simulated channel the system is
perfectly secure. Work in comparing simulated to physical channels is required to
understand these effects. Using non-linear channel models may also improve the
performance, but solving the LS or MSE cost functions for non-linear channels
is more complex.
8 Conclusions
Using channel equalization is a simple method of compensating for all
disruptions to the leaked data of a device. With proper selection of the
channel, even intended disruptions such as countermeasures can be compensated
for. This work has used a simple linear FIR equalizer, where the equalizer is
found using least squares (LS) or Mean Square Error (MSE) metrics. The
improvement in attack performance for unprotected AES-128 is demonstrated, along
with the ability of channel estimation to attack protected implementations.
While channel equalization requires a profiling phase, some initial work using
the channel estimation without the profiling phase was also demonstrated, under
the name of the Channel Estimation Analysis (CEA).
The advantage of both the equalizer with profiling and the CEA is that they
require minimal assumptions about the device being attacked. In the case of
CEA the attack requires no more information than a CPA attack; for example,
the attacker can be unaware of any countermeasures inserted into the device.
Compared to typical multivariate or higher-order DPA attacks, this is a
considerable reduction in attack complexity. It was demonstrated how the channel
equalization & CEA could both be used to break a specific protected AES
implementation.
References
1. Moradi, A., Mischke, O.: On the Simplicity of Converting Leakages from
Multivariate to Univariate. In Bertoni, G., Coron, J.S., eds.: Cryptographic
Hardware and Embedded Systems - CHES 2013. Volume 8086 of Lecture Notes in
Computer Science. Springer Berlin Heidelberg (2013) 1–20
2. Nassar, M., Souissi, Y., Guilley, S., Danger, J.L.: RSM: A Small and Fast
Countermeasure for AES, Secure against 1st and 2nd-order Zero-Offset SCAs. In
Rosenstiel, W., Thiele, L., eds.: DATE, IEEE (2012) 1173–1178
3. Schindler, W., Lemke, K., Paar, C.: A Stochastic Model for Differential Side
Channel Cryptanalysis. In Rao, J., Sunar, B., eds.: Cryptographic Hardware and
Embedded Systems - CHES 2005. Volume 3659 of Lecture Notes in Computer
Science. Springer Berlin Heidelberg (2005) 30–46
4. Oswald, D., Paar, C.: Improving Side-Channel Analysis with Optimal Linear
Transforms. In Mangard, S., ed.: Smart Card Research and Advanced Applications.
Volume 7771 of Lecture Notes in Computer Science. Springer Berlin Heidelberg
(2013) 219–233
5. Hajra, S., Mukhopadhyay, D.: On the Optimal Pre-processing for Non-Profiling
Differential Power Analysis. In: International Workshop on Constructive
Side-Channel Analysis and Secure Design (COSADE). (2013)
6. Mehari Msgna, K.M., Mayes, K.: Verifying Software Integrity in Embedded
Systems: A Side Channel Approach. In: International Workshop on Constructive
Side-Channel Analysis and Secure Design (COSADE). (2013)
7. Brier, E., Clavier, C., Olivier, F.: Correlation power analysis with a leakage model.
Cryptographic Hardware and Embedded Systems - CHES 2004 (2004) 135–152
8. North, D.: An analysis of the factors which determine signal/noise discrimination
in pulsed-carrier systems. RCA Labs. (1943)
9. Trefethen, L.N., Bau III, D.: Numerical Linear Algebra. Number 50. SIAM (1997)
10. Chari, S., Rao, J., Rohatgi, P.: Template attacks. In Kaliski, B., Koç, Ç.K.,
Paar, C., eds.: Cryptographic Hardware and Embedded Systems - CHES 2002. Volume
2523 of Lecture Notes in Computer Science. Springer Berlin Heidelberg (2003) 13–28
11. Gierlichs, B., Lemke-Rust, K., Paar, C.: Templates vs. stochastic methods. In
Goubin, L., Matsui, M., eds.: Cryptographic Hardware and Embedded Systems -
CHES 2006. Volume 4249 of Lecture Notes in Computer Science. Springer Berlin
Heidelberg (2006) 15–29
12. Moradi, A., Guilley, S., Heuser, A.: Detecting hidden leakages. In Boureanu, I.,
Owesarski, P., Vaudenay, S., eds.: Applied Cryptography and Network Security.
Volume 8479 of Lecture Notes in Computer Science. Springer International
Publishing (2014) 324–342