Abstract
The maximum likelihood side-channel distinguisher of a template attack scenario is expanded into lower degree attacks according to the increasing powers of the signal-to-noise ratio (SNR). By exploiting this decomposition we show that it is possible to build highly multivariate attacks which remain efficient when the likelihood cannot be computed in practice due to its computational complexity. The shuffled table recomputation is used as an illustration to derive a new attack which outperforms the ones presented by Bruneau et al. at CHES 2015, and does so across the full range of SNRs. This attack combines two attack degrees and is able to exploit high-dimensional leakage, which explains its efficiency.
Annelie Heuser is a Google European Fellow in the field of Privacy and is partially funded by this fellowship.
Y. Teglia—Parts of this work have been done while the author was at STMicroelectronics.
1 Introduction
In order to protect embedded systems against side-channel attacks, countermeasures need to be implemented. Masking and shuffling are the most investigated solutions for this purpose [18]. Intuitively, masking aims at increasing the order of the statistical moments (in the leakage distributions) that reveal sensitive information [8, 15], while shuffling aims at increasing the noise in the adversary’s measurements [14]. As a result, an important challenge is to develop sound tools to understand the security of these countermeasures and their combination [31]. For this purpose, the usual strategy is to consider template attacks for which one can split the evaluation goals into two parts: offline profiling (building an accurate leakage model) and online attack (recovering the key using the leakage model). As far as profiling is concerned, standard methods range from non-parametric ones (e.g., based on histograms or kernels), whose cost suffers heavily from the curse of dimensionality (see e.g., [2] for an application of these methods in the context of non-profiled attacks), to parametric methods, typically exploiting the mixture nature of shuffled and masked leakage distributions [16, 17, 25, 27, 33], which is significantly easier if the masks (and permutations) are known during the profiling phase. Our premise in this paper is that an adversary is able to obtain such a mixture model via one of these means, and therefore we investigate how to exploit it efficiently during the online attack phase.
In this context, a starting observation is that the time complexity of template attacks exploiting mixture models increases exponentially with the number of masks (when masking) and permutation length (when shuffling [37]). So typically, the time complexity of an optimal template attack exploiting Q traces against an implementation where each n-bit sensitive value is split into \(\varOmega \) shares and shuffled over \(\varPi \) different positions is in \(\mathcal {O}\left( Q\cdot (2^n)^{\varOmega -1}\cdot \varPi !\right) \), which rapidly turns out to be intractable. In order to mitigate the impact of this high complexity, we propose a small, well-controlled and principled relaxation of the optimal distinguisher, based on its Taylor expansion (already mentioned in the field of side-channel analysis in [6, 11]) of degree L. Such a simplification leads to various concrete advantages. First, when applied to masked implementations, it allows us to perform the (mixture) computations corresponding to the \((2^n)^{\varOmega }\) factor in the complexity formula only once (thanks to precomputation) rather than Q times. Second, when applied to shuffled implementations, it allows us to replace the \(\varPi !\) factor in this formula by \({\varPi \atopwithdelims ()\min \left( \left\lceil { \frac{\varPi }{2}} \right\rceil , L \right) } = {\varPi \atopwithdelims ()L}\), thanks to the bounded degree L.
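To give a feeling for the orders of magnitude involved, the following minimal sketch evaluates (in bits) the per-trace factor \((2^n)^{\varOmega -1}\cdot \varPi !\) of the optimal attack and the binomial factor that replaces \(\varPi !\) at degree L. The function names and the choice \(n=8\), \(\varOmega =2\), \(L=3\) are illustrative assumptions, not taken from the paper.

```python
import math

def log2_factorial(m: int) -> float:
    """Return log2(m!) through the log-gamma function (no huge integers)."""
    return math.lgamma(m + 1) / math.log(2)

def log2_opt_factor(n: int, omega: int, pi: int) -> float:
    """log2 of (2^n)^(omega-1) * pi!, the per-trace cost factor of the optimal attack."""
    return n * (omega - 1) + log2_factorial(pi)

def log2_rounded_shuffle_factor(pi: int, degree: int) -> float:
    """log2 of binom(pi, min(ceil(pi/2), degree)), which replaces pi! at a given degree."""
    return math.log2(math.comb(pi, min(math.ceil(pi / 2), degree)))

# Illustrative parameters (assumption): 8-bit values, two shares, degree 3.
n, omega, degree = 8, 2, 3
pi = 2 ** n
print(f"log2(Pi!)            = {log2_factorial(pi):.1f}")
print(f"log2(OPT factor)     = {log2_opt_factor(n, omega, pi):.1f}")
print(f"log2(rounded factor) = {log2_rounded_shuffle_factor(pi, degree):.1f}")
```

With these parameters the factorial term is of the order of \(2^{1684}\), whereas the binomial term stays around \(2^{21}\).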
Additionally it can be noticed that an attacker will only build, during the offline profiling, the leakage models needed for the attack. By applying the Taylor expansion of the optimal distinguisher the complexity of the offline profiling is significantly reduced. In general the complexity of the offline profiling becomes equivalent to the complexity of the online attack.
The resulting “rounded template attacks” additionally carry simple intuitions regarding the minimum degree of the Taylor expansion needed for the attacks to succeed. Namely, this degree L needs to be at least equal to the security order O of the target implementation, defined as the smallest statistical moment in the leakage distributions that is key-dependent.
We then show that these attacks only marginally increase the data complexity (for a given success rate) when applied against a masked (only) implementation. More importantly, we finally exhibit that rounded template attacks are especially interesting in the context of high-dimensional higher-order side-channel attacks, and put forward the significant improvement of the attacks against the masked implementations with shuffled table recomputations from CHES 2015 [7].
Introduction to Shuffled Table Recomputation. Masking the linear parts of a block cipher is straightforward whereas protecting the non-linear parts is less obvious. To solve this issue different methods have been proposed. One can cite algebraic methods [3, 30], using Global Look-Up Table (GLUT) [28] and table recomputation [1, 8, 10, 19]. Table recomputation methods are often used in practice as they represent a good tradeoff between memory consumption and execution time since they precompute a masked substitution box (S-Box) that is stored in a table.
However, some attacks still manage to recover the mask during the table recomputation [6, 36]. As a further protection the recomputation can be shuffled. This protection uses a random permutation which is drawn over \(S_{2^n}\), the set of all the permutations of \(\mathbb F^n_2\). In addition, some random masks are uniformly drawn over \(\mathbb F^n_2\) to ensure the security against first-order attacks.
Contributions. We show that the expansion of the likelihood allows attacks with a very high computational efficiency, while remaining very effective from a key recovery standpoint. This means that the expanded distinguisher requires only slightly more traces to reach a given success rate, while being much faster to compute.
We also show how to grasp in a multivariate setting several leakages of different orders. In particular, we present an attack on shuffled table recomputation which succeeds with fewer traces than [7]. Notice that the likelihood attack cannot be evaluated in this setting because it is computationally impossible to average over both the mask and the shuffle (the sole number of shuffles is \(2^n! \approx 2^{1684}\) with \(n=8\)).
Finally, we show that our rounded version of the maximum likelihood allows better attacks than the state-of-the-art. Namely, our attack is better than the classical \({\text {2O-CPA}}\) and the recent attack of CHES’15 [7] in all noise variance settings.
Outline. The remainder of the paper is organized as follows. Section 2 provides the necessary notations and mathematical definitions. The theoretical foundation of our method is presented in Sect. 3. The case-study (shuffled table recomputation) is shown in Sect. 4. Section 5 evaluates the complexity of our method. The performance results are presented in Sect. 6. Conclusions and perspectives are presented in Sect. 7. Some technical results are deferred to the appendices.
2 Notations
2.1 Parameters
Randomization countermeasures consist in masking and shuffling protections. When evaluating randomized implementations, there are a number of important parameters to consider. First, the number of shares and the shuffle length in the scheme, next denoted as \(\varOmega \) and \(\varPi \), are algorithmic properties of the countermeasure. These numbers generally influence the tradeoff between the implementation overheads and the security of the countermeasures. Second, the order of the implementation protected by a randomization countermeasure, next denoted as O, is a statistical property of the implementation. It corresponds to the smallest key-dependent statistical moment in the leakage distributions. When only masking is applied and the masked implementation is “perfect” (meaning that the leakages of the shares are mutually independent), the order O equals \(\varOmega \) at best. Finally, the number of dimensions (or dimensionality) used in the traces, next denoted as D, is a property of the adversary. In this respect, adversaries may sometimes be interested in using the lowest possible D (since it makes the detection of points of interest (POIs) in the traces easier). But from the measurement complexity point of view, they have a natural incentive to use D as large as possible, since a larger dimension D increases the signal-to-noise ratio [5].
In summary, our notations are:
- \(\varOmega \): number of shares in the masking countermeasure,
- \(\varPi \): length of the shuffling countermeasure,
- O: order of the implementation,
- D: dimensionality of the leakages.
Examples. Existing masking schemes combine these four values in a variety of manners. For example, in a perfect hardware masked implementation case with three shares, we may have \(\varOmega =3\), \(O=3\) and \(D=1\) (since the three shares are manipulated in parallel). If this implementation is not perfect, we may observe lower order leakages (e.g. \(\varOmega =3\), \(O=1\) and \(D=1\), that is a first-order leakage). And in order to prevent such imperfections, one may use a Threshold Implementation [24], in which case one share will be used to prevent glitches (so \(\varOmega =3\), \(O=2\) and \(D=1\)). If we move to the software case, we may then have more informative dimensions, e.g. \(\varOmega =3\), \(O=3\), \(D=3\) if the adversary looks for a single triple of informative POIs. But we can also have a number of dimensions significantly higher than the order (which usually corresponds to stronger attacks). Let us also give an example of S-box masking with one mask, where the masking process of the S-box (often called recomputation) is shuffled. A permutation \(\varPhi \) of \(\varPi =2^n\) values is applied while computing the masked table. If the attacker ignores the recomputation step, he can carry out an attack on the already computed table. Hence the parameters \(\varOmega =2\), \(O=2\), \(D=2\) (also known as “second-order bivariate CPA”). But the attacker can also exploit the shuffled recomputation of the S-box in addition to a table look-up, as presented in [7]; the setting is thus highly multivariate: \(\varOmega =2\), \(\varPi =2^n\), \(O=2\), \(D=2\cdot 2^n+1\). Interestingly, the paper [7] shows an attack at degree \(L=3\) which succeeds with fewer traces than attacks at minimal degree \(L=O=2\).
In general, a template attack based on mixture distributions (often used in parametric estimation) would require a summation over all random values of the countermeasure, that is \(\mathcal {R}\), which consists in the set of masks and permutations. One can represent \(\mathcal {R}\) as the Cartesian product of the set of masks and the set of permutations. Let us denote by \(\mathcal {M}\) the set of masks and \(\mathcal {S}\) the set of permutations. Then \(\mathcal {R}= \mathcal {M} \times \mathcal {S}\). Therefore, the cardinality of \(\mathcal {R}\) is \(2^{n(\varOmega -1)} \varPi !\).
Eventually, the security of a masked implementation depends on its order and noise level. More precisely, the security increases exponentially with the order (with the noise as basis) [12]. So for the designer, there is always an incentive to increase the noise and order. And for the adversary, there is generally an incentive to use the largest possible D (given the time constraints of his attack), so that he decreases the noise.
2.2 Model
We characterize the protection level in terms of the most powerful attacker, namely an attacker who knows everything about the design, except the masks and the noise. This means that we consider the case where the templates are known. How the attacker obtained the templates is a matter of security by obscurity: we assume that, one way or another, he will know the model. Of course, depending on the learning phase, these estimations can be more or less accurate. For the sake of simplicity we assume in this paper the best-case scenario where all the estimations are exact.
Besides, we assume that the noise is independently distributed over each dimension. This is the least favorable situation for the attacker (as there is in this case the most noise entropy). For the sake of simplicity, we assume that the noise variance is equal to \(\sigma ^2\) at each point \(d=1,2,\ldots ,D\). This allows for a simple theoretical analysis. Let us give an index \(q=1,2,\ldots , Q\) to each trace. For one trace q, the model is written as:
\[ X = y(t,k^*,R) + N, \tag{1} \]
where for notational convenience the dependency in q and d has been dropped. Here X is a leakage measurement; \(y=y(t,k^*,R)\) is the deterministic part of the model that depends on the correct key \(k^*\), some known text (plaintext or ciphertext) t, and the unknown random values (masks and permutations) R. Each sample (of index d) of N is a random noise, which follows a Gaussian distribution \(p_N(z) = \frac{1}{\sqrt{2\pi \sigma ^2}} \exp \left( - \frac{z^2}{2\sigma ^2} \right) \).
Uppercase letters are generally used for random variables and the corresponding lowercase letters for their realizations. Bold symbols are used to denote vectors that have length Q, the number of measurements. Namely, \(\mathbf {X}\) denotes a set of Q random variables i.i.d. with the same law as X. So, \(\mathbf {X}\) is a \(Q\times D\) matrix; \(\mathbf {R}\) denotes a set of random variables i.i.d. with the same law as R; \(\mathbf {t}\) denotes the set of input texts of the measurements \(\mathbf {X}\); \(y(\mathbf {t}, k, \mathbf {R})\) denotes the set of leakage models, where k is a key guess, \(k^*\) being the correct key value.
Notations \(\mathbf {X}_d\) and \(\mathbf {X}^{\left( q \right) }\) are used to denote the d-th column and the q-th row of the matrix \(\mathbf {X}\), respectively.
We are interested in attacks where each intermediate data is an n-bit vector. In particular, we target S-boxes, denoted by S. Regarding the transduction from the intermediate variable to the real-valued leakage, we take the example of the Hamming weight \(w_H\) defined by \(w_H(z)=\sum _{i=1}^n z_i\) where \(z_i\) is the ith bit of z.
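The Hamming weight model and the additive Gaussian noise can be sketched in a few lines. This is a minimal illustration only; the helper names `hamming_weight` and `leak` and the sample value are assumptions introduced here.

```python
import random

def hamming_weight(z: int) -> int:
    """w_H(z): number of bits set to one in the non-negative integer z."""
    return bin(z).count("1")

def leak(y: float, sigma: float, rng: random.Random) -> float:
    """One noisy sample X = y + N, with N drawn from Normal(0, sigma^2)."""
    return y + rng.gauss(0.0, sigma)

rng = random.Random(0)      # fixed seed, for reproducibility only
z = 0b10110010              # arbitrary 8-bit intermediate value (assumption)
print(hamming_weight(z))    # four bits are set in this value
print(leak(hamming_weight(z), sigma=1.0, rng=rng))
```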
3 A Generic Log-Likelihood for Masked Implementations
In this section we derive a rounded version of the template attack. Namely, we expand a particular instantiation of the template attack, the so-called optimal distinguisher, using its Taylor expansion. By rounding this expansion at the Lth degree we are able to build a rounded version of the optimal distinguisher (later defined as \({\text {ROPT}}_{L}\)). This attack features two advantages: it allows different statistical moments to be combined, and its complexity becomes manageable.
3.1 Maximum Likelihood (ML) Attack
The most powerful adversary knows exactly the leakage model (but the actual key, the masks, and the noise are unknown during the online step) and computes a likelihood. In the case of masking, the optimal distinguisher, which maximizes the success rate, is given by [6]:
Theorem 1 (Maximum Likelihood)
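When the \(y\left( t,k,R \right) \) are known and the Gaussian noise N is i.i.d. across the queries (measurements) and independent across the dimension, then the optimal distinguisher is:
\[ {\text {OPT}}(\mathbf {x},\mathbf {t}) = \mathop {\mathrm {argmax}}_{k} \sum _{q=1}^Q \log \mathbb {E} \exp \frac{-\Vert x^{(q)}-y(t^{(q)},k,R)\Vert ^2}{2\sigma ^2}, \tag{2} \]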
where the expectation operator \(\mathbb {E}\) is applied with respect to the random variable \(R\in \mathcal {R}\), and the norm is the Euclidean norm \(\Vert x^{(q)}-y(t^{(q)},k,R)\Vert ^2 = \sum _{d=1}^D (x^{(q)}_d-y_d(t^{(q)},k,R))^2\).
Proof
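It is proven in [6] that the Maximum Likelihood distinguisher is:
\[ \mathop {\mathrm {argmax}}_{k} \prod _{q=1}^Q \mathbb {E} \prod _{d=1}^D p_N\left( x^{(q)}_d-y_d(t^{(q)},k,R) \right) . \]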
Applying (1) for Gaussian noise and taking the logarithm yields (2). \(\square \)
In the sequel, we denote by \(LL^{(q)} = \log \mathbb {E}_R \exp \frac{-\Vert x^{(q)}-y(t^{(q)},k,R)\Vert ^2 }{2\sigma ^2}\) the contribution of one trace q of the Log-Likelihood full distinguisher \(LL=\sum _{q=1}^Q LL^{(q)}\).
Remark 1
Notice that for each trace q, the Maximum Likelihood distinguisher involves a summation over \(\#\mathcal {R}\) values, which correspond to \(\#\mathcal {R}\) accesses to precharacterized templates.
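As an illustration of Remark 1, the sketch below evaluates \(LL^{(q)}\) for a two-share masked S-box with \(D=2\) leakages, \(w_H(M)\) and \(w_H(S[t\oplus k]\oplus M)\). The expectation is an explicit loop over the \(\#\mathcal {R}=2^n\) masks. The 4-bit PRESENT S-box is used only as a small stand-in for S, and all function names are assumptions made for this example.

```python
import math

# 4-bit PRESENT S-box, used here only as a small stand-in for S (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N_BITS = 4

def hw(z: int) -> int:
    return bin(z).count("1")

def log_likelihood_trace(x, t, k, sigma):
    """LL^(q) = log E_M exp(-||x - y(t,k,M)||^2 / (2 sigma^2)) for D = 2."""
    gamma = 1.0 / (2.0 * sigma ** 2)
    s = SBOX[t ^ k]
    exponents = []
    for m in range(2 ** N_BITS):            # one template access per mask value
        y0, y1 = hw(m), hw(s ^ m)
        exponents.append(-gamma * ((x[0] - y0) ** 2 + (x[1] - y1) ** 2))
    top = max(exponents)                    # log-sum-exp, for numerical stability
    total = sum(math.exp(e - top) for e in exponents)
    return top + math.log(total / len(exponents))

def log_likelihood(traces, texts, k, sigma):
    """LL = sum over the traces of LL^(q), for one key guess k."""
    return sum(log_likelihood_trace(x, t, k, sigma) for x, t in zip(traces, texts))

# Noise-free trace for t = 0, k* = 0, mask 0: x = (w_H(0), w_H(S[0])).
x = (hw(0), hw(SBOX[0]))
print(log_likelihood_trace(x, 0, 0, sigma=0.2))   # correct key
print(log_likelihood_trace(x, 0, 5, sigma=0.2))   # a wrong key
```

The cost per trace and per key guess is proportional to the number of masks, which is what becomes intractable once a permutation of \(2^n\) elements is added to the randomness.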
If \(D=1\), then the signal-to-noise ratio (SNR) is defined in a natural way as the ratio between the variance of the model Y and the variance of the noise N. But when the setup is multivariate, it is more difficult to quantify a notion of SNR. For this reason, we use the following quantity
\[ \gamma = \frac{1}{2\sigma ^2}, \tag{3} \]
which is actually proportional to an SNR, in lieu of SNR. In practice, we assume that \(\gamma \) is small. It is indeed a condition for masking schemes to be efficient (see for instance [12]).
Proposition 1 (Taylor Expansion of Optimal Attacks in Gaussian Noise)
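The attack consists in maximizing the sum over all traces \(q=1,\ldots ,Q\) of
\[ LL^{(q)} = \sum _{\ell =1}^{+\infty } \frac{\kappa _\ell }{\ell !} (-\gamma )^\ell , \tag{4} \]
where \(\kappa _\ell \) is the \(\ell \)th-order cumulant of the random variable \(\Vert x-y(t,k,R)\Vert ^2\), which can be found inductively from \(\ell \)th-order moments:
\[ \mu _\ell = \mathbb {E}_R \left( \Vert x-y(t,k,R)\Vert ^{2\ell } \right) , \tag{5} \]
using the relation:
\[ \kappa _\ell = \mu _\ell - \sum _{\ell '=1}^{\ell -1} \binom{\ell -1}{\ell '-1} \kappa _{\ell '} \, \mu _{\ell -\ell '}. \tag{6} \]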
Proof
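The log-likelihood can be expanded according to the increasing powers of the SNR as:
\[ \log \mathbb {E} \exp \left( -\gamma \Vert x-y(t,k,R)\Vert ^2 \right) = \sum _{\ell =1}^{+\infty } \frac{\kappa _\ell }{\ell !} (-\gamma )^\ell , \tag{7} \]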
where we have recognized the cumulant generating function [34]. The above relation (6) between cumulants and moments is well known [39]. \(\square \)
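The classical recursion between raw moments and cumulants, \(\kappa _\ell = \mu _\ell - \sum _{j=1}^{\ell -1} \binom{\ell -1}{j-1} \kappa _j \mu _{\ell -j}\), can be implemented directly. The sketch below is a minimal version; the function name is an assumption. As a sanity check, the raw moments of a Poisson variable of mean 1 are the Bell numbers 1, 2, 5, 15, 52, and all its cumulants equal 1.

```python
from math import comb

def cumulants_from_moments(mu):
    """mu[l-1] holds the l-th raw moment mu_l; returns [kappa_1, ..., kappa_L]."""
    kappa = []
    for l in range(1, len(mu) + 1):
        # Sum over j = 1..l-1 of binom(l-1, j-1) * kappa_j * mu_{l-j}.
        correction = sum(comb(l - 1, j - 1) * kappa[j - 1] * mu[l - j - 1]
                         for j in range(1, l))
        kappa.append(mu[l - 1] - correction)
    return kappa

print(cumulants_from_moments([1, 2, 5, 15, 52]))   # Poisson(1): every cumulant is 1
print(cumulants_from_moments([2, 7, 30]))          # kappa_2 = 7 - 2^2, kappa_3 = 30 - 3*2*7 + 2*2^3
```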
Definition 1
The Taylor expansion of the log-likelihood truncated to the Lth degree \(\mathrm {LL}_L\) in SNR is
\[ \mathrm {LL}_L = \sum _{\ell =1}^{L} \frac{\kappa _\ell }{\ell !} (-\gamma )^\ell . \tag{8} \]
Put differently, we have \(\mathrm {LL}=\mathrm {LL}_L + o(\gamma ^L)\) (using the Landau notation). The optimal attack can now be “rounded” in the following way:
Definition 2
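(Rounded OPTimal Attack of Degree L in \(\gamma \) ). The rounded optimal Lth-degree attack consists in maximizing over the key hypothesis the sum over all traces of the Lth order Taylor expansion \(\mathrm {LL}_L\) in the SNR of the log-likelihood:
\[ {\text {ROPT}}_{L}(\mathbf {x},\mathbf {t}) = \mathop {\mathrm {argmax}}_{k} \sum _{q=1}^Q \mathrm {LL}_L^{(q)}. \tag{9} \]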
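The effect of the truncation degree can be checked numerically. In the sketch below, the random variable \(\Vert x-y(t,k,R)\Vert ^2\) is replaced by a hypothetical variable taking three equiprobable values, and the exact value of \(\log \mathbb {E} \exp (-\gamma Z)\) is compared with its truncation at degrees 1, 2 and 3. The values of Z and \(\gamma \) are arbitrary assumptions chosen for illustration.

```python
import math

def raw_moments(values, degree):
    """Raw moments mu_1..mu_degree of a uniform variable over `values`."""
    return [sum(v ** l for v in values) / len(values) for l in range(1, degree + 1)]

def cumulants(mu):
    """Cumulants from raw moments, through the usual recursion."""
    kappa = []
    for l in range(1, len(mu) + 1):
        kappa.append(mu[l - 1] - sum(math.comb(l - 1, j - 1) * kappa[j - 1] * mu[l - j - 1]
                                     for j in range(1, l)))
    return kappa

def ll_exact(values, gamma):
    """log E exp(-gamma * Z) for Z uniform over `values`."""
    return math.log(sum(math.exp(-gamma * v) for v in values) / len(values))

def ll_truncated(values, gamma, degree):
    """Sum of kappa_l * (-gamma)^l / l! for l = 1..degree."""
    kappa = cumulants(raw_moments(values, degree))
    return sum(kappa[l - 1] * (-gamma) ** l / math.factorial(l)
               for l in range(1, degree + 1))

z = [1.0, 2.0, 4.0]    # hypothetical values of the squared distance (assumption)
gamma = 0.05           # low SNR, i.e. high noise
errors = [abs(ll_exact(z, gamma) - ll_truncated(z, gamma, L)) for L in (1, 2, 3)]
print(errors)          # the approximation error shrinks as the degree grows
```

For a larger \(\gamma \) (lower noise) the error need not decrease with the degree, since the expansion is only an asymptotic statement in \(\gamma \).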
Proposition 2
If the degree L is smaller than the order O of the countermeasure then the attack fails to distinguish the correct key.
Proof
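One can notice that \(\mu _{\ell }\) combines (by a product) at most \(\ell \) terms, following the formula:
\[ \mu _\ell = \mathbb {E}_R \left( \sum _{d=1}^D \left( x_d-y_d(t,k,R) \right) ^2 \right) ^{\ell } = \sum _{k_1+\ldots +k_D=\ell } \binom{\ell }{k_1,\ldots ,k_D} \, \mathbb {E}_R \prod _{d=1}^D \left( x_d-y_d(t,k,R) \right) ^{2k_d}, \]
with \(k_1+\ldots +k_D=\ell \). It implies that there exist at most \(\ell \) different \(k_i > 0\) and as a consequence there are at most \(\ell \) different variables in the expectation. Therefore, by definition of a perfect masking scheme, \(\mu _L\) does not depend on the key when \(L < O\). As a consequence \(\mathrm {LL}_{L}\) with \(L < O\) does not depend on the key either. \(\square \)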
Theorem 2
Let an implementation be secure at order O. The lowest-degree successful attack is the one at degree \(L=O\) which maximizes \(\mathrm {LL}_{L}\). This is equivalent to summing
\[ \mu _L = \mathbb {E}_R \left( \Vert x-y(t,k,R)\Vert ^{2L} \right) \]
over all traces and
- maximizing the result over the key hypotheses, if L is even;
- minimizing the result over the key hypotheses, if L is odd.
Proof
Since \(\kappa _\ell \) is independent of k for all \(\ell < L\), the first sensitive contribution to the log-likelihood is
\[ \frac{(-\gamma )^{L}}{L!} \, \kappa _{L}. \]
Now, \(\kappa _{L} = \mu _{L} +\) lower order terms (which do not depend on the key as the implementation is secure at order O), and removing constants independent of k the contribution to the log-likelihood reduces to \((-1)^{L} \mu _{L}\). \(\square \)
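Proposition 2 and Theorem 2 can be observed on a small two-share example (\(\varOmega =2\), \(O=2\), \(D=2\)). For a fixed measurement and text, the sketch below computes \(\mu _1\) and \(\mu _2\) for every key guess: \(\mu _1\) is the same for all keys, whereas \(\mu _2\) is not. The 4-bit PRESENT S-box is again only a stand-in for S, and the measurement values are arbitrary assumptions.

```python
# 4-bit PRESENT S-box, used here only as a small stand-in for S (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def hw(z: int) -> int:
    return bin(z).count("1")

def moment(x, t, k, degree):
    """mu_degree = E_M ||x - y(t,k,M)||^(2*degree) for the two-share model."""
    s = SBOX[t ^ k]
    total = 0.0
    for m in range(16):
        dist2 = (x[0] - hw(m)) ** 2 + (x[1] - hw(s ^ m)) ** 2
        total += dist2 ** degree
    return total / 16

x, t = (1.3, 2.7), 0x6          # arbitrary measurement and text (assumption)
mu1 = [moment(x, t, k, 1) for k in range(16)]
mu2 = [moment(x, t, k, 2) for k in range(16)]
print(max(mu1) - min(mu1))      # numerically zero: degree 1 cannot distinguish keys
print(max(mu2) - min(mu2))      # strictly positive: degree 2 = O is key-dependent
```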
Theorem 3
(Mixed Degree Attack). Assuming an implementation secure at order O, the next degree successful attack is the one at degree \(L+1=O+1\) which maximizes \(\mathrm {LL}_{L+1}\). This is equivalent to summing
\[ \mu _{L} - \gamma \left( \frac{\mu _{L+1}}{L+1} - \mu _1 \mu _{L} \right) \]
over all traces and
- maximizing the result over the key hypotheses, if L is even;
- minimizing the result over the key hypotheses, if L is odd.
Proof
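Up to degree \(L+1\), the key-dependent part of the log-likelihood becomes
\[ \frac{(-\gamma )^{L}}{L!} \kappa _{L} + \frac{(-\gamma )^{L+1}}{(L+1)!} \kappa _{L+1} = \frac{(-\gamma )^{L}}{L!} \left( \kappa _{L} - \frac{\gamma }{L+1} \kappa _{L+1} \right) . \]
Now from (6) we have, for \(L>0\)
\[ \kappa _{L+1} = \mu _{L+1} - (L+1)\, \mu _1 \mu _{L} + \text {lower-order terms}. \]
Removing terms that do not depend on k, we obtain:
\[ (-1)^{L} \left( \mu _{L} - \gamma \left( \frac{\mu _{L+1}}{L+1} - \mu _1 \mu _{L} \right) \right) . \]
Compared to an Lth-degree attack, we see that \(\mu _{L}\) is replaced by a corrected version:
\[ \mu _{L} \longleftarrow \mu _{L} \left( 1+\gamma \mu _1 \right) - \gamma \, \frac{\mu _{L+1}}{L+1}, \]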
where \(\mu _1\) is independent of k. However, \(\mu _1\) cannot be removed as it scales the relative contribution of \(\mu _{L}\) and \(\mu _{L+1}\) in the distinguisher. \(\square \)
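As a worked instance of the mixed-degree attack, take \(L=O=2\), so that \(\mu _1\) does not depend on the key. The derivation below only uses the standard identities \(\kappa _2 = \mu _2 - \mu _1^2\) and \(\kappa _3 = \mu _3 - 3\mu _1\mu _2 + 2\mu _1^3\); the symbol \(\simeq \), meaning equality up to additive terms independent of k, is a notation introduced here for convenience.

```latex
\begin{align*}
\mathrm{LL}_3 &= -\gamma\,\kappa_1 + \frac{\gamma^2}{2}\,\kappa_2 - \frac{\gamma^3}{6}\,\kappa_3 \\
  &\simeq \frac{\gamma^2}{2}\,\mu_2 - \frac{\gamma^3}{6}\left(\mu_3 - 3\,\mu_1\mu_2\right) \\
  &= \frac{\gamma^2}{2}\left(\mu_2\,(1+\gamma\mu_1) - \frac{\gamma}{3}\,\mu_3\right).
\end{align*}
```

Since \(\gamma ^2/2 > 0\), maximizing \(\mathrm {LL}_3\) over k amounts to maximizing the sum over the traces of the bracketed quantity, in which the weight of \(\mu _3\) relative to \(\mu _2\) is set by \(\gamma \).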
Remark 2
In contrast to \(\mathrm {LL}_{L}\), implementing \(\mathrm {LL}_{L+1}\) requires knowledge of the SNR parameter \(\gamma =1/(2\sigma ^2)\).
Remark 3
In general, when \(L \ge O\) the rounded optimal attack \({\text {ROPT}}_{L}\) exploits all key dependent terms of degree \(\ell \), where \(O\le \ell \le L\), whereas an LO-CPA [8] or MCP-DPA [22] only exploit the term of degree L.
4 Case Study: Shuffled Table Recomputation
In this section we apply the \({\text {ROPT}}_{L}\) formula of Eq. (9) of Definition 2 to the particular case of a block cipher with a shuffled table recomputation stage. We show that in this scenario our new method allows building a better attack than those of the state of the art. By combining the second and the third cumulants we construct an attack which is better than:
- any second-order attack;
- the attack presented at CHES 2015. Following the notations of [7] we denote this attack by \({\text {MVA}}_{TR}\) (which stands for Multi-Variate Attack on Table Recomputation) in the rest of this article. This is a third-order attack that achieves better results than \({\text {2O-CPA}}\) when the noise level \(\sigma \) is below a given threshold (namely \(\sigma ^2 \le 2^{n-2} - n/2\)).
4.1 Parameters of the Randomization Countermeasure
In order to validate our results we take as an example a first-order masking scheme (\(O=2\)) where the sensitive variables are split into two shares (\(\varOmega =2\)). The nonlinear part of this scheme is computed using a table recomputation stage. This step is shuffled (\(\varPi =2^n\)) for protection against some known attacks [26, 36]. The beginning of this combined countermeasure is given in Algorithm 1. The table is recomputed in a random order from line 3 to line 7.
We use lowercase letters (e.g., m, \(\varphi \)) for the realizations of random variables, which are written in uppercase (e.g., M, \(\varPhi \)). For the sake of simplicity in the rest of this case study, we assume that \(m = m'\).
An overview of the leakages over time is given in Fig. 1.
We detail below the mathematical expression of these leakages. The randomization consists in one mask M chosen randomly in \(\{0,1\}^n\), and one shuffle (random permutation of \(\{0,1\}^n\)) denoted by \(\varPhi \). Thus, we denote \(R=(M,\varPhi )\), which is uniformly distributed over the Cartesian product \(\{0,1\}^n \times S_{2^n}\) (i.e. \(\mathcal {M} =\{0,1\}^n\) and \(\mathcal {S}=S_{2^n}\)), where \(S_{m}\) is the symmetric group of m elements. We have \(D=2^{n+1}+2\) leakage models, namely:
- \(X_0=y_0\left( t,k,R\right) +N_0 \) with \(y_0\left( t,k,R\right) = w_H(M)\),
- \(X_1=y_1\left( t,k,R\right) +N_1\) with \(y_1\left( t,k,R\right) = w_H(S[T\oplus k]\oplus M)\),
- \(X_i=y_i\left( t,k,R\right) +N_i\), for \(i=2,\ldots ,2^n+1\) with \(y_i\left( t,k,R\right) = w_H(\varPhi (i-2)\oplus M)\),
- \(X_j=y_j\left( t,k,R\right) +N_j\), for \(j=2^n+2,\ldots ,2^{n+1}+1\) with \(y_j\left( t,k,R\right) = w_H(\varPhi (j-2^n-2))\).
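The four families of leakages listed above can be simulated as follows. This is a minimal sketch for \(n=4\), so that \(D=2^{n+1}+2=34\); the 4-bit PRESENT S-box is only a stand-in for S, and the function names are assumptions made for this example.

```python
import random

# 4-bit PRESENT S-box, used here only as a small stand-in for S (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N_BITS = 4

def hw(z: int) -> int:
    return bin(z).count("1")

def leakage_models(t, k, m, phi):
    """Noise-free models y_0 .. y_{2^(n+1)+1} for mask m and permutation phi."""
    size = 2 ** N_BITS
    y = [hw(m), hw(SBOX[t ^ k] ^ m)]               # y_0 and y_1
    y += [hw(phi[i] ^ m) for i in range(size)]     # indices 2 .. 2^n + 1
    y += [hw(phi[j]) for j in range(size)]         # indices 2^n + 2 .. 2^(n+1) + 1
    return y

def simulate_trace(t, k, sigma, rng):
    """One trace X = y + N with a fresh random mask and a fresh random shuffle."""
    size = 2 ** N_BITS
    m = rng.randrange(size)
    phi = list(range(size))
    rng.shuffle(phi)
    return [y + rng.gauss(0.0, sigma) for y in leakage_models(t, k, m, phi)]

rng = random.Random(1)     # fixed seed, for reproducibility only
trace = simulate_trace(t=0x3, k=0x9, sigma=1.0, rng=rng)
print(len(trace))          # D = 2^(n+1) + 2 samples
```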
We recall that we assume the noises N are i.i.d. Clearly, there is a second-order leakage, as the pair \((X_0,X_1)\) does depend on the key. But there is also a large multiplicity of third-order leakages, such as \((X_1,X_i,X_{j=i+2^n})\), as will be analyzed in this case-study.
The following side-channel attacks are applied on a set of Q realizations. Let us define I and J as \(I=\llbracket 2, 2^n + 1 \rrbracket \) and \(J = \llbracket 2^n+2, 2 \times 2^n + 1 \rrbracket \). Then the maximal dimensionality is \(D=2+2\times 2^n\), and we denote a sample d as \(d\in \{0,1\}\cup I\cup J\). The Q leaks (resp. models) at sample d are denoted as \(\mathbf {x}_d\) and \(\mathbf {y}_d = y_d(\mathbf {t},k,R)\).
In order to simplify the notations we introduce
with \(d \in \left\{ 0,1 \right\} \cup I \cup J\). The \(^{\left( q \right) } \) can be omitted where there is no ambiguity.
4.2 Second-Order Attacks
Like any other higher-order masking scheme, our example can be defeated by higher-order attacks [8, 20, 29, 38]. As our scheme is a first-order masking scheme with two shares, it can be defeated using a second-order attack [8, 20] which combines the leakages of the two shares using a combination function [8, 20, 25], such as the second-order CPA (\({\text {2O-CPA}}\)) with the centered product as combination function.
Using our notation it implies \(D=2\).
Definition 3
( \({\text {2O-CPA}}\) [29]). We denote by \({\text {2O-CPA}}\) the \(\mathsf {CPA}\) using the centered product as combination function. Namely:
where \(\mathbf {y} =\mathbb {E}_M \left( y_0\left( \mathbf {t}, k, R \right) \circ y_1\left( \mathbf {t}, k, R \right) \right) \), \(\circ \) is the element wise product and \(\widehat{\rho }\) is an estimator of the Pearson coefficient. It can be noticed that as the terms \( y_0\left( \mathbf {t}, k, R \right) \) and \( y_1\left( \mathbf {t}, k, R \right) \) only depend on M the expectation is only computed over \(\mathcal {M}\).
Remark 4
Here we have assumed without loss of generality that the leakages and the model are centered.
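The principle of a second-order CPA with the centered product can be sketched on simulated data. The code below is an illustration under stated assumptions, not the exact distinguisher of Definition 3: it uses a 4-bit PRESENT S-box as a stand-in for S, centers the two leakages with their empirical means, correlates their product with the model \(\mathbb {E}_M\left( (w_H(M)-n/2)(w_H(S[t\oplus k]\oplus M)-n/2) \right) \), and keeps the key with the largest signed correlation (taking the absolute value is a common alternative). The key, the noise level and the number of traces are arbitrary.

```python
import math
import random

# 4-bit PRESENT S-box, used here only as a small stand-in for S (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def hw(z: int) -> int:
    return bin(z).count("1")

# Centered model per S-box output s: E_M[(w_H(M) - 2)(w_H(s ^ M) - 2)], n = 4.
MODEL = [sum((hw(m) - 2) * (hw(s ^ m) - 2) for m in range(16)) / 16
         for s in range(16)]

def pearson(a, b):
    """Empirical Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def second_order_cpa(x0, x1, texts):
    """Return the key guess maximizing the correlation with the centered product."""
    m0, m1 = sum(x0) / len(x0), sum(x1) / len(x1)
    combined = [(a - m0) * (b - m1) for a, b in zip(x0, x1)]
    scores = [pearson(combined, [MODEL[SBOX[t ^ k]] for t in texts])
              for k in range(16)]
    return max(range(16), key=scores.__getitem__)

# Simulated campaign (all values are assumptions made for the example).
rng = random.Random(2016)
K_STAR, SIGMA, Q = 0xB, 0.5, 3000
texts = [rng.randrange(16) for _ in range(Q)]
masks = [rng.randrange(16) for _ in range(Q)]
x0 = [hw(m) + rng.gauss(0.0, SIGMA) for m in masks]
x1 = [hw(SBOX[t ^ K_STAR] ^ m) + rng.gauss(0.0, SIGMA) for t, m in zip(texts, masks)]
recovered = second_order_cpa(x0, x1, texts)
print(hex(recovered), hex(K_STAR))
```

Only the two samples corresponding to the mask and the masked S-box output are used here, which is the \(D=2\) setting.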
An attacker can restrict himself to these two leakages and ignore the recomputation stage. Since such an attacker ignores the table recomputation, no random shuffle is involved. As a consequence, the optimal distinguisher restricted to these leakages becomes computable. Nevertheless, as we will see in Sect. 6, this approach is not the best. Indeed, a lot of exploitable information is lost by not taking into account the table recomputation.
Definition 4
( \({\text {OPT}}_{{\text {2O}}}\) Distinguisher — Eq. (2) for \(D=2\) ). We define by \({\text {OPT}}_{{\text {2O}}}\) the optimal attack which targets the mask and the masked sensitive value.
with \(f_d^{\left( q \right) }\) as defined in Eq. (10).
4.3 Exploiting the Shuffled Table Recomputation Stage
It is known that the table recomputation step can be exploited to build better attacks than second order attacks [6, 36]. Recently a new attack has been presented which remains better than the \({\text {2O-CPA}}\) even when the recomputation step is protected [7]. Let us recall the definition of this attack:
Definition 5
( \({\text {MVA}}_{TR}\) [7]). The MultiVariate Attack (MVA) exploiting the leakage of the table recomputation (TR) is given by the function:
where, like for Definition 3, \(\mathbf {y} =\mathbb {E}_M \left( y_0\left( \mathbf {t}, k, R \right) \circ y_1\left( \mathbf {t}, k, R \right) \right) \), \(\circ \) is the element wise product and \(\widehat{\rho }\) is an estimator of the Pearson coefficient.
Let us now apply our new \({\text {ROPT}}_{L}\) on a block cipher protected with a shuffled table recomputation. In this case the lower moments are given by:
Proposition 3
The second degree rounded optimal attack on the table recomputation is:
Proof
Combine Theorem 2 and Eq. (30) of Appendix A.2. \(\square \)
Remark 5
The \({\text {ROPT}}_{2}\) which targets the second order moment happens not to take into account the terms of the recomputation stage. Naturally the only second order leakages are also the ones used by \({\text {2O-CPA}}\) and \({\text {OPT}}_{{\text {2O}}}\) distinguishers.
Proposition 4
The third degree rounded optimal attack on the table recomputation is:
where the values of \( \mu _1^{\left( q \right) }\), \(\mu _2^{\left( q \right) }\), and \( \mu _3^{\left( q \right) }\) are respectively provided in Eq. (22) of Appendix A.1, Eq. (30) of Appendix A.2 and Eq. (33) of Appendix A.3.
Proof
Combining Theorem 2 and Appendix A. \(\square \)
Proposition 5
To compute \(\mu _1\), \(\mu _2\) and \(\mu _3\) an attacker does not need to compute the expectation over \(S_{2^n}\).
Proof
Proof given in Appendix A. \(\square \)
5 Complexity
In this section we give the time complexity needed to compute \({\text {OPT}}\) and \({\text {ROPT}}_{L}\). We also show that when \(L \ll D\) the complexity of \({\text {ROPT}}_{L}\) remains manageable whereas the complexity of \({\text {OPT}}\) is prohibitive. In this section all the complexities are computed for one key guess.
5.1 Complexity in the General Case
Let us first introduce an intermediate lemma.
Lemma 1
The complexity of computing \(\mu _{\ell }\) (for one trace) is lower than:
Proof
See Appendix B.1. \(\square \)
Proposition 6
The complexity of \({\text {OPT}}\) is:
The complexity of \({\text {ROPT}}_{L}\) is lower than:
Proof
The proof is given in Appendix B.2. \(\square \)
Proposition 6 allows the complexity of the two attacks to be compared. One can notice that there are still terms with \(\varPi !\) or D! in \({\text {ROPT}}_{L}\) such as \({ D + L-1\atopwithdelims ()L }\) or \({\varPi \atopwithdelims ()\min \left( \left\lceil { \frac{\varPi }{2}} \right\rceil , L \right) }\). Nevertheless these two terms can be seen as constants when \(L \ll D\). As a consequence we have the following remark.
Important Remark. When the degree L of the attack \({\text {ROPT}}_{L}\) is such that \(L \ll D\) the complexity of \({\text {OPT}}\) is much higher than the complexity of \({\text {ROPT}}_{L}\). Indeed the main term for \({\text {OPT}}\) is \(\varPi !\) whereas the one for \({\text {ROPT}}_{L}\) is \(2^{\left( \varOmega -1\right) n}\).
Proposition 7
The complexity of \({\text {ROPT}}_{L}\) can be reduced to \(\mathcal {O} \left( \! Q \!\cdot \! L \! \cdot \! { D + L -1\atopwithdelims ()L } \! \right) \) with a precomputation in \(\mathcal {O} \left( L \cdot { D + L-1\atopwithdelims ()L } \cdot 2^{\left( \varOmega -1\right) n} \cdot {\varPi \atopwithdelims ()\min \left( \left\lceil { \frac{\varPi }{2}} \right\rceil , L \right) } \right) \).
Proof
See Appendix B.3. \(\square \)
This means that for Q large enough, i.e. when \(\gamma \) is low enough, this computational “trick” allows a speed-up factor of \(2^{\left( \varOmega -1\right) n} {\varPi \atopwithdelims ()\min \left( \left\lceil { \frac{\varPi }{2}} \right\rceil , L \right) }\). The idea is to factor the values that depend on the queries out of the computation of the expectations. These expectations only depend on the model, and can therefore be computed only once.
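The principle of this precomputation can be illustrated at degree 1, where \(\mu _1 = \Vert x\Vert ^2 - 2\langle x, \mathbb {E}_R\, y\rangle + \mathbb {E}_R \Vert y\Vert ^2\). The sketch below is only an illustration of the idea on a toy model (the general case for degree \(\ell \) is the one treated in Appendix B.3); the function names and the list of model vectors are assumptions.

```python
def mu1_direct(x, models):
    """E_R ||x - y||^2, enumerating every model vector y (one per value of R)."""
    return sum(sum((xd - yd) ** 2 for xd, yd in zip(x, y))
               for y in models) / len(models)

def precompute(models):
    """Trace-independent expectations E[y_d] and E||y||^2, computed only once."""
    dim = len(models[0])
    mean_y = [sum(y[d] for y in models) / len(models) for d in range(dim)]
    mean_sq = sum(sum(yd * yd for yd in y) for y in models) / len(models)
    return mean_y, mean_sq

def mu1_fast(x, mean_y, mean_sq):
    """||x||^2 - 2 <x, E[y]> + E||y||^2: cost O(D) per trace, no loop over R."""
    return (sum(xd * xd for xd in x)
            - 2 * sum(xd * m for xd, m in zip(x, mean_y))
            + mean_sq)

# Toy set of model vectors, one per value of the randomness R (assumption).
models = [[0, 2], [1, 1], [2, 0], [1, 3]]
mean_y, mean_sq = precompute(models)
x = [0.5, 1.5]
print(mu1_direct(x, models), mu1_fast(x, mean_y, mean_sq))
```

Both functions return the same value, but only the first one loops over the randomness for each trace.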
5.2 Complexity of Our Case Study
Let us now compute the complexity of these two distinguishers applied to our case study. Of course, one approach could be to use the formula of the previous Sect. 5.1. But one can notice that a lot of terms could be independent of the key and as a consequence not needed in an attack. Another approach is to use the formula of the distinguisher.
Proposition 8
The complexity of \({\text {OPT}}\) is:
The complexity of \({\text {ROPT}}_{2}\) is:
The complexity of \({\text {ROPT}}_{3}\) is lower than:
Proof
See Appendix B.4. \(\square \)
Remark 6
As already mentioned an attacker can ignore the leakages of the table recomputation and only target the two shares. In such case the complexity of \({\text {OPT}}_{{\text {2O}}}\) (Definition 4) is \(\mathcal {O}\left( Q \cdot (2^n) \right) \). With the result of Proposition 7 the complexity of \({\text {ROPT}}_{2}\) reduces to \(\mathcal {O}\left( Q \right) \).
Remark 7
Using the result of Proposition 7 the complexity of \({\text {ROPT}}_{3}\) can be reduced to \(\mathcal {O}\left( Q \cdot 2^{2n} \right) \) with a precomputation step of \(\mathcal {O}\left( 2^{2n} \right) \).
Remark 8
A summary of the complexities and of the computation times of the distinguishers is provided in Table 1 of Appendix B.5.
6 Simulation Results
In this section we validate in simulation the soundness of our approach for the case study described in Sect. 4.1. The results of these simulations are expressed in success rate (defined in [32] and denoted by SR). All simulations are computed using the Hamming weight model as a leakage model. As we assume an attacker with a perfect knowledge, the leakages are the model (denoted by y) plus some noise. The noise is Gaussian with a standard deviation of \(\sigma \).
In Subsect. 6.1 we assume that the attacker does not take into account the table recomputation stage. He only targets the leakages of the mask and of the masked share (the leakage of the masked S-Box), namely the leakages which occur in lines 1 and 10 of Algorithm 1. This restriction makes it possible to compute the restricted version of the maximum likelihood. We compare the results of the maximum likelihood, our rounded version and the high-order attacks.
In Subsect. 6.2 we present our main results. In this subsection the attacker can exploit the leakage of the mask, the masked share and all the leakages of the table recomputation. In this scenario we show that our rounded version of the optimal distinguisher outperforms all the attacks of the state-of-the-art.
6.1 Exploiting only Leakage of the Mask and the Masked Share
In this subsection all the attacks are computed using only the leakages of lines 1 and 10 of Algorithm 1.
In this case study we assume a perfect masking scheme with: \(Y_0=w_H(M)\) and \(Y_1=w_H(S[T\oplus k] \oplus M)\).
It can be seen in Fig. 2 that even for small noise (\(\sigma = 1\), Fig. 2a) the \({\text {2O-CPA}}\) and \({\text {ROPT}}_{2}\) are equivalent. Indeed, the two curves superimpose almost perfectly (in order to better highlight any difference, as many as 1000 attacks have been carried out for the estimation of the success rate). Moreover, these two attacks are nearly equivalent to the optimal distinguisher (we recover here the results of [6]). We can notice that for \(\sigma = 1\), \({\text {ROPT}}_{4}\) is not as good as \({\text {ROPT}}_{2}\). This means that the noise standard deviation is not large enough for approximations of higher degrees to be accurate. Indeed, when the noise is not high enough, the weight of each term of the decomposition can be such that some useful terms vanish due to the alternation of positive and negative terms in the Taylor expansion.
Let us recall that the decomposition of Eq. (8) is valid only for low \(\gamma =1/(2\sigma ^2)\), i.e. high noise. The error term (\(o(\gamma ^L)\)) in the Taylor expansion gives the asymptotic evolution of this error when the noise increases, but it does not provide information about the error for a fixed value of the noise variance. Here, the noise is too small for \({\text {ROPT}}_{4}\) to be a good approximation of \({\text {OPT}}\), although \({\text {ROPT}}_{2}\) is nearly equivalent to \({\text {OPT}}\).
For \(\sigma =2\) the noise is high enough to have a good approximation of \({\text {OPT}}\) by \({\text {ROPT}}_{4}\). For this noise all the attacks are close to \({\text {OPT}}\) (Fig. 2b).
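The observation that a higher truncation degree is not automatically more accurate at a fixed \(\gamma \) can be illustrated on a much simpler alternating series. The sketch below truncates the Taylor series of \(e^{-x}\) at degrees 2 and 4. This function is a stand-in chosen for illustration only (an assumption), not the log-likelihood of Eq. (8).

```python
import math


def taylor_exp_neg(x: float, degree: int) -> float:
    """Partial sum of exp(-x) = sum_k (-x)^k / k! up to the given degree."""
    return sum((-x) ** k / math.factorial(k) for k in range(degree + 1))


def truncation_error(x: float, degree: int) -> float:
    """Absolute error of the truncated series at the point x."""
    return abs(taylor_exp_neg(x, degree) - math.exp(-x))


if __name__ == "__main__":
    for x in (0.5, 5.0):
        print(x, truncation_error(x, 2), truncation_error(x, 4))
```

For a small argument the degree 4 truncation is the more accurate one, whereas for \(x = 5\) it is farther from the true value than the degree 2 truncation. The asymptotic error term says nothing about which case applies at a given point.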
In the context where only the mask and the masked share are used it is equivalent to compute the \({\text {2O-CPA}}\), \({\text {ROPT}}_{2}\) and \({\text {OPT}}\). As a consequence in the rest of this article only the \({\text {2O-CPA}}\) will be displayed.
To conclude our \({\text {ROPT}}_{L}\) is in this scenario at least as good as the HO-CPA of order L, which validates the optimality of state-of-the-art attacks against perfect masking schemes of order \(O=L\).
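A second-order CPA with the centered product on the two leakages \(Y_0\) and \(Y_1\) can be sketched as follows. The 4-bit S-box (that of PRESENT), the helper names, the noise level and the number of traces are assumptions made to obtain a small runnable example; the prediction function is the expectation over the mask of the product of the centered models.

```python
import math
import random

# 4-bit S-box of the PRESENT cipher, used only as a small stand-in (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N_BITS = 4
HW = [bin(v).count("1") for v in range(1 << N_BITS)]


def simulate(key, sigma, num_traces, rng):
    """Simulate (t, x0, x1): x0 = wH(m) + noise, x1 = wH(S[t ^ key] ^ m) + noise."""
    traces = []
    for _ in range(num_traces):
        t = rng.randrange(1 << N_BITS)
        m = rng.randrange(1 << N_BITS)
        traces.append((t,
                       HW[m] + rng.gauss(0.0, sigma),
                       HW[SBOX[t ^ key] ^ m] + rng.gauss(0.0, sigma)))
    return traces


def model(z):
    """E_M[(wH(M) - n/2) * (wH(z ^ M) - n/2)] for a uniform mask M."""
    half = N_BITS / 2
    return sum((HW[m] - half) * (HW[z ^ m] - half)
               for m in range(1 << N_BITS)) / (1 << N_BITS)


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def second_order_cpa(traces):
    """Return the key guess maximising the correlation with the centered product."""
    m0 = sum(x0 for _, x0, _ in traces) / len(traces)
    m1 = sum(x1 for _, _, x1 in traces) / len(traces)
    combined = [(x0 - m0) * (x1 - m1) for _, x0, x1 in traces]
    table = [model(z) for z in range(1 << N_BITS)]
    scores = {k: pearson(combined, [table[SBOX[t ^ k]] for t, _, _ in traces])
              for k in range(1 << N_BITS)}
    return max(scores, key=scores.get)
```

Unlike a maximum likelihood score, this attack never sums over the mask at attack time: the mask only appears in the 16-entry table, which is computed once.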
6.2 Exploiting the Shuffled Table Recomputation
In this subsection the attacker can target the leakage of the mask, the masked share and all the leakages occurring during the table recomputation. As a consequence the attacks of Subsect. 6.1 remain possible. It has been shown in [6, 33] that the \({\text {2O-CPA}}\) with the centered product becomes close to the \({\text {OPT}}_{{\text {2O}}}\) (the Maximum Likelihood) when the noise becomes high. This is moreover confirmed by our simulation results, as can be seen in Fig. 2. We choose as reference attack for Fig. 3 the \({\text {2O-CPA}}\) and not the \({\text {OPT}}_{{\text {2O}}}\) because it performs similarly (see Fig. 2) and is much faster to compute (see Table 1), which is mandatory for attacks with high noise (e.g. for \(\sigma =12\)) which involve many traces.
Empirical validations have been carried out following the formulas provided previously. For \(\sigma \le 8\) the attacks have been repeated 1000 times to compute the SR. For \(\sigma > 8\) the attacks have been repeated 250 times. Results are plotted in Fig. 3, which shows the results of the \({\text {2O-CPA}}\), the \({\text {MVA}}_{TR}\) and \({\text {ROPT}}_{3}\). Notice that the likelihood is not represented because we cannot average over R.
Recall that the cardinality of the support of R is \(2^n \times 2^n!\). It can first be noticed that for all noise levels \({\text {ROPT}}_{3}\) is the best attack.
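To see why averaging over R is out of reach, the size \(2^n \times 2^n!\) of its support can be evaluated with exact integer arithmetic. The function name and the choice \(n = 8\) (a byte-oriented S-box) in the usage part are assumptions made for illustration.

```python
import math


def support_size(n: int) -> int:
    """Number of (mask, permutation) pairs: 2^n * (2^n)!."""
    return (1 << n) * math.factorial(1 << n)


if __name__ == "__main__":
    size = support_size(8)
    print(len(str(size)))      # number of decimal digits
    print(size.bit_length())   # size of the count in bits
```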
Let us analyze how much better \({\text {ROPT}}_{3}\) is than \({\text {2O-CPA}}\) and \({\text {MVA}}_{TR}\). The comparison with our new attack can be divided into three categories. For low noise, \(\sigma = 3 \) (see Fig. 3b), the results of \({\text {ROPT}}_{3}\) are similar to the results of \({\text {MVA}}_{TR}\). This means that the leakage of the shuffled table recomputation is the most leaking term in this case. Conversely, when the noise is high (for \(\sigma =12 \), see Fig. 3g), \({\text {ROPT}}_{3}\) becomes close to \({\text {2O-CPA}}\), which means that, as expected, the most informative part is the second-order term. For medium noise, \(7 \le \sigma \le 9 \) (see Fig. 3d, e and f), the results of \({\text {ROPT}}_{3}\) are much better than the results of \({\text {2O-CPA}}\) and \({\text {MVA}}_{TR}\). Moreover, the gain compared to the second best attack is maximum when the results of \({\text {2O-CPA}}\) and \({\text {MVA}}_{TR}\) are the same. Indeed, for \(\sigma = 7\) (see Fig. 3d), \({\text {ROPT}}_{3}\) needs 35000 traces to reach 80 % of success whereas \({\text {MVA}}_{TR}\) (the second best attack) needs 60000 traces. This represents a gain of 71 %. For \(\sigma = 8\) (see Fig. 3e), \({\text {ROPT}}_{3}\) needs 65000 traces to reach 80 % of success whereas the \({\text {MVA}}_{TR}\) and the \({\text {2O-CPA}}\) need 120000 traces. This represents a gain of 85 %. And when the noise increases to \(\sigma = 9 \) (see Fig. 3f), \({\text {ROPT}}_{3}\) needs 120000 traces to reach 80 % of success whereas \({\text {2O-CPA}}\) (the second best attack) needs 200000 traces, which is a gain of 66 %.
These results can be interpreted as follows. The \({\text {MVA}}_{TR}\) is a third-order attack which depends on the third-order moment. The \({\text {2O-CPA}}\) is a second-order attack which depends on the second-order moment. The new \({\text {ROPT}}_{3}\) attack combines these two moments. When the noise is low, the \({\text {MVA}}_{TR}\) and the \({\text {ROPT}}_{3}\) perform similarly; this shows that the dominant term in the Taylor expansion is the third-order one. Conversely, when the noise increases, the \({\text {ROPT}}_{3}\) becomes close to the \({\text {2O-CPA}}\), which indicates that the important term in the Taylor expansion is the second-order one. As \({\text {ROPT}}_{3}\) combines the second-order and the third-order moments weighted by the SNR, it is always better than any attack exploiting only one moment.
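The quoted gains are consistent with measuring how many more traces the second best attack needs, relative to \({\text {ROPT}}_{3}\), to reach 80 % of success. This definition is inferred from the numbers (an assumption), and the short sketch below simply redoes the arithmetic with the trace counts quoted above.

```python
def gain(traces_second_best: int, traces_ropt3: int) -> float:
    """Relative number of extra traces needed by the second best attack."""
    return traces_second_best / traces_ropt3 - 1.0


if __name__ == "__main__":
    # sigma -> (traces of the second best attack, traces of ROPT_3)
    quoted = {7: (60000, 35000), 8: (120000, 65000), 9: (200000, 120000)}
    for sigma, (other, ropt3) in quoted.items():
        print(sigma, f"{100 * gain(other, ropt3):.1f} %")
```

The three values agree with the quoted percentages up to rounding; for \(\sigma = 9\) the ratio is two thirds, which the text reports truncated to 66 %.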
7 Conclusions and Perspectives
In this article, we derived new attacks based on the Lth degree Taylor expansion in the SNR of the optimal Maximum Likelihood distinguisher. We have shown that this Lth degree truncation makes it possible to target a moment of order L. The new attack outperforms the optimal distinguisher with respect to time complexity. In fact, as we have theoretically shown, the Taylor approximation can be effectively computed whereas the fully optimal maximum likelihood distinguisher is not computationally tractable.
We have illustrated this property by applying our new method in a complex scenario of “shuffled table recomputation” and have compared the time complexity of the new attack and the optimal distinguisher. In addition, we have shown that in this context our attack has a higher success rate than all the attacks of the state of the art over all possible noise variances.
An open question is how to quantify the accuracy of the approximation \(\mathrm {LL} \longrightarrow \mathrm {LL}_\ell \) as a function of the noise. In other words, what is the optimal degree of the Taylor expansion of the likelihood for a given SNR? Another interesting extension of this framework would be on hardware devices which are known to leak at various orders (see the real-world examples in [21–23]).
Notes
1. We recall that, even if the templates are perfectly known, the online attack phase still requires \(\mathcal {O}(Q \cdot 2^{n(\varOmega -1)} \cdot \varPi !)\) computations.
References
Akkar, M.-L., Giraud, C.: An implementation of DES and AES, secure against some attacks. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 309–318. Springer, Heidelberg (2001). doi:10.1007/3-540-44709-1_26
Batina, L., Gierlichs, B., Prouff, E., Rivain, M., Standaert, F.X., Veyrat-Charvillon, N.: Mutual information analysis: a comprehensive study. J. Cryptol. 24(2), 269–291 (2011)
Blömer, J., Guajardo, J., Krummel, V.: Provably secure masking of AES. In: Handschuh, H., Hasan, M.A. (eds.) SAC 2004. LNCS, vol. 3357, pp. 69–83. Springer, Heidelberg (2004). doi:10.1007/978-3-540-30564-4_5
Bruneau, N., Danger, J.-L., Guilley, S., Heuser, A., Teglia, Y.: Boosting higher-order correlation attacks by dimensionality reduction. In: Chakraborty, R.S., Matyas, V., Schaumont, P. (eds.) SPACE 2014. LNCS, vol. 8804, pp. 183–200. Springer, Heidelberg (2014). doi:10.1007/978-3-319-12060-7_13
Bruneau, N., Guilley, S., Heuser, A., Marion, D., Rioul, O.: Less is more: dimensionality reduction from a theoretical perspective. In: Handschuh and Güneysu [13]
Bruneau, N., Guilley, S., Heuser, A., Rioul, O.: Masks will fall off. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol. 8874, pp. 344–365. Springer, Heidelberg (2014). doi:10.1007/978-3-662-45608-8_19
Bruneau, N., Guilley, S., Najm, Z., Teglia, Y.: Multivariate high-order attacks of shuffled tables recomputation. In: Handschuh and Güneysu [13]
Chari, S., Jutla, C.S., Rao, J.R., Rohatgi, P.: Towards sound approaches to counteract power-analysis attacks. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 398–412. Springer, Heidelberg (1999). doi:10.1007/3-540-48405-1_26
Clavier, C., Coron, J.-S., Dabbous, N.: Differential power analysis in the presence of hardware countermeasures. In: Koç, Ç.K., Paar, C. (eds.) CHES 2000. LNCS, vol. 1965, pp. 252–263. Springer, Heidelberg (2000). doi:10.1007/3-540-44499-8_20
Coron, J.-S.: Higher order masking of look-up tables. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS, vol. 8441, pp. 441–458. Springer, Heidelberg (2014). doi:10.1007/978-3-642-55220-5_25
Ding, A.A., Zhang, L., Fei, Y., Luo, P.: A statistical model for higher order DPA on masked devices. In: Batina, L., Robshaw, M. (eds.) CHES 2014. LNCS, vol. 8731, pp. 147–169. Springer, Heidelberg (2014). doi:10.1007/978-3-662-44709-3_9
Duc, A., Faust, S., Standaert, F.-X.: Making masking security proofs concrete. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 401–429. Springer, Heidelberg (2015). doi:10.1007/978-3-662-46800-5_16
Güneysu, T., Handschuh, H. (eds.): CHES 2015. LNCS, vol. 9293. Springer, Heidelberg (2015)
Herbst, C., Oswald, E., Mangard, S.: An AES smart card implementation resistant to power analysis attacks. In: Zhou, J., Yung, M., Bao, F. (eds.) ACNS 2006. LNCS, vol. 3989, pp. 239–252. Springer, Heidelberg (2006). doi:10.1007/11767480_16
Ishai, Y., Sahai, A., Wagner, D.: Private circuits: securing hardware against probing attacks. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 463–481. Springer, Heidelberg (2003). doi:10.1007/978-3-540-45146-4_27
Lemke-Rust, K., Paar, C.: Analyzing side channel leakage of masked implementations with stochastic methods. In: Biskup, J., López, J. (eds.) ESORICS 2007. LNCS, vol. 4734, pp. 454–468. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74835-9_30
Lemke-Rust, K., Paar, C.: Gaussian mixture models for higher-order side channel analysis. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 14–27. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74735-2_2
Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks - Revealing the Secrets of Smart Cards. Springer, Heidelberg (2007)
Messerges, T.S.: Securing the AES finalists against power analysis attacks. In: Goos, G., Hartmanis, J., Leeuwen, J., Schneier, B. (eds.) FSE 2000. LNCS, vol. 1978, pp. 150–164. Springer, Heidelberg (2001). doi:10.1007/3-540-44706-7_11
Messerges, T.S.: Using second-order power analysis to attack DPA resistant software. In: Koç, Ç.K., Paar, C. (eds.) CHES 2000. LNCS, vol. 1965, pp. 238–251. Springer, Heidelberg (2000). doi:10.1007/3-540-44499-8_19
Moradi, A.: Statistical tools flavor side-channel collision attacks. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp. 428–445. Springer, Heidelberg (2012). doi:10.1007/978-3-642-29011-4_26
Moradi, A., Standaert, F.X.: Moments-correlating DPA. IACR Cryptology ePrint Archive 2014, p. 409, 2 June 2014
Moradi, A., Wild, A.: Assessment of hiding the higher-order leakages in hardware. In: Güneysu, T., Handschuh, H. (eds.) CHES 2015. LNCS, vol. 9293, pp. 453–474. Springer, Heidelberg (2015). doi:10.1007/978-3-662-48324-4_23
Nikova, S., Rijmen, V., Schläffer, M.: Secure hardware implementation of nonlinear functions in the presence of glitches. J. Cryptol. 24(2), 292–321 (2011)
Oswald, E., Mangard, S.: Template attacks on masking—resistance is futile. In: Abe, M. (ed.) CT-RSA 2007. LNCS, vol. 4377, pp. 243–256. Springer, Heidelberg (2006). doi:10.1007/11967668_16
Pan, J., Hartog, J.I., Lu, J.: You cannot hide behind the mask: power analysis on a provably secure S-Box implementation. In: Youm, H.Y., Yung, M. (eds.) WISA 2009. LNCS, vol. 5932, pp. 178–192. Springer, Heidelberg (2009). doi:10.1007/978-3-642-10838-9_14
Peeters, E., Standaert, F.-X., Donckers, N., Quisquater, J.-J.: Improved higher-order side-channel attacks with FPGA experiments. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 309–323. Springer, Heidelberg (2005). doi:10.1007/11545262_23
Prouff, E., Rivain, M.: A generic method for secure SBox implementation. In: Kim, S., Yung, M., Lee, H.-W. (eds.) WISA 2007. LNCS, vol. 4867, pp. 227–244. Springer, Heidelberg (2007). doi:10.1007/978-3-540-77535-5_17
Prouff, E., Rivain, M., Bevan, R.: Statistical analysis of second order differential power analysis. IEEE Trans. Comput. 58(6), 799–811 (2009)
Rivain, M., Prouff, E.: Provably secure higher-order masking of AES. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 413–427. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15031-9_28
Rivain, M., Prouff, E., Doget, J.: Higher-order masking and shuffling for software implementations of block ciphers. In: Clavier, C., Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 171–188. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04138-9_13
Standaert, F.-X., Malkin, T.G., Yung, M.: A unified framework for the analysis of side-channel key recovery attacks. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 443–461. Springer, Heidelberg (2009). doi:10.1007/978-3-642-01001-9_26
Standaert, F.-X., Veyrat-Charvillon, N., Oswald, E., Gierlichs, B., Medwed, M., Kasper, M., Mangard, S.: The world is not enough: another look on second-order DPA. In: Abe, M. (ed.) ASIACRYPT 2010. LNCS, vol. 6477, pp. 112–129. Springer, Heidelberg (2010). doi:10.1007/978-3-642-17373-8_7
Stuart, A., Ord, K.: Kendall’s Advanced Theory of Statistics: Distribution Theory, 6th edn. Wiley-Blackwell, New York (1994). ISBN-10: 0470665300; ISBN-13: 978-0470665305
TELECOM ParisTech SEN research group. DPA Contest, 4th edn., 2013–2014. http://www.DPAcontest.org/v4/
Tunstall, M., Whitnall, C., Oswald, E.: Masking tables—an underestimated security risk. In: Moriai, S. (ed.) FSE 2013. LNCS, vol. 8424, pp. 425–444. Springer, Heidelberg (2014). doi:10.1007/978-3-662-43933-3_22
Veyrat-Charvillon, N., Medwed, M., Kerckhof, S., Standaert, F.-X.: Shuffling against side-channel attacks: a comprehensive study with cautionary note. In: Wang, X., Sako, K. (eds.) ASIACRYPT 2012. LNCS, vol. 7658, pp. 740–757. Springer, Heidelberg (2012). doi:10.1007/978-3-642-34961-4_44
Waddle, J., Wagner, D.: Towards efficient second-order power analysis. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 1–15. Springer, Heidelberg (2004). doi:10.1007/978-3-540-28632-5_1
Weisstein, E.W.: Cumulant. From MathWorld, A Wolfram Web Resource. http://mathworld.wolfram.com/Cumulant.html
Appendices
A Computation of the Moments
A.1 Computation of \(\mu _1\)
There is no computational difficulty:
Now, when there is no \(\varphi \) in the R.V., then the expectation is only on M (indeed, \(\frac{1}{2^n!} \sum _{\varphi \in S_{2^n}} 1=1\)). Thus,
which cannot further be simplified (in the simulations, it will be computed by the computer).
Similarly
When there is an expectation on \(\varPhi \), then at order one, it involves only one value \(\varPhi (\omega )\). This value is uniformly distributed, hence one can replace the expectation on \(\varPhi \) by an expectation on a single value of \(\varphi \), which we call \(M'\). For instance:
which can thus be computed with the same average method as \(\mathbb {E}(f_0)\).
Lastly, when there is both M and \(\varPhi (\omega )\), then whichever variable can absorb the other one, since both are uniformly distributed on \(\mathbb F_2^n\). This means that:
which is once again a similar computation as done for computing \(\mathbb {E}(f_0)\).
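The absorption argument rests on the fact that the XOR of two independent uniform variables on \(\mathbb F_2^n\) is again uniform. The following sketch checks, by exhaustive enumeration, that the expectation of a function of \(M \oplus M'\) equals the expectation of the same function of a single uniform variable. The function names and the test function are illustrative assumptions.

```python
def hw(value: int) -> int:
    """Hamming weight of value."""
    return bin(value).count("1")


def expectation_xor(g, n: int) -> float:
    """E[g(M xor M')] for independent uniform M and M' on n bits."""
    size = 1 << n
    return sum(g(m ^ mp) for m in range(size) for mp in range(size)) / size ** 2


def expectation_single(g, n: int) -> float:
    """E[g(M)] for a single uniform M on n bits."""
    size = 1 << n
    return sum(g(m) for m in range(size)) / size


if __name__ == "__main__":
    g = lambda v: hw(v) ** 2
    print(expectation_xor(g, 4), expectation_single(g, 4))
```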
A.2 Computation of \(\mu _2\)
Recall that only the key dependent terms of \(\mu _2\) are needed for \({\text {ROPT}}_{2}\) and \({\text {ROPT}}_{3}\).
Notice that the square terms are computed as the non-square terms. For instance,
which we drop since it does not depend on k. All in all, the only key-dependent term is:
which cannot be further simplified and will be computed by the computer. So, for the purpose of the attack, we have:
A.3 Computation of \(\mu _3\)
We shall consider only terms which depend on the key, hence product of three terms, one of which (at least) is \(f_0\). Obviously, \(\mathbb {E}(f^3_0)\) does not depend on k, for the same reason as given in Eq. (28). But the two terms:
1. \(\mathbb {E}(f^2_0 f_1)\) and
2. \(\mathbb {E}(f_0 f^2_1)\)
Notice that they are present \({3 \atopwithdelims ()2}=3\) times each when developing the cube.
Interestingly, those are not the only cases where \(f_0\) and \(f_1\) are selected.
Similarly, we have:
Now, we consider products without \(f_1\). Obviously, taking only \(f_0\) and \(f_i\) is not enough, since: \(\mathbb {E}(f^2_0 f_i)=\mathbb {E}(f^2_0) \mathbb {E}(f_i)\) and \(\mathbb {E}(f_0 f^2_i)=\mathbb {E}(f_0) \mathbb {E}(f^2_i)\) are key independent. The same goes for \(\mathbb {E}(f^2_0 f_j)\) and \(\mathbb {E}(f_0 f^2_j)\). We are left with \(\mathbb {E}(f_0 f_i f_{i'})\), \(\mathbb {E}(f_0 f_j f_{j'})\), and \(\mathbb {E}(f_0 f_i f_j)\).
The term \(\mathbb {E}(f_0 f_i f_{i'}) = \mathbb {E}(f_0) \mathbb {E}(f_i f_{i'})\) does not depend on k, because there is no M in \(f_i\).
The term \(\mathbb {E}(f_0 f_j f_{j'})\) can also be factorized as \(\mathbb {E}(f_0) \mathbb {E}(f_j f_{j'})\), hence it does not depend on k. The reason is more subtle, so we detail it:
Now, the second sum does not depend on m, as shown below:
Consequently, the last case is \(\mathbb {E}(f_0 f_i f_j)\). We can subdivide it into two cases: \(j=i+2^n\) and \(j\ne i+2^n\). When \(j=i+2^n\), the permutation \(\varPhi \) is evaluated at the same \(\omega \) in \(f_i\) and \(f_j\). We denote by \(M'\) the R.V. \(\varPhi (\omega )\), where \(\omega =j-2\). Hence:
These terms (for all \(j\in J\)) correspond to the \({\text {MVA}}_{TR}\) attack published at CHES 2015 [7].
Finally, there are the terms for \(j\ne i+2^n\). They are actually key dependent, hence must be kept. They are equal to:
Interestingly, without the constraint \(m'\ne m''\), this quantity does not depend on the key. So, the leakage which is exploited here is due to the fact \(\varPhi \) is not a random function, but a bijection. As, in \(\mu _3\), we are only interested in non constant terms, we can rewrite:
The non-constant term is similar to Eq. (31) provided a scaling by \(-(2^n-1)/2^n\) is done.
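The role of the constraint \(m'\ne m''\) can be checked on a toy example: for a uniformly random bijection, two images taken at different positions are uniform over pairs of distinct values, whereas for a random function they would be independent. The sketch below enumerates both cases for a set of four elements; the function names and the set size are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations, product


def pair_distribution_permutation(size: int) -> Counter:
    """Counts of (phi(0), phi(1)) over all permutations phi of range(size)."""
    return Counter((p[0], p[1]) for p in permutations(range(size)))


def pair_distribution_function(size: int) -> Counter:
    """Counts of (f(0), f(1)) over all choices of two independent images."""
    return Counter(product(range(size), repeat=2))


if __name__ == "__main__":
    perm_dist = pair_distribution_permutation(4)
    func_dist = pair_distribution_function(4)
    print(len(perm_dist), len(func_dist))
```

Only the bijection excludes the diagonal pairs, and this exclusion is the source of the key dependency discussed above.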
So, for the purpose of the attack, we have:
B Complexity Proofs
B.1 Proof of Lemma 1
In order to prove Lemma 1 let us first introduce a preliminary result.
Lemma 2
The quantity \({\varPi \atopwithdelims ()\ell }\) is increasing if \(\ell < \left\lceil {\varPi /2} \right\rceil \) and its maximum is \({\varPi \atopwithdelims () \left\lceil { \frac{\varPi }{2}} \right\rceil }\).
Proof
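We have \({\varPi \atopwithdelims ()\ell +1} = {\varPi \atopwithdelims ()\ell } \cdot \frac{\varPi -\ell }{\ell +1}\), and the factor \(\frac{\varPi -\ell }{\ell +1}\) is at least 1 when \(\ell < \left\lceil {\varPi /2} \right\rceil \) (and strictly greater than 1 when \(\ell < (\varPi -1)/2\)). Indeed, \(\ell < \left\lceil {\varPi /2} \right\rceil \) implies \(\ell +1 \le \left\lceil {\varPi /2} \right\rceil \le \varPi - \ell \).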
\(\square \)
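Lemma 2 can be checked numerically for small values of \(\varPi \). The sketch below is only a sanity check under the stated reading of the lemma (non-decreasing binomial coefficients up to \(\left\lceil {\varPi /2} \right\rceil \), where the maximum is reached); the function name and the tested range are assumptions.

```python
from math import comb


def check_lemma2(pi: int) -> bool:
    """Check that C(pi, l) does not decrease for l < ceil(pi/2) and peaks there."""
    half = -(-pi // 2)  # ceil(pi / 2) using integers only
    # Non-strict comparison: for odd pi the two central coefficients are equal.
    non_decreasing = all(comb(pi, l) <= comb(pi, l + 1) for l in range(half))
    is_maximum = comb(pi, half) == max(comb(pi, l) for l in range(pi + 1))
    return non_decreasing and is_maximum


if __name__ == "__main__":
    print(all(check_lemma2(pi) for pi in range(1, 65)))
```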
Finally we can prove Lemma 1.
Proof
Let us first assume that one dimension leaks at most one element of the permutation. We can thus develop the expression of \(\mu _\ell \), and we denote the complexity under the braces.
As \(k_1 + \ldots +k_D=\ell \) there are at most D indices \(k_d, 1\le d\le D\) such that \(k_d \ne 0\). Hence there are at most \(\min \left( D, \ell \right) \) elements in the product.
Each dimension which leaks an element of the permutation can also leak the masks. The worst case in terms of complexity is when all the permutation leakages also depend on the masks. Let us denote by i, with \( 1 \le i \le \min \left( D, \ell \right) \), the number of those terms. Then the expectation is computed over \(2^{\left( \varOmega -1\right) n}\frac{\varPi !}{\left( \varPi -i \right) !}\) values. Nevertheless, by taking into account the commutativity of the product, one only needs to compute \({2^{\left( \varOmega -1\right) n} {\varPi \atopwithdelims ()i }}\) values.
By Lemma 2, the value \( {\varPi \atopwithdelims ()i }\) is maximal for \(i=\ell \), where it equals \({\varPi \atopwithdelims ()\ell }\), when \(\ell \le \left\lceil { \frac{\varPi }{2}} \right\rceil \). When \(\ell > \left\lceil { \frac{\varPi }{2}} \right\rceil \) the maximum is \({\varPi \atopwithdelims () \left\lceil { \frac{\varPi }{2}} \right\rceil }\).
Finally, there are \({ D + \ell -1\atopwithdelims ()\ell }\) elements in the sum.
The complexity of \(\mu _\ell \) is therefore lower than \(\mathcal {O} \left( { D + \ell -1\atopwithdelims ()\ell } 2^{\left( \varOmega -1\right) n} {\varPi \atopwithdelims ()\min \left( \left\lceil { \frac{\varPi }{2}} \right\rceil , \ell \right) } \right) \). \(\square \)
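The counting steps of this proof can be sketched in a few lines: the number of ordered selections \(\varPi !/(\varPi -i)!\) against the number of unordered ones \({\varPi \atopwithdelims ()i}\), and the resulting bound on the number of terms of \(\mu _\ell \). The function names and the example parameters are assumptions made for illustration.

```python
from math import comb, perm


def ordered_vs_unordered(pi: int, i: int) -> tuple:
    """Ordered selections pi!/(pi-i)! versus unordered selections C(pi, i)."""
    return perm(pi, i), comb(pi, i)


def mu_bound(D: int, ell: int, omega: int, n: int, pi: int) -> int:
    """C(D+ell-1, ell) * 2^((omega-1)*n) * C(pi, min(ceil(pi/2), ell))."""
    half = -(-pi // 2)  # ceil(pi / 2) using integers only
    return comb(D + ell - 1, ell) * 2 ** ((omega - 1) * n) * comb(pi, min(half, ell))


if __name__ == "__main__":
    print(ordered_vs_unordered(16, 3))
    print(mu_bound(D=18, ell=3, omega=2, n=8, pi=16))
```

Using commutativity divides the number of permutation-dependent terms by \(i!\), which is the ratio between the two counts returned by `ordered_vs_unordered`.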
B.2 Proof of Proposition 6
In order to prove Proposition 6 let us first introduce a preliminary result.
Lemma 3
The quantity \({D-1+\ell \atopwithdelims ()\ell }\) is increasing with \(\ell \) if \(D>1\).
Proof
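We have that \({D+\ell \atopwithdelims ()\ell +1} = {D-1+\ell \atopwithdelims ()\ell } \cdot \frac{D+\ell }{\ell +1}\), where \(\forall \ell \), \(\frac{D+\ell }{\ell +1}>1\) provided \(D>1\). \(\square \)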
Finally let us prove Prop. 6.
Proof
\({\underline{Complexity \, of\, {\text {OPT}}}\!:}\)
Following Eq. (2) we have that the computation for a key guess of \({\text {OPT}}\) is:
We assume that the computation of the \(\log \) and the \(\exp \) takes constant time. As a consequence the complexity of the optimal distinguisher is \(\mathcal {O}\left( Q\cdot (2^n)^{\varOmega -1}\cdot \varPi ! \cdot D\right) \).
\({\underline{Complexity\, of\, {\text {ROPT}}_{L}}\!:}\) The computation of \({\text {ROPT}}_{L}\) involves the computation of the \(\mu _{\ell }\) with \(\ell \le L\) (Eqs. () and ()). By Lemmas 1 and 3 all these terms have a complexity lower than \( \mathcal {O} \left( { D + L -1\atopwithdelims ()L } \cdot 2^{\left( \varOmega -1\right) n} \cdot {\varPi \atopwithdelims ()\min \left( \left\lceil { \frac{\varPi }{2}} \right\rceil , L \right) } \right) \) (Eq. (16)).
As a consequence the complexity of \({\text {ROPT}}_{L}\) is lower than
\(\square \)
B.3 Proof of Proposition 7
Proof
Let us expand all the products in the term \(\mu _{\ell }\) in order to compute the expectation over the minimum number of values.
Moreover, \((x_{d} - y_{d}(t,k,M))^{2\ell _d} = \sum _{i=0}^{2\ell _d} (-1)^i {2\ell _d\atopwithdelims ()i} x_{d}^{2\ell _d-i} y_{d}(t,k,M)^i\).
\(\square \)
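The idea behind this proof, separating what depends on the measurement from what depends only on the model, can be sketched in a univariate setting. The moments of the model are tabulated once; each measurement then only requires a short sum with the alternating sign of the binomial expansion. The function names and the toy model values are assumptions made for illustration.

```python
from math import comb


def model_moments(ys, max_power: int):
    """E[y^i] for i = 0..max_power over equiprobable model values (computed once)."""
    return [sum(y ** i for y in ys) / len(ys) for i in range(max_power + 1)]


def direct(x: float, ys, ell: int) -> float:
    """E[(x - y)^(2*ell)] computed by looping over all model values."""
    return sum((x - y) ** (2 * ell) for y in ys) / len(ys)


def expanded(x: float, moments, ell: int) -> float:
    """Same quantity obtained from the precomputed moments of the model."""
    p = 2 * ell
    return sum((-1) ** i * comb(p, i) * x ** (p - i) * moments[i]
               for i in range(p + 1))


if __name__ == "__main__":
    ys = [0, 1, 1, 2]               # toy model values, one per mask value
    moments = model_moments(ys, 4)  # independent of the measurements
    print(direct(1.5, ys, 2), expanded(1.5, moments, 2))
```

With Q measurements, `direct` loops over the model values Q times, whereas `expanded` reuses the same table of moments for every measurement.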
B.4 Proof of Proposition 8
Proof
In our case study the size of the permutation is \(\varPi = 2^n\).
Then the complexity of \({\text {OPT}}\) is given by a straightforward application of Eq. (17).
From Eq. (14) we have that for \({\text {ROPT}}_{2}\) the computation for one key guess and one trace is given by \( \mathbb {E}(f_0 \times f_1)\). In this equation the expectation is computed over \(2^n\) values (Eq. (28)).
From Eq. (15) we have that for \({\text {ROPT}}_{3}\) the computation for one key guess and one trace is given by \( \mu _{2}^{\left( q \right) } (1+\gamma \mu _1^{\left( q \right) }) - \gamma \frac{\mu _{3}^{\left( q \right) }}{3}\). It can be seen in Eqs. (23), (24), (25) and (27) that the expectation of \(\mu _1\) is computed over \(2^n\) values. The dominant term in \(\mu _3\) (Eq. (33)) is:
The expectation in this term is computed over \(2^{2n}\) values (Eq. (32)). The sum is computed over fewer than \(2^{2n}\) terms. \(\square \)
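The per-trace expression used for \({\text {ROPT}}_{3}\) in this proof can be evaluated as below once the three moments are available. The function name is an assumption, and the moments \(\mu _1^{(q)}\), \(\mu _2^{(q)}\) and \(\mu _3^{(q)}\) are taken as already computed inputs; only the combination with \(\gamma = 1/(2\sigma ^2)\) is shown.

```python
def ropt3_statistic(mu1: float, mu2: float, mu3: float, sigma: float) -> float:
    """Evaluate mu2 * (1 + gamma * mu1) - gamma * mu3 / 3 with gamma = 1/(2 sigma^2)."""
    gamma = 1.0 / (2.0 * sigma ** 2)
    return mu2 * (1.0 + gamma * mu1) - gamma * mu3 / 3.0


if __name__ == "__main__":
    # As sigma grows, gamma tends to zero and only the second-order term remains.
    for sigma in (1.0, 4.0, 16.0):
        print(sigma, ropt3_statistic(mu1=1.0, mu2=2.0, mu3=3.0, sigma=sigma))
```

This makes the weighting by the noise level explicit: for high noise the third-order contribution is scaled down by \(\gamma \) and the expression approaches the second-order term alone.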
B.5 Time and Complexity
The times in this section are expressed in seconds. All the attacks have been run on an Intel Xeon X5660 running at 2.67 GHz. All the implementations are single-threaded. The simulation model is the one described in Sect. 6. For each distinguisher the attacks are computed 1000 times on 1000 traces.
C Analysis of the DPAcontest
Recently an open implementation of a masking scheme with shuffling has been presented in the DPA contest v4.2 [35]. In this implementation the execution of the different states is performed in a random order.
An attacker can target the integrated leakages of the different states in order to counter the shuffling [9, 31].
A better approach is to take into account the possible leakages of the permutation. In this case the optimal distinguisher is not computable, as it involves an expectation over 16! values, whereas the rounded optimal attack reduces this complexity.
Let us define the leakages of such implementations.
- \(X_0=y_0\left( t,k,R\right) +N_0 \) with \(y_0\left( t,k,R\right) = w_H(M)\),
- \(X_1=y_1\left( t,k,R\right) +N_1\) with \(y_1\left( t,k,R\right) = w_H(S[ \pi \left( T\oplus k \right) ]\oplus M)\),
- \(X_i=y_i\left( t,k,R\right) +N_i\), for \(i=2,\ldots ,18\) with \(y_i\left( t,k,R\right) = w_H(\varPhi (i-2))\).
Then similarly to the Appendix A we have that:
Additionally, as it is a low entropy masking scheme, the secret key can leak in a univariate high-order attack. Depending on the number of masks involved in the masking scheme, this could be at order 2, 3 or more. For simplicity let us assume it is at order 3. In such cases
Of course an attacker can additionally exploit all the leakages of the different states in order to increase the success of the attacks.
In some particular low entropy masking schemes the same masks are reused several times or are linked by deterministic relations (e.g. the first version of the DPAcontest). In this context it could be interesting to combine the leakages of different states [4]. In this case our method could benefit from the multiple possible combinations of points.
© 2016 International Association for Cryptologic Research
Bruneau, N., Guilley, S., Heuser, A., Rioul, O., Standaert, FX., Teglia, Y. (2016). Taylor Expansion of Maximum Likelihood Attacks for Masked and Shuffled Implementations. In: Cheon, J., Takagi, T. (eds) Advances in Cryptology – ASIACRYPT 2016. ASIACRYPT 2016. Lecture Notes in Computer Science(), vol 10031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53887-6_21
DOI: https://doi.org/10.1007/978-3-662-53887-6_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53886-9
Online ISBN: 978-3-662-53887-6
eBook Packages: Computer Science (R0)