Trade-Offs for S-Boxes: Cryptographic Properties and Side-Channel Resilience

Carlet, Claude; Heuser, Annelie; Picek, Stjepan

doi:10.1007/978-3-319-61204-1_20

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10355)

Included in the following conference series:

International Conference on Applied Cryptography and Network Security

Abstract

When discussing how to improve the side-channel resilience of a cipher, an obvious direction is to use various masking or hiding countermeasures. However, such schemes come at a cost, e.g., an increase in area and/or a reduction in speed. When considering lightweight cryptography and various constrained environments, the situation becomes even more difficult due to numerous implementation restrictions. Still, some options are available, such as using S-boxes that are easier to mask or (on a more fundamental level) using S-boxes that possess higher inherent side-channel resilience. In this paper, we investigate which properties an S-box should possess in order to be more resilient against side-channel attacks. Moreover, we find certain connections between those properties and cryptographic properties like nonlinearity and differential uniformity. Finally, to strengthen our theoretical findings, we give an extensive experimental validation of our results.

1 Introduction

When designing a block cipher, one needs to consider many possible cryptanalytic attacks and often find the best trade-off between security, speed, ease of implementation, etc. Besides the two main directions in the form of linear [1] and differential [2] cryptanalysis, today the most prominent attacks come from the group of implementation attacks, where side-channel attacks (SCAs) play an important role. To protect against SCA, one common option is to use countermeasures such as hiding or masking schemes [3], where one well-known example is the threshold implementation [4]. However, such countermeasures come with a cost when implementing ciphers. In more resource-constrained environments, one often does not have enough resources to implement standard ciphers like AES and therefore needs to use lightweight cryptography. However, even lightweight ciphers can be too resource demanding, especially when the cost of countermeasures is added. Therefore, although countermeasures are the way to go when considering SCA protection, there is no countermeasure (at least at the current state of research) that offers sufficient protection against any attack while being cheap enough to be implemented in any environment.

In this paper, we consider how to improve SCA resilience of ciphers without imposing any extra cost. This is possible by considering the inherent resilience of ciphers. We particularly concentrate on block ciphers which utilize S-boxes and therefore study the resilience of S-boxes against side-channel attacks.

In the case of SCA concentrating on only 1 bit of the S-box output, a theoretical connection between the side-channel resistance and the differential uniformity of S-boxes has been found in [5]. In particular, the authors showed that the higher the side-channel resistance, the smaller the differential resistance. However, as we show, this connection does not straightforwardly extend to more complex leakage models such as the Hamming weight of the S-box output, which is the most prominent leakage model in side-channel analysis when considering Correlation Power Analysis (CPA) [6]. We therefore investigate S-box parameters which may influence the side-channel resistance while still having good or optimal cryptographic properties. The (almost) preservation of the Hamming weight and a small Hamming distance between x and F(x) are two properties, each of which could intuitively strengthen the resistance to SCA. Our theoretical and empirical findings show that, notably, when the Hamming weight is exactly preserved, the SCA resilience is improved. Moreover, we relax this assumption and investigate S-boxes that almost preserve the Hamming weight. For our study, we employ the confusion coefficient [7] as a metric for side-channel resistance. Besides the signal-to-noise ratio and the number of observed measurements, the confusion coefficient is the factor influencing the success rate of CPA and, moreover, it is the only factor that depends on the underlying considered algorithm and thus on the S-box. More precisely, our main contributions are:

1. We calculate (resp. bound from above) the confusion coefficient value of a function F in the two scenarios where:
   (a) x and F(x) have the same Hamming weight.
   (b) on average, F(x) has a Hamming weight near that of x.
2. We observe that S-boxes with no difference between the Hamming weights of their input and output have nonlinearity equal to 0; more generally, the same happens when the Hamming weight of x and the Hamming weight of F(x) always have the same parity. Such functions are of course to be avoided from a cryptanalysis perspective. Furthermore, we show that, more generally still, for every S-box F, denoting by $d_{w_H}$ the number of inputs x for which the Hamming weights of x and F(x) have different parities, F has nonlinearity at most $d_{w_H}$. This implies that if the number of inputs x such that $w_H(x)\ne w_H(F(x))$ is at most $d_{w_H}$, the nonlinearity is at most $d_{w_H}$. We show in Example 2, however, that this does not necessarily make the S-box weak. We emphasize that although these observations could be regarded as trivial, they have practical consequences.
3. We show the connection between the number of fixed points of a function F and its nonlinearity.
4. We show that S-boxes such that F(x) lies at a small Hamming distance from x (or more generally from an affine function of x) cannot have high nonlinearity, although the obtainable values are not too bad for $n = 4, 8$.
5. In the practical part, we confirm our theoretical findings about the connection between (almost) preserving the Hamming weight and the confusion coefficient by investigating several S-boxes.
6. We investigate the relationship between the confusion coefficients of different key guesses and evaluate a number of S-boxes used in today's ciphers to show that their SCA resilience can differ significantly.

2 Preliminaries

2.1 Generalities on S-Boxes

Let n, m be positive integers, i.e., $n, m \in \mathbb {N}^+$. We denote by $\mathbb {F}_{2}^{n}$ the n-dimensional vector space over $\mathbb {F}_{2}$, that is, the set of all n-tuples of elements of $\mathbb {F}_{2}$, where $\mathbb {F}_{2}$ is the Galois field with two elements, and by $\mathbb {F}_{2^n}$ the finite field with $2^n$ elements. Further, for any set S, we denote $S \backslash \{0\}$ by $S^{*}$. The usual inner product of a and b in $\mathbb {F}_{2}^n$ equals $a\cdot b = \bigoplus _{i=1}^{n} a_{i}b_{i}$.

The Hamming weight $w_H(a)$ of a vector a, where $a \in \mathbb {F}_{2}^{n}$, is the number of non-zero positions in the vector. An (n, m)-function is any mapping F from $\mathbb {F}_{2}^{n}$ to $\mathbb {F}_{2}^{m}$. An (n, m)-function F can be defined as a vector $F = (f_1,\cdots ,f_m)$, where the Boolean functions $f_i: \mathbb {F}_2^n \rightarrow \mathbb {F}_2$ for $i \in \{1, \cdots , m\}$ are called the coordinate functions of F.

The component functions of an (n, m)-function F are all the linear combinations of the coordinate functions with non all-zero coefficients. Since for every n, there exists a field $\mathbb {F}_{2^n}$ of order $2^n$, we can endow the vector space $\mathbb {F}_2^n$ with the structure of that field when convenient. If the vector space $\mathbb {F}_2^n$ is identified with the field $\mathbb {F}_{2^n}$ then we can take $a\cdot b = tr (ab)$ where $tr(x) = x + x^2 + \ldots +x^{2^{n-1}}$ is the trace function from $\mathbb {F}_{2^n}$ to $\mathbb {F}_{2}$. The addition of elements of the finite field $\mathbb {F}_{2^n}$ is denoted with “+”, as usual in mathematics. Since, often, we identify $\mathbb {F}_{2}^n$ with $\mathbb {F}_{2^n}$ and if there is no ambiguity, we denote the addition of vectors of $\mathbb {F}_{2}^n$ when $n>1$ with “+” as well.

An (n, m)-function F is balanced if it takes every value of $\mathbb {F}_{2}^{m}$ the same number $2^{n - m}$ of times.

The Walsh-Hadamard transform of an (n, m)-function F is (see e.g. [8]):

$$\begin{aligned} W_{F} (a, v) = \sum _{x \in \mathbb {F}_{2}^{n}} (-1)^{v\cdot F(x) + a\cdot x}, \ a \in \mathbb {F}_{2}^{n},\ v \in \mathbb {F}_{2}^{m}. \end{aligned}$$

(1)

The nonlinearity nl of an (n, m)-function F equals the minimum nonlinearity of all its component functions $v\cdot F$, where $v \in \mathbb {F}_{2}^{m*}$ [8, 9]:

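$$\begin{aligned} nl = 2^{n-1} - \frac{1}{2}\max _{a \in \mathbb {F}_{2}^{n},\ v \in \mathbb {F}_{2}^{m*}} |W_{F}(a, v)|. \end{aligned}$$

(2)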
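As an illustration, the Walsh-Hadamard transform and the nonlinearity can be evaluated by brute force for small n. The following Python sketch assumes that vectors of $\mathbb {F}_2^n$ are packed into integers and uses the standard expression $nl = 2^{n-1} - \frac{1}{2}\max |W_F(a,v)|$ over all a and all $v \ne 0$. The function names and the choice of the PRESENT S-box as a test input are illustrative assumptions and are not taken from this paper.

```python
def dot(a, b):
    """Inner product over F_2 of two bit-vectors packed as integers."""
    return bin(a & b).count("1") & 1


def walsh(sbox, n, a, v):
    """W_F(a, v) = sum over x of (-1)^(v.F(x) + a.x)."""
    return sum(1 - 2 * (dot(v, sbox[x]) ^ dot(a, x)) for x in range(1 << n))


def nonlinearity(sbox, n, m):
    """nl = 2^(n-1) - (1/2) * max |W_F(a, v)| over all a and all v != 0."""
    peak = max(abs(walsh(sbox, n, a, v))
               for a in range(1 << n)
               for v in range(1, 1 << m))
    return (1 << (n - 1)) - peak // 2


# Illustrative inputs (assumptions, not from the paper's Table 1).
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
IDENTITY = list(range(16))

print("nl(PRESENT)  =", nonlinearity(PRESENT, 4, 4))
print("nl(identity) =", nonlinearity(IDENTITY, 4, 4))
```

The exhaustive search costs on the order of $2^{2n+m}$ operations, which is acceptable for $n = m = 4$; for $n = 8$ a fast Walsh-Hadamard transform would be the more usual choice.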

The nonlinearity of any (n, m) function F is bounded above by the so-called covering radius bound:

$$\begin{aligned} nl \le 2^{n-1} - 2^{\frac{n}{2}-1}. \end{aligned}$$

(3)

In the case $m=n$, a better bound exists. The nonlinearity of any (n, n) function F is bounded above by the so-called Sidelnikov-Chabaud-Vaudenay bound [10]:

$$\begin{aligned} nl \le 2^{n-1} - 2^{\frac{n-1}{2}}. \end{aligned}$$

(4)

Bound (4) is an equality if and only if F is an Almost Bent (AB) function, by definition of AB functions [8].

Let F be a function from $\mathbb {F}_2^n$ into $\mathbb {F}_2^m$ with $a \in \mathbb {F}_2^n$ and $b \in \mathbb {F}_2^m$. We denote:

$$\begin{aligned} D_F (a, b) = \left\{ x \in \mathbb {F}_2^n : F(x)+F(x+a) =b\right\} . \end{aligned}$$

(5)

The entry at position (a, b) of the difference distribution table is the cardinality of $D_F (a, b)$ and is denoted by $\delta (a, b)$. The differential uniformity $\delta _F$ is then defined as [11]:

$$\begin{aligned} \delta _F = \max _{\begin{array}{c} a \ne 0, b \end{array}} \delta (a, b). \end{aligned}$$

(6)

Functions that have differential uniformity equal to 2 are called the Almost Perfect Nonlinear (APN) functions. Every AB function is also APN, but the converse does not hold in general. AB functions exist only in an odd number of variables, while APN functions also exist for an even number of variables. When discussing the differential uniformity parameter for permutations, the best possible (and known) value is 2 for any odd n and also for $n = 6$. For n even and larger than 6, this is an open question. The differential uniformity value for the inverse function $F(x)=x^{2^n-2}$ equals 4 when n is even and 2 when n is odd.
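The stated values for the inverse function can be checked numerically for small n. The sketch below is a minimal brute-force computation, assuming the field $\mathbb {F}_{2^n}$ is represented with the irreducible polynomials $x^3+x+1$ and $x^4+x+1$ (our choice for illustration; the differential uniformity does not depend on it). All function names are hypothetical.

```python
def gf_mul(a, b, n, poly):
    """Multiply a and b in F_{2^n}; `poly` is the reduction polynomial including x^n."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= poly
    return r


def gf_pow(a, e, n, poly):
    """Square-and-multiply exponentiation in F_{2^n}."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a, n, poly)
        a = gf_mul(a, a, n, poly)
        e >>= 1
    return r


def inverse_sbox(n, poly):
    """Lookup table of F(x) = x^(2^n - 2), which maps 0 to 0."""
    return [gf_pow(x, (1 << n) - 2, n, poly) for x in range(1 << n)]


def differential_uniformity(sbox, n):
    """Maximum of delta(a, b) over all a != 0 and all b."""
    best = 0
    for a in range(1, 1 << n):
        counts = {}
        for x in range(1 << n):
            b = sbox[x] ^ sbox[x ^ a]
            counts[b] = counts.get(b, 0) + 1
        best = max(best, max(counts.values()))
    return best


INV3 = inverse_sbox(3, 0b1011)    # x^3 + x + 1
INV4 = inverse_sbox(4, 0b10011)   # x^4 + x + 1
print(differential_uniformity(INV3, 3), differential_uniformity(INV4, 4))
```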

2.2 Side-Channel Resistance

Side-channel attacks analyze physical leakage that is unintentionally emitted during cryptographic operations in a device (e.g., through the power consumption [12] or electromagnetic emanation [13]). This side-channel leakage is statistically dependent on the intermediate processed values involving the secret key, which makes it possible to retrieve the secret from the measured data. In particular, as the attacker wants to retrieve the secret key, he makes predictions (hypotheses) on a small enumerable chunk (e.g., byte) of an intermediate state using all possible key values.

The side-channel resistance of implementations against Correlation Power Attack (CPA) [6] depends on three factors: the number of measurement traces, the signal-to-noise ratio (SNR) [14], and the confusion coefficient [7]. The relationship between the three factors is linear in case of low SNR [15]. The confusion coefficient measures the discrepancy between the hypothesis of an intermediate state using the correct (secret) key and any hypothesis made with a (wrong) key assumption. Therefore, as one compares possible intermediate processed values, the confusion coefficient depends on the underlying cryptographic algorithm and thus, if the attacker targets an S-box operation, on the side-channel resistance of that S-box. More precisely, let us assume the attacker exploits an intermediate processed value $F(k_c + t)$ during the first round that depends on the secret key $k_c \in \mathbb F_2^n$, an n-bit chunk of the plaintext $t \in \mathbb {F}_2^n$, and an S-box function F. Moreover, let us make the commonly accepted assumption that the device is leaking side-channel information as the Hamming weight (see e.g., [14]) of intermediate values with additive noise N:

$$\begin{aligned} w_H(F(k_c + t)) + N. \end{aligned}$$

(7)

As the secret key $k_c$ is unknown to the attacker, he computes for each key guess $k_g \in \mathbb {F}_2^n$ a hypothesis about the intermediate state:

$$\begin{aligned} y_{k_g,t} = y(k_g,t) = w_H(F(k_g + t)) \end{aligned}$$

(8)

of the deterministic part of the leakage in Eq. (7). Interestingly, these hypotheses are not independent and their discrepancy is characterized by the confusion coefficient. Originally in [7] the confusion coefficient has been introduced for (n, 1) Boolean functions:

$$\begin{aligned} \kappa (k_c,k_g)= Pr[(y(k_c,T))\ne (y(k_g,T))], \end{aligned}$$

(9)

with T being the random variable whose realization is t. In [5], the authors related $\kappa (k_c,k_g)$ in Eq. (9) to $\delta _F$ and showed that the higher the side-channel resistance, the smaller the differential resistance (that is, the higher $\delta _F$). In fact, $\kappa (k_c,k_g)$ is represented as

$$\begin{aligned} \frac{1}{2^n} \sum _{t\in \mathbb F_2^n}\left( F(t + k_c) + F(t + k_g)\right) , \end{aligned}$$

(10)

which can then be straightforwardly connected to $\delta _F$ for 1-bit models.

In [16] the authors extend $\kappa (k_c,k_g)$ to the general multi-bit case for CPA and thus to (n, m)-functions F. In this paper, we use the definition given in [15] which is a standardized version of confusion coefficient given in [16] and thus a natural extension of Eq. (9):

$$\begin{aligned} \kappa (k_c,k_g) = \mathbb E\Bigl \{\Bigl (\frac{1}{2}({y(k_c,T)-y(k_g,T)})\Bigr )^2\Bigr \}, \end{aligned}$$

(11)

where y is assumed to be standardized (i.e., $\mathbb E(y(\cdot ,T))=0, Var(y(\cdot ,T))=1$). More specifically, Eq. (11) enables us to compare confusion coefficients for different functions F. By substituting $y(\cdot )$ with Eq. (8) and denoting $x = t + k_c$ and $a=k_c+k_g$ we can write $\kappa (k_c,k_g)$ as

$$\begin{aligned} \kappa (k_c,k_g) = \mathbb {E}\left( \left( \frac{w_H(F(x))-w_H(F(x+a))}{\sqrt{n}}\right) ^2\right) . \end{aligned}$$

(12)

Now, it is easy to see that from Eq. (12) we cannot straightforwardly derive a connection to $\delta _F$ for (n, m)-functions. More precisely, for $m = 1$ the square of the difference of the standardized hypotheses is just 4 times the value of $F(t)+ F(t+a)$, and then the confusion coefficient equals $\delta (a, 1)$ up to the normalization factor $2^{-n}$. For $m>1$ we have the square of the difference between the weights of F(t) and $F(t+ a)$, which is not 4 times the weight of $b=F(t)+ F(t + a)$ because the $1-0$ and the $0-1$ terms count with their signs in the sum. So there is no direct connection with $\delta _F$ anymore.

As a decisive criterion for comparison between confusion coefficients, the minimum value of $\kappa (k_c,k_g)$ was specified in [15], as it relates to the success rate when the SNR is low. Note that the higher the minimum of the confusion coefficient, the lower the side-channel resilience. This comes from the fact that the lower the confusion coefficient, the smaller the (Euclidean) distance between the hypotheses for the correct key $k_c$ and for a key guess $k_g$, and thus the harder it is for an attacker to distinguish whether the leakage arises from a computation with $k_c$ or $k_g$. A detailed discussion of this is given in Subsect. 5.2. On the other hand, in [17] the authors use $var(\kappa (k_c,k_g))$ as a criterion, where smaller values indicate lower side-channel resilience. Our experiments in Sect. 5 show that both metrics coincide with the empirical resilience observed in simulations.

In the case $\kappa (k_c,k_g)=0$ or $\kappa (k_c,k_g)=1$ for some $k_g \ne k_c$, CPA is not able to distinguish between $k_c$ and this key guess $k_g$ and will thus fail to reveal the secret key uniquely, even if the number of measurements goes to infinity. More precisely, $\kappa (k_c,k_g)=0$ means that for the key guess $k_g$ one observes exactly the same intermediate values (see Eq. (8)) as for the correct key $k_c$. Conversely, for $\kappa (k_c,k_g)=1$ one observes the complementary values (as can be seen from Eqs. (9) and (11)); however, as CPA takes the absolute value of the correlation (due to hardware-related properties [14]), an attacker again cannot distinguish between $k_c$ and $k_g$ in this case. In general, normalized confusion coefficient values close to 0.5 indicate that $k_c$ and $k_g$ can be easily distinguished (see Eq. (9)). We will show in Sect. 3 and empirically confirm in Sect. 5 that when $w_H$ is preserved there exists a key guess $k_g$ such that $\kappa (k_c,k_g)=1$.
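To make the definition in Eq. (11) concrete, the following Python sketch evaluates the standardized confusion coefficient exhaustively over all plaintext chunks t, using the Hamming-weight hypotheses of Eq. (8). It is only a sketch under stated assumptions: the helper names are hypothetical, the standardization is done empirically (population mean and standard deviation over all t), and the PRESENT S-box is an illustrative input rather than one of the functions evaluated in this paper.

```python
from statistics import fmean, pstdev


def hw(x):
    """Hamming weight of an integer-packed bit-vector."""
    return bin(x).count("1")


def hypotheses(sbox, n, k):
    """Standardized hypotheses y(k, t) = w_H(F(k + t)) for every t."""
    raw = [hw(sbox[t ^ k]) for t in range(1 << n)]
    mu, sigma = fmean(raw), pstdev(raw)
    return [(v - mu) / sigma for v in raw]


def confusion(sbox, n, kc, kg):
    """kappa(kc, kg) = E[((y(kc, T) - y(kg, T)) / 2)^2], T uniform."""
    yc = hypotheses(sbox, n, kc)
    yg = hypotheses(sbox, n, kg)
    return fmean([((c - g) / 2) ** 2 for c, g in zip(yc, yg)])


# Illustrative inputs (assumptions, not from the paper's Table 1).
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
IDENTITY = list(range(16))

kc = 0
kappas = {kg: confusion(PRESENT, 4, kc, kg) for kg in range(16) if kg != kc}
print("minimum kappa over wrong keys:", min(kappas.values()))
print("identity, kg = 15:", confusion(IDENTITY, 4, kc, 15))
```

Since $\kappa (k_c,k_g)$ equals $(1-\rho )/2$ for the correlation $\rho $ between the two standardized hypotheses, all values lie in [0, 1], and the value 1 corresponds to perfectly anti-correlated hypotheses.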

3 S-Boxes (Almost) Preserving the Hamming Weight

3.1 Relation to the Confusion Coefficient

To obtain, for an (n, m)-function F, a connection between the confusion coefficient parameter and the Hamming weight preservation (i.e., the fact that, for every x, F(x) has the same Hamming weight as x) or, more generally, a limited average Hamming weight modification, we start with Eq. (12). For any function F, we have:

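$$\begin{aligned} \kappa (k_c,k_g)&= \frac{1}{4n}\, \mathbb {E} \left( \left( \sum _{i=1}^n(-1)^{f_i(x)}- \sum _{i=1}^n (-1)^{f_i(x+a)}\right) ^2\right) \\&= \frac{1}{2n}\, \mathbb {E} \left( \left( \sum _{i=1}^n(-1)^{f_i(x)}\right) ^2- \left( \sum _{i=1}^n(-1)^{f_i(x)}\right) \left( \sum _{i=1}^n (-1)^{f_i(x+a)}\right) \right) . \end{aligned}$$

(13)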

Lemma 1 addresses the case where F preserves the Hamming weight, whereas the scenario in which F modifies the Hamming weight in a limited way is described in Lemma 2. Note that the first scenario is a particular case of the second.

Lemma 1

For an (n, n)-function such that, for every x, F(x) has the same Hamming weight as x, the confusion coefficient equals $\frac{w_H(a)}{n}$.

Proof

If F preserves the Hamming weight, that is, if $w_H(F(x))=w_H(x)$ for every x (or more generally, if F is the composition of a function preserving the weight by an affine isomorphism on the right), then the confusion coefficient $\kappa (k_c,k_g)=\mathbb {E}\left( \left( \frac{w_H(F(x))-w_H(F(x+a))}{\sqrt{n}}\right) ^2\right) ,$ where $a=k_c+k_g$, becomes $\mathbb {E}\left( \left( \frac{w_H(x)-w_H(x+a)}{\sqrt{n}}\right) ^2\right) $, and by applying Eq. (13) (which is valid for every F) to $F=Id$, we obtain:

$$\begin{aligned}&\frac{1}{2n} \mathbb {E} \left( \left( \sum _{i=1}^n(-1)^{x_i}\right) ^2- \left( \sum _{i=1}^n(-1)^{x_i}\right) \left( \sum _{i=1}^n (-1)^{x_i+a_i}\right) \right) \\ =&\,\frac{1}{2n} \mathbb {E} \left( \sum _{1\le i,j\le n}(-1)^{x_i+x_j}- \sum _{1\le i,j\le n}(-1)^{x_i+x_j+a_j}\right) . \end{aligned}$$

(14)

The expectations of all these sums for $i\ne j$ are null (since the character sums of nonzero linear functions are null), and we obtain:

$$\begin{aligned} \frac{1}{2n} \mathbb {E} \left( n- \sum _{1\le i\le n}(-1)^{a_i}\right) =\frac{1}{n} \mathbb {E} \left( w_H(a)\right) =\frac{w_H(a)}{n}. \end{aligned}$$

(15)

Example 1

For $n=4$, Lemma 1 gives $\min _{k_c \ne k_g} \kappa (k_c,k_g) = 0.25$ and for $w_H(a)=n$ we have $\kappa (k_c,k_g) = 1$, which means that the CPA distinguisher is not able to distinguish between these two hypotheses $k_g$ and $k_c$ (see Subsect. 2.2). Note that we give a more detailed discussion about the results and their ramifications in Sect. 5.
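Lemma 1 can be checked exhaustively for small n. The sketch below evaluates $\mathbb {E}\left( \left( \frac{w_H(F(x))-w_H(F(x+a))}{\sqrt{n}}\right) ^2\right) $ for every a and compares it with $\frac{w_H(a)}{n}$. As a weight-preserving function different from the identity we assume a cyclic rotation of the four bits; this choice and the function names are illustrative and not taken from Table 1.

```python
from math import isclose


def hw(x):
    """Hamming weight of an integer-packed bit-vector."""
    return bin(x).count("1")


def kappa(sbox, n, a):
    """E[((w_H(F(x)) - w_H(F(x + a))) / sqrt(n))^2] for uniform x."""
    total = sum((hw(sbox[x]) - hw(sbox[x ^ a])) ** 2 for x in range(1 << n))
    return total / (n * (1 << n))


n = 4
# Hypothetical weight-preserving S-box: rotate the 4 bits left by one position.
ROTL = [((x << 1) | (x >> (n - 1))) & 0xF for x in range(1 << n)]

for a in range(1 << n):
    predicted = hw(a) / n
    status = "ok" if isclose(kappa(ROTL, n, a), predicted) else "mismatch"
    print(f"a = {a:2d}  kappa = {kappa(ROTL, n, a):.4f}  w_H(a)/n = {predicted:.4f}  {status}")
```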

Lemma 2

For an (n, n)-function such that, on average, F(x) has a Hamming weight near that of x, more precisely, where $\sum _x |w_H(F(x))-w_H(x)|\le d_{w_H}$, where $d_{w_H}$ is some number, the standardized confusion coefficient is bounded above by $\frac{w_H(a)}{n}+ \frac{4d_{w_H}}{2^n} $.

Proof

If $\mathbb {E}(|w_H(F(x))-w_H(x)|)\le \frac{d_{w_H}}{2^n}$, then according to Lemma 1 and its proof the confusion coefficient $\kappa (k_c,k_g)=$ $\mathbb {E}\left( \left( \frac{w_H(F(x))-w_H(F(x+a))}{\sqrt{n}}\right) ^2\right) $ is such that

$$\begin{aligned} \left| \kappa (k_c,k_g)-\frac{w_H(a)}{n}\right|\le & {} \mathbb {E}\left( \left| \left( \frac{w_H(F(x))-w_H(F(x+a))}{\sqrt{n}}\right) ^2-\left( \frac{w_H(x)-w_H(x+a)}{\sqrt{n}}\right) ^2\right| \right) \\ {}= & {} \mathbb {E}\left( \left| \left( \frac{w_H(F(x))-w_H(F(x+a))}{\sqrt{n}}-\frac{w_H(x)-w_H(x+a)}{\sqrt{n}}\right) \right. \right. \\&\quad \quad \quad \quad \left. \left. \left( \frac{w_H(F(x))-w_H(F(x+a))}{\sqrt{n}}+\frac{w_H(x)-w_H(x+a)}{\sqrt{n}}\right) \right| \right) \\ {}\le & {} \frac{2}{n} \left( \max _{x\in \mathbb {F}_2^n} w_H(F(x))+\max _{x\in \mathbb {F}_2^n}w_H(x)\right) \mathbb {E}\left( \left| w_H(F(x))-w_H(x)\right| \right) \\ {}= & {} \frac{4d_{w_H}}{2^n}. \end{aligned}$$
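The bound of Lemma 2 can be illustrated in the same way. The sketch below assumes a hypothetical 4-bit function, namely the identity with the images of $x=1$ and $x=3$ exchanged (so that $d_{w_H}=2$), and checks that $\left| \kappa (k_c,k_g)-\frac{w_H(a)}{n}\right| \le \frac{4d_{w_H}}{2^n}$ for every a. The example function and the helper names are assumptions made for illustration only.

```python
def hw(x):
    """Hamming weight of an integer-packed bit-vector."""
    return bin(x).count("1")


def kappa(sbox, n, a):
    """E[((w_H(F(x)) - w_H(F(x + a))) / sqrt(n))^2] for uniform x."""
    total = sum((hw(sbox[x]) - hw(sbox[x ^ a])) ** 2 for x in range(1 << n))
    return total / (n * (1 << n))


def weight_drift(sbox, n):
    """d_{w_H} = sum over x of |w_H(F(x)) - w_H(x)|."""
    return sum(abs(hw(sbox[x]) - hw(x)) for x in range(1 << n))


n = 4
# Hypothetical example: identity with F(1) = 3 and F(3) = 1.
F = list(range(1 << n))
F[1], F[3] = F[3], F[1]

d = weight_drift(F, n)
slack = 4 * d / (1 << n)
worst = max(abs(kappa(F, n, a) - hw(a) / n) for a in range(1, 1 << n))
print("d_wH =", d, " largest deviation =", worst, " allowed =", slack)
```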

3.2 Relation to Cryptographic Properties

We study the cryptographic consequences of the preservation of the Hamming weight. Again we first cover the specific case where the input and output of an S-box always have the same Hamming weight, and then the second case where the output has on average a Hamming weight close to that of the corresponding input (see Lemma 3).

If for every x, we have $w_H(F(x))=w_H(x)$ then the sum (mod 2) of all coordinate functions of F equals the sum (mod 2) of all coordinates of x. This means that F has nonlinearity equal to zero since one of its component functions is linear. Of course, the same happens under the much weaker hypothesis that $w_H(F(x))$ and $w_H(x)$ have always the same parity. Therefore, an S-box function preserving the Hamming weight is cryptographically insecure.

However, if $\sum _x |w_H(F(x))-w_H(x)|\le d_{w_H}$, then we have $nl \le d_{w_H}$. Indeed, this is a direct consequence of the following straightforward result, which has however much importance in our context:

Lemma 3

If the Hamming weight of the Boolean function:

$$x\mapsto (w_H(F(x))-w_H(x)) \ [mod\, 2],$$

that is, $\sum _x ((w_H(F(x))-w_H(x)) \ [mod\, 2])$, is at most $d_{w_H}$, then we have $nl \le d_{w_H}$.

Indeed, the Hamming distance between the component function $\sum _i F_i\ [mod\, 2]$ and the linear function $\sum _i x_i\ [mod\, 2]$ is then at most $d_{w_H}$.
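Lemma 3 is easy to test by brute force. The sketch below counts the inputs for which $w_H(F(x))$ and $w_H(x)$ have different parities and compares this count with the nonlinearity computed from the Walsh-Hadamard transform. The two inputs (the PRESENT S-box and the identity with two images exchanged) as well as all function names are assumptions chosen for illustration.

```python
def hw(x):
    """Hamming weight of an integer-packed bit-vector."""
    return bin(x).count("1")


def dot(a, b):
    """Inner product over F_2."""
    return hw(a & b) & 1


def nonlinearity(sbox, n, m):
    """nl = 2^(n-1) - (1/2) * max |W_F(a, v)| over all a and all v != 0."""
    peak = 0
    for v in range(1, 1 << m):
        for a in range(1 << n):
            w = sum(1 - 2 * (dot(v, sbox[x]) ^ dot(a, x)) for x in range(1 << n))
            peak = max(peak, abs(w))
    return (1 << (n - 1)) - peak // 2


def parity_distance(sbox, n):
    """Number of x for which w_H(F(x)) - w_H(x) is odd."""
    return sum((hw(sbox[x]) - hw(x)) & 1 for x in range(1 << n))


# Illustrative inputs (assumptions, not from the paper's Table 1).
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SWAPPED = list(range(16))
SWAPPED[1], SWAPPED[3] = SWAPPED[3], SWAPPED[1]

for name, sbox in (("PRESENT", PRESENT), ("identity with 1 and 3 exchanged", SWAPPED)):
    print(name, "nl =", nonlinearity(sbox, 4, 4), " parity distance =", parity_distance(sbox, 4))
```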

Example 2

For a (4, 4)-function F to have nonlinearity equal to 4 (the optimal nonlinearity), $d_{w_H}$ must be at least 4. In order to construct functions with such properties, we ran a genetic algorithm as given by Picek et al. [17]. We use the same settings as there: 30 independent runs, population size equal to 50, 3-tournament selection, and mutation probability 0.3 per individual. The objective is the maximization of the following fitness function:

$$\begin{aligned} fitness = nl + \varDelta _{nl, 4}\Bigl (n\times 2^n - \sum _{x\in \mathbb {F}_2^n}\left| w_H(F(x))-w_H(x) \right| \Bigr ). \end{aligned}$$

(16)

Here, $\varDelta _{nl, 4}$ represents the Kronecker delta function that equals 1 when nonlinearity is 4 and 0 otherwise. Notice we subtract the difference of the Hamming weights of the inputs and outputs of an S-box from the summed Hamming weight value for a (4, 4)-function since we work with the maximization problem while that value should be minimized. Interestingly, we observed that finding S-boxes with those properties is a relatively easy task and that the obtained S-boxes never have more than 8 fixed points. We give examples of such S-boxes in Table 1, for instance, $S_5$ where nonlinearity equals 4 and $d_{w_H}$ is 4.

Next, inspired by our empirical results, we investigate whether it is theoretically possible to construct an S-box with even more fixed points while still having the maximal nonlinearity.

Lemma 4

If an (n, n)-function has k fixed points then the maximal value of $W_F(a,v)$ when $v \ne 0$ is bounded below by $(k-1)/(1-2^{-n})$. If nl is the nonlinearity of an (n, n)-function, then its number k of fixed points is not larger than $2^n-\lceil (2-2^{1-n})\, nl\rceil $.

Proof

The number of fixed points k of an (n, n)-function F equals:

$$\begin{aligned} k = 2^{-n}\sum _{v\in \mathbb {F}_2^n} W_F(v,v) = 2^{-n} \sum _{x,v\in \mathbb {F}_2^n} (-1)^{v\cdot (x + F(x))}, \end{aligned}$$

(17)

which follows from Eq. (1) when $a = v$ and the property that $\sum _{v\in \mathbb {F}_ 2^n}(-1)^{v\cdot a}$ equals $2^n$ if $a=0$ and is null otherwise. The value of $W_F(0,0)$ involved in Eq. (17) equals $2^n$. We take it off and obtain:

$$\begin{aligned} k - 1 = 2^{-n}\sum _{v\in \mathbb {F}_ 2^{n*}} W_F(v,v). \end{aligned}$$

(18)

Then the arithmetic mean of $W_F(v,v)$ when $v \ne 0$ equals $(k-1)/(1-2^{-n})$. This implies that $\max _v W_F(v,v)$ is at least $(k-1)/(1-2^{-n})$ and the nonlinearity cannot be larger than $2^{n-1}-(k-1)/(2-2^{1-n})$. The inequality $nl\le 2^{n-1}-(k-1)/(2-2^{1-n})$ is equivalent to $k\le 2^n-\lceil (2-2^{1-n})\, nl\rceil $.
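Both the fixed-point identity in Eq. (17) and the resulting bound of Lemma 4 can be verified numerically. The following sketch recovers the number of fixed points from the values $W_F(v,v)$ and evaluates $2^n-\lceil (2-2^{1-n})\, nl\rceil $ with integer arithmetic. The function names and the test inputs are illustrative assumptions.

```python
def dot(a, b):
    """Inner product over F_2 of two bit-vectors packed as integers."""
    return bin(a & b).count("1") & 1


def walsh(sbox, n, a, v):
    """W_F(a, v) = sum over x of (-1)^(v.F(x) + a.x)."""
    return sum(1 - 2 * (dot(v, sbox[x]) ^ dot(a, x)) for x in range(1 << n))


def nonlinearity(sbox, n):
    """Nonlinearity of an (n, n)-function by exhaustive search."""
    peak = max(abs(walsh(sbox, n, a, v))
               for a in range(1 << n) for v in range(1, 1 << n))
    return (1 << (n - 1)) - peak // 2


def fixed_points_via_walsh(sbox, n):
    """k = 2^(-n) * sum over v of W_F(v, v), as in Eq. (17)."""
    return sum(walsh(sbox, n, v, v) for v in range(1 << n)) // (1 << n)


def fixed_point_bound(n, nl):
    """2^n - ceil((2 - 2^(1-n)) * nl), computed without floating point."""
    numerator = ((1 << (n + 1)) - 2) * nl
    return (1 << n) - (-(-numerator // (1 << n)))


# Illustrative inputs (assumptions, not from the paper's Table 1).
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
IDENTITY = list(range(16))

for name, sbox in (("PRESENT", PRESENT), ("identity", IDENTITY)):
    k = fixed_points_via_walsh(sbox, 4)
    print(name, "fixed points =", k, " bound =", fixed_point_bound(4, nonlinearity(sbox, 4)))
```

For $n=4$ and $nl=4$ the bound evaluates to 8, which agrees with the observation in Example 2 that the S-boxes found never have more than 8 fixed points.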

4 S-Boxes Minimizing the Hamming Distance

4.1 Relation to the Confusion Coefficient

In real-world applications, the device may leak not only in the Hamming weight but also in the Hamming distance; therefore, we now extend our study to the case where the leakage arises from the Hamming distance between x and F(x). Again we first study the relation to the confusion coefficient and then give the connection to cryptographic properties.

By the triangular inequality, we have $|w_H(F(x))-w_H(x)|\le d_H(x,F(x))$. This implies that $\sum _x |w_H(F(x))-w_H(x)|\le \sum _x d_H(x,F(x))$.

Hence, if $\sum _x d_H(x,F(x))\le d_{d_H}$, we can use Lemma 2 and deduce that also in this scenario the confusion coefficient is bounded by $\frac{w_H(a)}{n}+\frac{4d_{d_H}}{2^n}$.

4.2 Relation to Cryptographic Properties

Starting from $\sum _x d_H(x,F(x))\le d_{d_H}$ and working up to the addition of a linear function (which changes neither the nonlinearity nor the differential uniformity), considering S-boxes such that, for every x, F(x) lies at a small distance from x corresponds to considering functions $x+F(x)$ which take too small a number of values. We show that such functions have bad nonlinearity and bad differential uniformity.

Lemma 5

Let F be any (n, m)-function such that $|F(\mathbb {F}_2^n)| \le D$. Then $\delta _F\ge \frac{2^n}{2^n-1}\left( \frac{2^n}{D}-1\right) $ and $nl\le 2^{n-1}-\frac{\frac{2^{n+m-1}}{D}-2^{n-1}}{2^m-1}$.

Proof

By using the Cauchy-Schwarz inequality, we obtain $\sum _{a\in \mathbb {F}_2^{n*}}|D_aF^{-1}(0)|=\sum _{b\in \mathbb {F}_2^m}|F^{-1}(b)|^2-2^n\ge \frac{(\sum _{b\in \mathbb {F}_2^m}|F^{-1}(b)|)^2}{D}-2^n=\frac{2^{2n}}{D}-2^n$. Since the sum on the left runs over the $2^n-1$ nonzero elements of $\mathbb {F}_2^n$, there exists $a\in \mathbb {F}_2^{n*}$ such that $|D_aF^{-1}(0)|\ge \frac{\frac{2^{2n}}{D}-2^n}{2^n-1}$. This proves the first assertion.

We have a partition of $\mathbb {F}_2^n$ into at most D parts by the preimages $F^{-1}(b)$, $b\in \mathbb {F}_2^m$, and there exists then $b\in \mathbb {F}_2^m$ such that $|F^{-1}(b)|\ge \frac{2^n}{D}$; for such b, we have $\sum _{x\in \mathbb {F}_2^n,v\in \mathbb {F}_2^m} (-1)^{v\cdot (F(x)+b)}\ge \frac{2^{n+m}}{D}$, which is equivalent to $\sum _{v\in \mathbb {F}_2^m, v\ne 0} (-1)^{v\cdot b} W_F(0,v) \ge \frac{2^{n+m}}{D}-2^n$, and then there exists $v\ne 0$ such that $|W_F(0,v)|\ge \frac{\frac{2^{n+m}}{D}-2^n}{2^m-1}$, which implies that $nl\le 2^{n-1}-\frac{\frac{2^{n+m-1}}{D}-2^{n-1}}{2^m-1}$. This proves the second assertion.

If D is small with respect to $2^m$ (so that $2^{n-1}$ is small with respect to $\frac{2^{n+m-1}}{D}$) and D is small with respect to $2^{n/2}$ (so that $\frac{2^n}{D}$ is large with respect to $2^{n/2}$), the nonlinearity is bad with respect to the covering radius bound $nl\le 2^{n-1}-2^{n/2-1}$. More precisely, if $D\le \frac{2^m}{\lambda }$ with $\lambda >1$, then $nl\le 2^{n-1}-\frac{(\lambda -1)2^{n-1}}{2^m-1}< 2^{n-1}-(\lambda -1)2^{n-m-1}$ and if $(\lambda -1)2^{n-m}$ is significantly larger than $2^{n/2}$, the nonlinearity is bad with respect to the covering radius bound. We have also that if D is small with respect to $2^m$ then $\delta _F$ is large with respect to $2^{n-m}$ if $m<n$ and with 2 if $m=n$ (which are the smallest possible values of $\delta _F$).

If F is an (n, n)-function and $x+F(x)$ has low weight for every x, say at most $t_{d_H}$, which is equivalent to saying that $d_H(x,F(x))\le t_{d_H}$ for every x, then its number of values is at most $D=\sum _{i=0}^{t_{d_H}} \binom{n}{i}$ and we can apply the result above to $x+F(x)$, which has the same nonlinearity and the same $\delta _F$ as F. As far as we know, these observations are new. Note that we also have the possibility of applying Lemma 3 and then we have that nonlinearity is bounded by $t_{d_H}$.
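The application of Lemma 5 to functions with $d_H(x,F(x))\le t_{d_H}$ can be sketched numerically for $n=m=4$. In the Python example below, a hypothetical F is built as $F(x)=x+G(x)$, where G takes only values of weight at most $t_{d_H}=1$ (drawn pseudo-randomly with a fixed seed). The code then checks that F and G share the same nonlinearity and differential uniformity and that both respect the bounds of Lemma 5 with $D=\sum _{i=0}^{t_{d_H}} \binom{n}{i}$. For $n=m$ the denominators $2^n-1$ and $2^m-1$ coincide. The construction and all names are assumptions for illustration.

```python
import random
from math import comb


def hw(x):
    """Hamming weight of an integer-packed bit-vector."""
    return bin(x).count("1")


def nonlinearity(sbox, n):
    """Nonlinearity of an (n, n)-function via exhaustive Walsh-Hadamard values."""
    peak = 0
    for v in range(1, 1 << n):
        for a in range(1 << n):
            w = sum(1 - 2 * ((hw(v & sbox[x]) ^ hw(a & x)) & 1) for x in range(1 << n))
            peak = max(peak, abs(w))
    return (1 << (n - 1)) - peak // 2


def differential_uniformity(sbox, n):
    """Maximum of delta(a, b) over all a != 0 and all b."""
    best = 0
    for a in range(1, 1 << n):
        counts = {}
        for x in range(1 << n):
            b = sbox[x] ^ sbox[x ^ a]
            counts[b] = counts.get(b, 0) + 1
        best = max(best, max(counts.values()))
    return best


n, t = 4, 1
rng = random.Random(0)  # fixed seed so the hypothetical example is reproducible
low_weight = [e for e in range(1 << n) if hw(e) <= t]
G = [rng.choice(low_weight) for _ in range(1 << n)]  # G(x) = x + F(x)
F = [x ^ G[x] for x in range(1 << n)]                # d_H(x, F(x)) <= t for all x

D = sum(comb(n, i) for i in range(t + 1))
delta_bound = (2 ** n / (2 ** n - 1)) * (2 ** n / D - 1)
nl_bound = 2 ** (n - 1) - (2 ** (2 * n - 1) / D - 2 ** (n - 1)) / (2 ** n - 1)

print("D =", D)
print("delta_F =", differential_uniformity(F, n), ">=", delta_bound)
print("nl      =", nonlinearity(F, n), "<=", nl_bound)
```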

Remark 3

Lemma 5 applies to the case when $d_H(x,F(x))\le t_{d_H}$ for every x where x equals $t\, \oplus \, k_g$. This represents a setting one would encounter when working for instance with software implementations. Now, if we consider a hardware setting (e.g., FPGA), then we are interested in the case $d_H(t,F(t \ \oplus k_g))\le t_{d_H}$ for every key. However, this case leads to the same observation as before but now with up to adding an affine function instead of up to adding a linear function as given in Lemma 5.

5 Side-Channel Evaluation

5.1 Evaluation of S-Boxes with (Almost) $w_H$ Preservation

As cryptographically non-optimal examples of S-boxes (almost) preserving $w_H$ we consider five different functions F: the identity mapping ($S_1$); F not Id but preserving $w_H$ ($S_2$); the identity mapping with an exchange of the images at positions $x=3$ and $x=12$, i.e., $F(3)=12$ and $F(12)=3$, and as $w_H(3) = 2$ and $w_H(12) = 3$ we have $d_{w_H}=2$ (see Lemma 1) ($S_3$); and $F(x) = 2^n-1-x$, which gives the complementary Hamming weight ($S_4$). Finally, we investigate four S-box functions $S_5$ to $S_8$ with the smallest possible distance $d_{w_H}$, which equals 4, and the maximal possible nonlinearity, equal to 4 (see Subsect. 3.2). S-box functions $S_7$ and $S_8$ furthermore have optimal differential uniformity (=4). The mappings are given in Table 1.

Table 1. Specifications of functions F, $(x) w_H\!(x)$

The confusion coefficients are illustrated in Fig. 1. Note that the distribution of $\kappa (k_c,k_g)$ is independent of the particular choice of $k_c$ (provided there are no weak keys), and the values of $\kappa (k_c,k_g)$ are only permuted when choosing a different value $k_c\in \mathbb F_2^n$. For our experiments we choose $k_c=0$ and, for illustrative purposes, we sort $\kappa (k_c,k_g)$ in increasing order. The minimum value of $\kappa (k_c,k_g)$ for $k_g\ne k_c$ is highlighted with a red cross, as it is one indicator of the side-channel resistance. Moreover, we mark $\kappa (k_c,k_g)=0$ or $\kappa (k_c,k_g)=1$ with a red circle, which indicates that CPA is not able to distinguish between $k_c$ and the marked $k_g$.

Figure 1a shows that, indeed, $k_c$ is indistinguishable from one key hypothesis $k_g$ if $w_H$ is preserved. In other words, even when knowing t and observing $w_H(F(t+k_c))+N$ with F equal to $S_1$, the attacker cannot uniquely determine $k_c$, even if the number of measurements $m \rightarrow \infty $. Moreover, it confirms Lemma 1. Note that in our example $a=k_g$, thus $ \kappa (k_c,k_g) = \frac{w_H(k_g)}{4}$. Interestingly, when comparing our results to the study in [5], where the authors investigated (n, 1)-functions, we observe that the confusion coefficient takes different values, which indeed confirms that the Hamming weight model is not a straightforward extension of 1-bit models. More precisely, in the case of a linear (n, 1)-function the authors observed that the confusion coefficient only takes values from {0,1}, whereas our examples (as well as our theoretical findings in Sect. 3) illustrate that the confusion coefficient is not restricted to {0,1} and is equal to 1 for only one particular $k_g$. Interestingly, for $d_{w_H}=2$ (see Fig. 1b) we also have that $k_c$ is indistinguishable from one $k_g$. Moreover, apart from $ \kappa (k_c,k_g)=1$, only two different values are taken, each 7 times. This means that CPA is not able to distinguish among the 7 key guesses sharing a value and in total only produces three different correlation values. When considering a complementary $w_H$ preservation (e.g., $4-w_H$) we achieve the same results as for $w_H$ preservation (see also Fig. 1).

Note that, while illustrative, these first four examples of F are not cryptographically optimal and are thus not suitable in practice. We therefore constructed four S-boxes ($S_5$ to $S_8$) with the smallest $d_{w_H} (=4)$ while having optimal nonlinearity. Note that $S_5,S_6$ have suboptimal differential uniformity, while $S_7,S_8$ are cryptographically optimal (i.e., optimal nonlinearity and differential uniformity). Figures 1c to 1f show the confusion coefficient of $S_5$ to $S_8$. We can observe that all four S-boxes have a very low minimum confusion coefficient, even lower than that of $S_1$ to $S_4$. Moreover, like the previously investigated S-boxes, $S_5$ has $\kappa (k_c,k_g)=1$ for one key guess. Therefore, we find an S-box that almost preserves the Hamming weight and for which the secret key cannot be uniquely determined even with an infinite number of traces. As the minimum value of the confusion coefficient of $S_5$ is low (=0.125), there additionally exist other key hypotheses that are hard to distinguish from the secret key. In conclusion, exact $w_H$ preservation indeed results in good side-channel resistance, since we have $ \kappa (k_c,k_g)=1$. Moreover, for the case where $w_H$ is almost preserved, we present S-boxes that have a very low minimum confusion coefficient.

5.2 A Closer Look at the Confusion Coefficient

To understand the exact reason why some (one or more) key guesses result in a smaller confusion coefficient than others and how this is related to F, we concentrate on the connection between $k_c,k_g$, F, and $\kappa (k_c,k_g)$. Loosely speaking, we are iterating on key guesses influencing the input of F while calculating the confusion coefficient on the measured output of F and being interested in the properties of F. To better address these connections, we split the problem into 2 individual problems.

First, we take a deeper look at the input of F, i.e., $t \oplus k_g$ for all $t,k_g \in \mathbb F_2^n$ (see Eq. (8)). Clearly, due to the $\oplus $ operation, each key guess $k_g$ induces a particular permutation. A 2-D representation of $t \oplus k_g$, with $k_g$ on the horizontal and t on the vertical axis, is given in Fig. 2, where again, for simplicity, $t,k_g \in \mathbb F_2^4$. In this figure we furthermore group $t \oplus k_g$ into 4 boxes ($n\times n$), each containing $4\times 4$ values: blue ($B_0$): $t \oplus k_g \in [0,3]$, yellow ($B_1$): $t \oplus k_g \in [4,7]$, green ($B_2$): $t \oplus k_g \in [8,11]$, and red ($B_3$): $t \oplus k_g \in [12,15]$. Using this color representation we can easily see 4 different permutations $ \pi _0, \pi _1, \pi _2, \pi _3$ applied to ($B_0$ $B_1$ $B_2$ $B_3$). More precisely, when considering a column representation (see Note 1) among the key guesses $k_g$, we have:

for $k_g \in [0,3]$: no permutation ($ \pi _0 = \bigl ({\begin{matrix} 0 &{} 1 &{} 2 &{} 3 \\ 0 &{} 1 &{} 2 &{} 3 \end{matrix}}\bigr )$),
for $k_g \in [4,7]$: pairwise swap of elements in each half of matrix ($ \pi _1 = \bigl ({\begin{matrix} 0 &{} 1 &{} 2 &{} 3 \\ 1 &{} 0 &{} 3 &{} 2 \end{matrix}}\bigr )$),
for $k_g \in [8,11]$: swap of the two halves of the matrix ($ \pi _2 = \bigl ({\begin{matrix} 0 &{} 1 &{} 2 &{} 3 \\ 2 &{} 3 &{} 0 &{} 1 \end{matrix}}\bigr )$),
for $k_g \in [12,15]$: swap of the two halves and additionally a pairwise swap of elements in each half, i.e., a complete reversal of the order ($ \pi _3 = \bigl ({\begin{matrix} 0 &{} 1 &{} 2 &{} 3 \\ 3 &{} 2 &{} 1 &{} 0 \end{matrix}}\bigr )$).

Moreover, as highlighted by the zoom in on each box, within each box (i.e., $B_i$, $0\le i \le 3$) we have the same permutations $ \pi _0,\ldots , \pi _3$ on the 4 column entries. Note that the order of permutations is equivalent for each box, or in other words, regardless of the color and position of the box the same permutation is applied. More formally, let $b_{ij} \in [4i,4i+3]^4$ (for $0 \le i,j \le 3$) denote the columns within $B_i$, then $b_{ij}$ equals $\pi _j$ applied on the column vector $(4i \ 4i+1 \ 4i+2 \ 4i+3)$.
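The block structure described above can be reproduced directly from the XOR table. The sketch below is illustrative only and assumes $n=4$; it writes each $\pi _m$ as the list of its bottom row in two-line notation, which for these four permutations is simply the map $p \mapsto p \oplus m$. The names `PI` and `column_structure` are hypothetical.

```python
def xor_permutation(m):
    """Bottom row of pi_m in two-line notation: position p holds p XOR m."""
    return [p ^ m for p in range(4)]


PI = [xor_permutation(m) for m in range(4)]  # pi_0, pi_1, pi_2, pi_3


def column_structure(k_g):
    """Describe the column t XOR k_g (t = 0..15) of the 2-D representation.

    Returns the order in which the boxes B_0..B_3 appear along the column
    and, for each box, the order of the 4 entries inside it.
    """
    column = [t ^ k_g for t in range(16)]
    boxes = [column[4 * a:4 * a + 4] for a in range(4)]
    box_order = [box[0] >> 2 for box in boxes]              # which B_i
    inner_orders = [[v & 3 for v in box] for box in boxes]  # order inside
    return box_order, inner_orders


for k_g in range(16):
    i, j = divmod(k_g, 4)  # k_g = 4i + j
    box_order, inner_orders = column_structure(k_g)
    assert box_order == PI[i]                      # boxes permuted by pi_i
    assert all(o == PI[j] for o in inner_orders)   # entries permuted by pi_j
```

The two upper bits of $k_g$ select the permutation of the boxes and the two lower bits select the permutation inside every box, which is why the same inner permutation appears regardless of the color and position of the box.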

Second, we examine the expression of the confusion coefficient in Eq. (11) itself. Recall from Eq. (8), $y_{k_g,t} = y(k_g,t) = w_H(F(k_g + t))$. Let

$$ y_{k_g} = (y(k_g,0),y(k_g,1), \ldots , y(k_g,2^n-1))$$

denote the vector of hypotheses for one key guess $k_g$ over all texts t. Referring to Fig. 2, $ y_{k_g}$ relates to one column before its application to F and $w_H$. The confusion coefficient can be rewritten as

$$\begin{aligned} \kappa (k_c,k_g) =\frac{1}{4} \Big \Vert y_{k_c} - y_{k_g} \Big \Vert _2^2 \end{aligned}$$

(19)

with $\Vert \cdot \Vert _2$ being the Euclidean norm. Let us recall that we are especially interested in $\min _{k_g\ne k_c} \kappa (k_c,k_g)$. Moreover, the elements of $ y_{k_c} - y_{k_g}$ are in $[-4,4]$. Now, as Eq. (19) considers not the differences themselves but their squared values, we may conjecture that the minimum value is most likely reached when the elements of $ y_{k_c} - y_{k_g}$ are in $[-1,1]$, which is discussed in more detail and confirmed using several lightweight S-boxes in Appendix A. Roughly speaking, since $(\pm 2)^2 = 4\cdot (\pm 1)^2$, one difference of $\pm 2$ is equivalent to 4 changes with $\pm 1$, and so on.

Now let us put the observations of both parts together. Our previous findings about the permutations can be straightforwardly applied to the Hamming weight of the output of F. Let us assume w.l.o.g. $k_c = 0$, then for $k_g = 4i+j$ (with $0\le i,j \le 3$) we have

$$\begin{aligned} y_{k_g}&= \pi _i \begin{pmatrix} \pi _j\begin{bmatrix} y_{0,0} \\ y_{0,1} \\ \vdots \\ y_{0,3} \end{bmatrix}^T &{} \pi _j \begin{bmatrix}y_{0,4} \\ y_{0,5} \\ \vdots \\ y_{0,7} \end{bmatrix}^T &{} \pi _j \begin{bmatrix}y_{0,8}\\ y_{0,9}\\ \vdots \\ y_{0,11} \end{bmatrix}^T &{} \pi _j \begin{bmatrix}y_{0,12} \\ y_{0,13} \\ \vdots \\ y_{0,15} \end{bmatrix}^T \end{pmatrix}^T, \end{aligned}$$

(20)

with $ y_{0} = (y_{0,0}, y_{0,1}, \ldots , y_{0,15})$ and $(\cdot )^T$ denoting the transpose. Thus, we are looking for a function F such that the distance

$$\begin{aligned} \left\| y_{0} - \pi _i \begin{pmatrix} \pi _j\begin{bmatrix} y_{0,0} \\ y_{0,1} \\ \vdots \\ y_{0,3} \end{bmatrix}^T &{} \pi _j \begin{bmatrix}y_{0,4} \\ y_{0,5} \\ \vdots \\ y_{0,7} \end{bmatrix}^T &{} \pi _j \begin{bmatrix}y_{0,8}\\ y_{0,9}\\ \vdots \\ y_{0,11} \end{bmatrix}^T &{} \pi _j \begin{bmatrix}y_{0,12} \\ y_{0,13} \\ \vdots \\ y_{0,15} \end{bmatrix}^T \end{pmatrix}^T \right\| _2^2 \end{aligned}$$

(21)

is as small as possible for any $\pi _i,\pi _j \in \{\pi _0,\pi _1,\pi _2,\pi _3\}$.
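As a sanity check of Eq. (20), one can verify that applying $\pi _j$ inside each block and $\pi _i$ on the blocks gives the same vector as evaluating $y_0$ at $t \oplus k_g$ directly. This is a minimal sketch assuming $n=4$ and $k_c=0$; the Hamming weights are those of KLEIN as listed in Example 5, and the function names are hypothetical.

```python
# pi_0 .. pi_3 as index lists (bottom rows of the two-line notation).
PI = [[p ^ m for p in range(4)] for m in range(4)]

# Hamming weights of the KLEIN S-box output, as listed in Example 5.
Y_KLEIN = [3, 1, 2, 2, 1, 4, 3, 0, 2, 2, 1, 2, 1, 3, 3, 2]


def apply_permutation(perm, items):
    """Return the sequence whose position p holds items[perm[p]]."""
    return [items[p] for p in perm]


def y_via_eq20(y0, k_g):
    """Build y_{k_g} from y_0 using the two-level permutation of Eq. (20)."""
    i, j = divmod(k_g, 4)  # k_g = 4i + j
    blocks = [y0[4 * s:4 * s + 4] for s in range(4)]
    inner = [apply_permutation(PI[j], block) for block in blocks]
    outer = apply_permutation(PI[i], inner)
    return [value for block in outer for value in block]


def y_direct(y0, k_g):
    """Build y_{k_g} by evaluating y_0 at t XOR k_g for every text t."""
    return [y0[t ^ k_g] for t in range(16)]


assert all(y_via_eq20(Y_KLEIN, k) == y_direct(Y_KLEIN, k) for k in range(16))
```

Because both levels are XOR maps, minimizing the distance in Eq. (21) over $\pi _i,\pi _j$ amounts to a search over the 15 nonzero key differences.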

This finding indicates that the order of the Hamming weight of the output of F plays a significant role. To be more precise, the minimum confusion coefficient may depend not only on the distribution of values along the 4 boxes (Example 4), but also on the order within each box (Example 5).

Example 4

Note that the elements of $y_{k_g}$ follow a binomial distribution due to the application of $w_H$. Therefore, 0 and 4 occur once, 1 and 3 occur four times, and 2 occurs six times. In order to reach a minimum squared Euclidean distance in Eq. (21), a natural strategy seems to be to distribute the values broadly among the 4 sets $[4i,4i+3]$ and to have a small difference between the values in one set. Let us consider the S-boxes of Midori [18] and Mysterion [19]. From Table 2 one can observe that for Midori we have the following sets: 2,2,3,2 – 3,3,4,3 – 1,2,1,2 – 0,1,1,2. So, the maximal distance between values within a set is 2. Moreover, the first three sets only contain 2 different values and the last has 3. On the contrary, when looking at Mysterion (0,1,2,3 – 2,4,3,2 – 1,3,1,2 – 2,1,3,2), the structure looks less balanced. In particular, the maximal distance is 3 and every set contains at least 3 different values. When comparing the confusion coefficients in Fig. 3, we can observe that Midori has a much smaller minimum confusion coefficient and is thus more SCA resilient.
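The comparison in Example 4 can be recomputed from the quoted sets alone. The sketch below is illustrative, not the authors' evaluation code. It assumes $n=4$, $k_c=0$, and the normalization $\kappa = \Vert y_{k_c} - y_{k_g}\Vert _2^2 / 64$ (an assumption that matches the values $0.125$ and $0.3125$ stated for these two S-boxes in Appendix A); all names are hypothetical.

```python
# Hamming weights per input x = 0..15, copied from the sets in Example 4.
Y_MIDORI = [2, 2, 3, 2, 3, 3, 4, 3, 1, 2, 1, 2, 0, 1, 1, 2]
Y_MYSTERION = [0, 1, 2, 3, 2, 4, 3, 2, 1, 3, 1, 2, 2, 1, 3, 2]


def squared_distance(y, k_g):
    """Squared Euclidean distance between y_0 and y_{k_g} (Eq. (21))."""
    return sum((y[t] - y[t ^ k_g]) ** 2 for t in range(16))


def min_kappa(y):
    """Return (argmin k_g, minimum kappa) over all k_g != 0."""
    best = min(range(1, 16), key=lambda k: squared_distance(y, k))
    return best, squared_distance(y, best) / 64.0  # assumed normalization


def set_profile(y):
    """Per set [4s, 4s+3]: (max - min, number of distinct values)."""
    sets = [y[4 * s:4 * s + 4] for s in range(4)]
    return [(max(s) - min(s), len(set(s))) for s in sets]


print("Midori   ", min_kappa(Y_MIDORI), set_profile(Y_MIDORI))
print("Mysterion", min_kappa(Y_MYSTERION), set_profile(Y_MYSTERION))
```

Under these assumptions the minimum is $8/64 = 0.125$ for Midori and $20/64 = 0.3125$ for Mysterion, in line with the more balanced set structure of Midori.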

Table 2. Known S-boxes and one modification of KLEIN, $ (x) w_H\!(x)$


Example 5

Let us consider the S-box of KLEIN [20] and a small modification ($S_9$) in which we swap F(1) with F(3) (see Table 2). Note that both functions consist of the same values among the sets: 3,1,2,2 – 1,4,3,0 – 2,2,1,2 – 1,3,3,2. For both, $\min _{k_g\ne k_c} \kappa (k_c,k_g)$ is reached for $k_g=11$, thus $i=2$ and $j=3$ in Eq. (20). However, as Fig. 3d shows, for KLEIN we have $\min _{k_g\ne k_c} \kappa (k_c,k_g) = 0.125$, whereas $\min _{k_g\ne k_c} \kappa (k_c,k_g) = 0.1875$ for $S_9$, which relates to a squared Euclidean distance (see Eq. (21)) of 8 and 12, respectively.
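The effect of the order within a set can be reproduced from the Hamming weights quoted in Example 5. This is a sketch under assumptions: $n=4$, $k_c=0$, the normalization $\kappa = \Vert \cdot \Vert _2^2/64$, and the observation that swapping F(1) with F(3) swaps the Hamming weights at positions 1 and 3. The names are hypothetical.

```python
# Hamming weights of the KLEIN S-box output, as listed in Example 5.
Y_KLEIN = [3, 1, 2, 2, 1, 4, 3, 0, 2, 2, 1, 2, 1, 3, 3, 2]

# S_9: same S-box with F(1) and F(3) swapped, so the weights swap as well.
Y_S9 = list(Y_KLEIN)
Y_S9[1], Y_S9[3] = Y_S9[3], Y_S9[1]


def squared_distance(y, k_g):
    """Squared Euclidean distance between y_0 and y_{k_g} (Eq. (21))."""
    return sum((y[t] - y[t ^ k_g]) ** 2 for t in range(16))


def argmin_key(y):
    """Key guess k_g != 0 with the smallest squared distance."""
    return min(range(1, 16), key=lambda k: squared_distance(y, k))


def sets_as_multisets(y):
    """Sorted content of each set [4s, 4s+3], ignoring the order."""
    return [sorted(y[4 * s:4 * s + 4]) for s in range(4)]


# Both functions contain the same values in every set ...
assert sets_as_multisets(Y_KLEIN) == sets_as_multisets(Y_S9)

# ... but the distance at the minimizing key guess differs.
for name, y in (("KLEIN", Y_KLEIN), ("S_9", Y_S9)):
    k = argmin_key(y)
    d = squared_distance(y, k)
    print(name, "k_g =", k, "distance =", d, "kappa =", d / 64.0)
```

With these inputs the minimum is found at $k_g = 11$ in both cases, with squared distances 8 and 12, i.e. $8/64 = 0.125$ and $12/64 = 0.1875$.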

Furthermore, in Appendix A we investigate several lightweight S-boxes in terms of their minimum confusion coefficient and provide empirical evaluations. Note that a preliminary study showing the difference between some lightweight S-boxes has been conducted in [21] (see Note 2). Our extended results in Appendix A theoretically and empirically confirm [21]. Moreover, the appendix provides details about the minimum Euclidean distance and the permutations $\pi _i,\pi _j$. Additionally, we take a deeper look at the expression of $ y_{k_c} - y_{k_g}$ for the key hypothesis $k_g$ that results in the smallest confusion coefficient (i.e., $\arg \min _{k_g\ne k_c} \kappa (k_c,k_g)$). We discover that for $S_5$ and the S-box proposed in [17], which has optimal properties of the confusion coefficient while holding optimal differential properties, the difference $ \Vert y_{k_c} - y_{k_g}\Vert ^2_2$ has a particular structure, which is not observed for any other investigated 4-bit S-box.

In conclusion, we derived specific criteria influencing the side-channel resistance (in particular Eq. (21) and our findings in Appendix A) that could be exploited in future work to find and optimize S-boxes in terms of side-channel resistance, especially when adapted for $n>4$.

6 Conclusions

In this paper, we prove a number of bounds between various cryptographic properties that can be related also with the side-channel resilience of a cipher. Our results confirm some well known intuitions that having an S-box more resilient against SCA will make it potentially more vulnerable against classical cryptanalyses. However, they also show that for the usual sizes of S-boxes, this weakening is moderate and trade-offs are then possible.

Since our practical investigations in this work concentrated on the Hamming weight model, in the future we plan to explore possible trade-offs for the Hamming distance model and to extend our (empirical) analysis to larger S-boxes using the theoretical findings in this paper.

Notes

1. Note that we also have the same permutations on the row entries; however, we are particularly interested in the column representation, as the columns reflect the key hypotheses.
2. Note that in [22] the authors compared S-boxes using another (not normalized) version of the confusion coefficient and found that their version is not aligned with their empirical results.

References

1. Matsui, M., Yamagishi, A.: A new method for known plaintext attack of FEAL cipher. In: Rueppel, R.A. (ed.) EUROCRYPT 1992. LNCS, vol. 658, pp. 81–91. Springer, Heidelberg (1993). doi:10.1007/3-540-47555-9_7
2. Biham, E., Shamir, A.: Differential cryptanalysis of DES-like cryptosystems. In: Menezes, A.J., Vanstone, S.A. (eds.) CRYPTO 1990. LNCS, vol. 537, pp. 2–21. Springer, Heidelberg (1991). doi:10.1007/3-540-38424-3_1
3. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets of Smart Cards (Advances in Information Security). Springer-Verlag New York Inc., Secaucus (2007)
4. Nikova, S., Rechberger, C., Rijmen, V.: Threshold implementations against side-channel attacks and glitches. In: Ning, P., Qing, S., Li, N. (eds.) ICICS 2006. LNCS, vol. 4307, pp. 529–545. Springer, Heidelberg (2006). doi:10.1007/11935308_38
5. Heuser, A., Rioul, O., Guilley, S.: A theoretical study of Kolmogorov-Smirnov distinguishers. In: Prouff, E. (ed.) COSADE 2014. LNCS, vol. 8622, pp. 9–28. Springer, Cham (2014). doi:10.1007/978-3-319-10175-0_2
6. Brier, E., Clavier, C., Olivier, F.: Correlation power analysis with a leakage model. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 16–29. Springer, Heidelberg (2004). doi:10.1007/978-3-540-28632-5_2
7. Fei, Y., Luo, Q., Ding, A.A.: A statistical model for DPA with novel algorithmic confusion analysis. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 233–250. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33027-8_14
8. Carlet, C.: Vectorial Boolean functions for cryptography. In: Crama, Y., Hammer, P.L. (eds.) Boolean Models and Methods in Mathematics, Computer Science, and Engineering, 1st edn, pp. 398–469. Cambridge University Press, New York (2010)
9. Nyberg, K.: On the construction of highly nonlinear permutations. In: Rueppel, R.A. (ed.) EUROCRYPT 1992. LNCS, vol. 658, pp. 92–98. Springer, Heidelberg (1993). doi:10.1007/3-540-47555-9_8
10. Chabaud, F., Vaudenay, S.: Links between differential and linear cryptanalysis. In: Santis, A. (ed.) EUROCRYPT 1994. LNCS, vol. 950, pp. 356–365. Springer, Heidelberg (1995). doi:10.1007/BFb0053450
11. Nyberg, K.: Perfect nonlinear S-boxes. In: Davies, D.W. (ed.) EUROCRYPT 1991. LNCS, vol. 547, pp. 378–386. Springer, Heidelberg (1991). doi:10.1007/3-540-46416-6_32
12. Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999). doi:10.1007/3-540-48405-1_25
13. Gandolfi, K., Mourtel, C., Olivier, F.: Electromagnetic analysis: concrete results. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 251–261. Springer, Heidelberg (2001). doi:10.1007/3-540-44709-1_21
14. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets of Smart Cards. Springer, Heidelberg (2006). ISBN 0-387-30857-1. http://www.dpabook.org/
15. Guilley, S., Heuser, A., Rioul, O.: A key to success. In: Biryukov, A., Goyal, V. (eds.) INDOCRYPT 2015. LNCS, vol. 9462, pp. 270–290. Springer, Cham (2015). doi:10.1007/978-3-319-26617-6_15
16. Thillard, A., Prouff, E., Roche, T.: Success through confidence: evaluating the effectiveness of a side-channel attack. In: Bertoni, G., Coron, J.-S. (eds.) CHES 2013. LNCS, vol. 8086, pp. 21–36. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40349-1_2
17. Picek, S., Papagiannopoulos, K., Ege, B., Batina, L., Jakobovic, D.: Confused by confusion: systematic evaluation of DPA resistance of various S-boxes. In: Meier, W., Mukhopadhyay, D. (eds.) INDOCRYPT 2014. LNCS, vol. 8885, pp. 374–390. Springer, Cham (2014). doi:10.1007/978-3-319-13039-2_22
18. Banik, S., Bogdanov, A., Isobe, T., Shibutani, K., Hiwatari, H., Akishita, T., Regazzoni, F.: Midori: a block cipher for low energy (extended version). Cryptology ePrint Archive, Report 2015/1142 (2015). http://eprint.iacr.org/
19. Journault, A., Standaert, F.X., Varici, K.: Improving the security and efficiency of block ciphers based on LS-designs. Des. Codes Crypt. 82(1–2), 495–509 (2016)
20. Gong, Z., Nikova, S., Law, Y.W.: KLEIN: a new family of lightweight block ciphers. In: Juels, A., Paar, C. (eds.) RFIDSec 2011. LNCS, vol. 7055, pp. 1–18. Springer, Heidelberg (2012). doi:10.1007/978-3-642-25286-0_1
21. Heuser, A., Picek, S., Guilley, S., Mentens, N.: Side-channel analysis of lightweight ciphers: does lightweight equal easy? Cryptology ePrint Archive, Report 2017/261 (2017). http://eprint.iacr.org/2017/261
22. Lerman, L., Markowitch, O., Veshchikov, N.: Comparing Sboxes of ciphers from the perspective of side-channel attacks. IACR Cryptology ePrint Archive 2016/993 (2016)
23. Daemen, J., Peeters, M., Assche, G.V., Rijmen, V.: Nessie proposal: the block cipher Noekeon. Nessie submission (2000). http://gro.noekeon.org/
24. Shibutani, K., Isobe, T., Hiwatari, H., Mitsuda, A., Akishita, T., Shirai, T.: Piccolo: an ultra-lightweight blockcipher. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 342–357. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23951-9_23
25. Bogdanov, A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw, M.J.B., Seurin, Y., Vikkelsoe, C.: PRESENT: an ultra-lightweight block cipher. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 450–466. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74735-2_31
26. Borghoff, J., et al.: PRINCE – a low-latency block cipher for pervasive computing applications. In: Wang, X., Sako, K. (eds.) ASIACRYPT 2012. LNCS, vol. 7658, pp. 208–225. Springer, Heidelberg (2012). doi:10.1007/978-3-642-34961-4_14
27. Zhang, W., Bao, Z., Lin, D., Rijmen, V., Yang, B., Verbauwhede, I.: RECTANGLE: a bit-slice lightweight block cipher suitable for multiple platforms. Sci. China Inf. Sci. 58(12), 1–15 (2015)
28. Beierle, C., Jean, J., Kölbl, S., Leander, G., Moradi, A., Peyrin, T., Sasaki, Y., Sasdrich, P., Sim, S.M.: The SKINNY family of block ciphers and its low-latency variant MANTIS. Cryptology ePrint Archive, Report 2016/660 (2016). http://eprint.iacr.org/2016/660
29. Standaert, F., Malkin, T., Yung, M.: A unified framework for the analysis of side-channel key recovery attacks (extended version). IACR Cryptology ePrint Archive 2006/139 (2006)


Acknowledgments

This work has been supported in part by the Croatian Science Foundation under the project IP-2014-09-4882. Parts of this work were done while the third author was affiliated with KU Leuven, Belgium.

Author information

Authors and Affiliations

Universities of Paris VIII and Paris XIII, LAGA, UMR 7539, CNRS, Saint-Denis, France
Claude Carlet & Stjepan Picek
CNRS/IRISA, Rennes, France
Annelie Heuser
Massachusetts Institute of Technology, CSAIL, Cambridge, USA
Stjepan Picek
Cyber Security Research Group, Delft University of Technology, Mekelweg 2, Delft, The Netherlands
Stjepan Picek


Corresponding author

Correspondence to Stjepan Picek.

Editor information

Editors and Affiliations

Hamburg University of Technology, Hamburg, Germany
Dieter Gollmann
Graduate School of Engineering, Osaka University, Suita, Osaka, Japan
Atsuko Miyaji
Department of Frontier Media Science, Meiji University, Tokyo, Japan
Hiroaki Kikuchi

A Investigation of Known S-Boxes

We already described properties of the S-boxes of KLEIN, Midori, and Mysterion showing that Midori and KLEIN have both $\min _{k_g \ne k_c} \kappa (k_c,k_g) = 0.125$, whereas for Mysterion it equals 0.3125. Thus, the side-channel resistance of Mysterion is much smaller than that of KLEIN and Midori. Table 3 shows properties of several well-known S-boxes, where $\pi _i$ and $\pi _j$ indicate the permutations (see Eq. (21)) for the smallest squared Euclidean distance ($\min \Vert \cdot \Vert ^2_2$) and thus smallest confusion coefficient ($\min \kappa (k_c,k_g)$).

Table 3. S-box properties of known ciphers


Note that the squared Euclidean distance should not serve as a new metric, as it is in direct relation with the confusion coefficient; rather, its stated values indicate how far $y_{k_g}$ is from $y_{k_c}$ in terms of squared differences of Hamming weight values. Further, as used in [17], we give $var(\kappa (k_c,k_g))$, where the higher the variance, the higher the side-channel resistance. Finally, we specify the number of Hamming weights preserved (P$w_H$). One can observe that Piccolo has the highest minimum value of the confusion coefficient (and the highest minimum squared Euclidean norm) and thus its side-channel resistance is the lowest among the evaluated ones. Next, there is Mysterion, followed by SKINNY, RECTANGLE, PRESENT, and Midori 2, which all have the same minimum value of the confusion coefficient but different variances thereof. Then, we have NOEKEON and PRINCE. The lowest minimum confusion coefficient is reached by KLEIN, Midori, and the S-box proposed in [17], which has been found by genetic algorithms under the constraints of optimal differential properties and the lowest confusion coefficient. Interestingly, for the latter one, Fig. 4 illustrates that $\kappa (k_c,k_g)=1$ for one key guess, which we do not observe for any other known S-box with optimal differential properties. Moreover, it corresponds to the confusion coefficient of $S_5$.

Additionally, we take a deeper look at the expression of $ y_{k_c} - y_{k_g}$ for the key hypothesis $k_g$ that results in the smallest confusion coefficient, and we are interested in whether all elements of $ y_{k_c} - y_{k_g}$ lie in $[-1,1]$ (see the remark in Subsect. 5.2). Our investigations show that this does not hold for S-boxes with $\min \kappa (k_c,k_g)\ge 0.25$, but it does hold for the most side-channel resistant ones. In particular, Midori 2, Mysterion, PRESENT, RECTANGLE, and SKINNY contain two absolute differences of 2 (each contributing 4 to the squared Euclidean distance), whereas Piccolo even has 4 absolute differences of 2. However, we could not observe any absolute difference greater than 2. On the contrary, KLEIN, Midori, NOEKEON, PRINCE, and the S-box in [17] only contain absolute differences of one, so that the number of nonzero differences equals the squared Euclidean distance.
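The difference profile at the minimizing key guess can be recomputed for the S-boxes whose Hamming weights are listed in Sect. 5.2. The following is a sketch, assuming $n=4$ and $k_c=0$, that uses only the weight vectors quoted in Examples 4 and 5; the names are hypothetical.

```python
from collections import Counter

# Hamming weights per input x = 0..15, copied from Examples 4 and 5.
Y_KLEIN = [3, 1, 2, 2, 1, 4, 3, 0, 2, 2, 1, 2, 1, 3, 3, 2]
Y_MYSTERION = [0, 1, 2, 3, 2, 4, 3, 2, 1, 3, 1, 2, 2, 1, 3, 2]


def squared_distance(y, k_g):
    """Squared Euclidean distance between y_0 and y_{k_g}."""
    return sum((y[t] - y[t ^ k_g]) ** 2 for t in range(16))


def difference_profile(y):
    """Count the absolute differences |y_0[t] - y_{k_g}[t]| at the
    key guess k_g != 0 that minimizes the squared distance."""
    k_min = min(range(1, 16), key=lambda k: squared_distance(y, k))
    return dict(Counter(abs(y[t] - y[t ^ k_min]) for t in range(16)))


print("KLEIN    ", difference_profile(Y_KLEIN))
print("Mysterion", difference_profile(Y_MYSTERION))
```

For KLEIN all nonzero differences have absolute value 1, so their count (8) equals the squared distance. For Mysterion two entries have an absolute difference of 2; together with 12 entries of absolute difference 1 this gives $12 + 2\cdot 4 = 20$.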

When considering the sum of differences among the 4 sets $[4s,4s+3]$ for $0\le s \le 3$, we observed interesting distinctions. In particular, let us denote

$$\begin{aligned} \varDelta _s = \left\| \begin{bmatrix} y_{0,4s} \\ y_{0,4s+1} \\ \vdots \\ y_{0,4s+3} \end{bmatrix} - \pi _i \left( \pi _j\begin{bmatrix} y_{0,4s} \\ y_{0,4s+1} \\ \vdots \\ y_{0,4s+3} \end{bmatrix} \right) \right\| ^2_2, \end{aligned}$$

(22)

with $\pi _i$ and $\pi _j$ being the permutations resulting in the minimum confusion coefficient.

Table 4 highlights that only for the S-box in [17] we have the same difference among all four sets. Note that, in future work this property may additionally help to detect and find S-boxes with better side-channel resistance for $n>4$.
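The per-set quantity $\varDelta _s$ can be illustrated as follows. This sketch rests on an interpretation that should be read as an assumption: $\varDelta _s$ is taken to be the part of the squared distance $\Vert y_0 - y_{k_g}\Vert _2^2$ contributed by the texts $t \in [4s,4s+3]$, with $k_g$ the minimizing key guess. It uses the Hamming weights of KLEIN and Midori quoted in Sect. 5.2 and does not reproduce Table 4; the names are hypothetical.

```python
# Hamming weights per input x = 0..15, copied from Examples 4 and 5.
Y_KLEIN = [3, 1, 2, 2, 1, 4, 3, 0, 2, 2, 1, 2, 1, 3, 3, 2]
Y_MIDORI = [2, 2, 3, 2, 3, 3, 4, 3, 1, 2, 1, 2, 0, 1, 1, 2]


def squared_distance(y, k_g):
    """Squared Euclidean distance between y_0 and y_{k_g}."""
    return sum((y[t] - y[t ^ k_g]) ** 2 for t in range(16))


def delta_per_set(y):
    """Assumed reading of Eq. (22): contribution of each set [4s, 4s+3]
    to the squared distance at the minimizing key guess k_g != 0."""
    k_min = min(range(1, 16), key=lambda k: squared_distance(y, k))
    return [
        sum((y[t] - y[t ^ k_min]) ** 2 for t in range(4 * s, 4 * s + 4))
        for s in range(4)
    ]


for name, y in (("KLEIN", Y_KLEIN), ("Midori", Y_MIDORI)):
    deltas = delta_per_set(y)
    print(name, deltas, "sum =", sum(deltas))
```

Under this reading neither KLEIN nor Midori has the same difference in all four sets, which is consistent with the statement that only the S-box in [17] shows this property.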

Table 4. $\varDelta $-property (Eq. (22)) of the most resilient known S-boxes


Finally, an empirical evaluation of the studied S-boxes is given in Fig. 5. To obtain reliable results, we conducted 5 000 independent simulation experiments (SNR $=2$) with random secret keys $k_c$ and texts t. Figure 5a shows the first-order success rate (SR), i.e., the empirical probability that the correct secret key is exclusively found. As predicted by the properties of the confusion coefficient, the S-box of Piccolo is the weakest: the correct key is found with a SR of 0.9 using 20 measurement traces, whereas KLEIN and Midori require around 35 and 40 traces to reach SR = 0.9. Since for the S-box in [17] the correct key cannot be exclusively found and the SR is thus 0, we additionally plot the guessing entropy [29] in Fig. 5b, which confirms our finding that at least 2 key guesses have to be made.
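A simulation of this kind can be sketched with the standard library only. The code below is not the authors' simulation setup and makes several assumptions, each marked in the comments: the leakage is $w_H(F(t \oplus k_c)) + N$ with Gaussian noise, the SNR is defined as signal variance over noise variance (the Hamming weight of a uniform 4-bit value has variance 1), CPA ranks key guesses by the absolute Pearson correlation, and the S-box is represented by the KLEIN Hamming weights quoted in Example 5. All names are hypothetical.

```python
import math
import random

# Hamming weights of the KLEIN S-box output, as listed in Example 5.
Y_KLEIN = [3, 1, 2, 2, 1, 4, 3, 0, 2, 2, 1, 2, 1, 3, 3, 2]


def pearson(a, b):
    """Sample Pearson correlation; returns 0.0 if one input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cpa_attack(y, texts, traces):
    """Return the key guess with the highest absolute correlation
    (assumption: absolute value is used as the CPA distinguisher)."""
    scores = [
        abs(pearson([y[t ^ k] for t in texts], traces)) for k in range(16)
    ]
    return max(range(16), key=lambda k: scores[k])


def success_rate(y, num_traces, snr, experiments, rng):
    """Empirical first-order success rate over independent experiments."""
    sigma = math.sqrt(1.0 / snr)  # assumption: Var(signal) = 1 for n = 4
    hits = 0
    for _ in range(experiments):
        k_c = rng.randrange(16)
        texts = [rng.randrange(16) for _ in range(num_traces)]
        traces = [y[t ^ k_c] + rng.gauss(0.0, sigma) for t in texts]
        if cpa_attack(y, texts, traces) == k_c:
            hits += 1
    return hits / experiments


rng = random.Random(1)  # fixed seed only to make the sketch repeatable
for m in (10, 20, 40):
    print(m, "traces:", success_rate(Y_KLEIN, m, 2.0, 200, rng))
```

The number of experiments is kept small here to keep the run short; the 5 000 experiments used for Fig. 5 would be set through the `experiments` argument.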


Carlet, C., Heuser, A., Picek, S. (2017). Trade-Offs for S-Boxes: Cryptographic Properties and Side-Channel Resilience. In: Gollmann, D., Miyaji, A., Kikuchi, H. (eds) Applied Cryptography and Network Security. ACNS 2017. Lecture Notes in Computer Science(), vol 10355. Springer, Cham. https://doi.org/10.1007/978-3-319-61204-1_20


DOI: https://doi.org/10.1007/978-3-319-61204-1_20
Published: 26 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61203-4
Online ISBN: 978-3-319-61204-1
eBook Packages: Computer Science, Computer Science (R0)
