
1 Introduction

The Learning Parity with Noise (\(\mathsf {LPN}\)) problem can be seen as a noisy system of linear equations in the binary domain. More specifically, we have a secret s and an adversary that has access to an \(\mathsf {LPN}\) oracle which provides tuples consisting of a uniformly distributed binary vector \(v_i\) and the inner product between s and \(v_i\) to which some noise is added. The noise is represented by a Bernoulli variable with probability \(\tau \) of being 1. The goal of the adversary is to recover the secret s. The \(\mathsf {LPN}\) problem is a particular case of the well-known Learning with Errors (\(\mathsf {LWE}\)) [33] problem, where instead of working in \(\mathbb {Z}_2\) one works in a ring \(\mathbb {Z}_q\).

The \(\mathsf {LPN}\) problem is attractive as it is believed to be resistant to quantum computers. Thus, it can be a good candidate for replacing the number-theoretic problems such as factorization and discrete logarithm (which can be easily broken by a quantum algorithm). Also, given its structure, it can be implemented in lightweight devices. The \(\mathsf {LPN}\) problem is used in the design of the HB-family of authentication protocols [10, 19, 23, 24, 26, 30] and several cryptosystems base their security on its hardness [1, 14–16, 20, 25].

Previous Work. \(\mathsf {LPN}\) is believed to be hard. So far, there is no reduction from hard lattice problems to certify its hardness (as in the case of \(\mathsf {LWE}\)). Thus, the best way to assess its hardness is to design and improve algorithms that solve it. Over the years, the \(\mathsf {LPN}\) problem was analyzed and several solving algorithms exist. The first algorithm to target \(\mathsf {LPN}\) is the \(\mathsf {BKW}\) algorithm [6]. This algorithm can be described as a Gaussian elimination on blocks of bits (instead of single bits) where the secret is recovered bit by bit. Several improvements appeared afterwards [18, 28]. One idea that improves the algorithm is the use of the fast Walsh-Hadamard transform, as it recovers several bits of the secret at once. In their work, Levieil and Fouque [28] provide an analysis of the level of security achieved by different \(\mathsf {LPN}\) instances and propose secure parameters. Using \(\mathsf {BKW}\) as a black box, Lyubashevsky [29] presents an \(\mathsf {LPN}\) solving algorithm useful for the case when the number of queries given to an adversary is restricted. The best algorithm to solve \(\mathsf {LPN}\) was presented at ASIACRYPT’14 [22] and it introduces the use of covering codes to improve the performance. Some problems in the computation of its complexities were reported [7, 36]. As discussed by Bogos et al. [7] and in the ASIACRYPT presentation [22], the authors used a too optimistic approximation for the bias introduced by their new reduction method, the covering codes. Some complexity terms are further missing (as discussed in Sect. 2.2) or are not in bit operations. Also, no method to construct covering codes was suggested. At EUROCRYPT’16, Zhang et al. [36] proposed a way to construct good codes by concatenating perfect codes and improved the algorithms. However, some other problems in the complexities were reported [9]. The new \(\mathsf {LF(4)}\) reduction technique introduced by Zhang et al. [36] was also shown to be incorrect [9].

For the case when the secret is sparse, i.e. its Hamming weight is small, the classical Gaussian elimination proves to give better results [7, 8, 11].

The \(\mathsf {LPN}\) algorithms consist of two parts: one in which the size of the secret is reduced and one in which part of the secret is recovered. Once a part of the secret is recovered, the queries are updated and the algorithm restarts to recover the rest of the secret. When trying to recover a secret s of k bits, it is assumed that k can be written as \(a \cdot b\), for \(a,b \in \mathbb {N}\) (i.e. the secret s can be seen as \(a\) blocks of \(b\) bits). Usually all the reduction steps reduce the size by b bits and the solving algorithm recovers b bits. While the use of the same parameter b for all the operations may be convenient for the implementation, we search for an algorithm that may use different values for each reduction step. We discover that small variations from the fixed b can bring important improvements in the time complexity of the whole algorithm.

Our Contribution. In this work we first analyze the existing \(\mathsf {LPN}\) algorithms and study the operations that are used in order to reduce the size of the secret. We adjust the expressions of the complexities of each step (as some of them were underestimated in the literature). For instance, the results from Guo et al. [22] and Zhang et al. [36] are displayed with corrections in Table 1.Footnote 1 (Details for this computation are provided as additional material for this paper.)

Table 1. Time complexity to solve \(\mathsf {LPN}\) (in bit operations). These complexities are based on the formulas from our paper with the most favorable covering codes we constructed from our pool, with adjusted data complexity to reach a failure probability bounded by \(33\,\% \). The complexities originally claimed by [22, 36] are given in parentheses.

Second, we improve the theory behind the covering code reduction and show the link with perfect and quasi-perfect codes. Using the average bias of covering codes allows us to use arbitrary codes and even random ones. Using the algorithm to construct optimal concatenated codes based on a pool of elementary ones allows us to improve complexities. (In Guo et al. [22], only a hypothetical code was assumed to be close to a perfect code; in Zhang et al. [36], only concatenations of perfect codes are used; in Table 1, our computed complexities are based on the real codes that we built with our bigger pool, to have a fair comparison.)

Third, we optimize the order and the parameters used by the operations that reduce the size of the secret such that we minimize the time complexity required. We design a “meta-algorithm” that combines the reduction steps and finds the optimal strategy to solve \(\mathsf {LPN}\). We automate the process of finding \(\mathsf {LPN}\) solving algorithms, i.e. given a random \(\mathsf {LPN}\) instance, our algorithm provides the description of the steps that optimize the time complexity. In our formalization we call such algorithms “optimal chains”. We perform a security analysis of \(\mathsf {LPN}\) based on the results obtained by our algorithm and compare our results with the existing ones. We discover that we improve the complexity compared with the existing results [7, 22, 28, 36], as shown in Table 1.

Preliminaries and Notations. Given a domain \(\mathcal {D}\), we denote by \(x \xleftarrow {U} \mathcal {D}\) the fact that x is drawn uniformly at random from \(\mathcal {D}\). By \(Ber_{\tau }\) we denote the Bernoulli distribution with parameter \(\tau \). By \(Ber_{\tau }^k\) we denote the binomial distribution with parameters k and \(\tau \). Let \(\langle \cdot , \cdot \rangle \) denote the inner product, \(\mathbb {Z}_2 = \{0,1\}\) and \(\oplus \) denote the bitwise XOR. The Hamming weight of a vector v is denoted by \(\mathsf {HW}(v)\).

Organization. In Sect. 2 we formally define the \(\mathsf {LPN}\) problem and describe the main tools used to solve it. We carefully analyze the complexity of each step and show in footnotes where it differs from the existing literature. Section 3 studies the failure probability of the entire algorithm and validates the use of the average bias in the analysis. Section 4 introduces the bias computation for perfect and quasi-perfect codes. We provide an algorithm to find good codes. The algorithm that searches for the optimal strategy to solve \(\mathsf {LPN}\) is presented in Sects. 5 and 6. We illustrate and compare our results in Sect. 7 and conclude in Sect. 8. We provide as additional material further details of our results: the complete list of the chains we obtain (for Tables 3 and 4), an example of a complete solving algorithm, the random codes that we use for the covering code reduction, and an analysis of the results from [22, 36] to obtain Table 1.

2 \(\mathsf {LPN}\)

2.1 \(\mathsf {LPN}\) Definition

The \(\mathsf {LPN}\) problem can be seen as a noisy system of equations in \(\mathbb {Z}_2\) where one is asked to recover the unknown variables. Below, we present the formal definition.

Definition 1

( \(\mathsf {LPN}\) oracle). Let \(s \xleftarrow {U} \mathbb {Z}_2^k\), let \(\tau \in ] 0, \frac{1}{2} [\) be a constant noise parameter and let \(\mathsf {Ber}_{\tau }\) be the Bernoulli distribution with parameter \(\tau \). Denote by \(D_{s,\tau }\) the distribution defined as

$$\begin{aligned} \{ (v, c) \mid v \xleftarrow {U} \mathbb {Z}_2^k, c = \langle v,s \rangle \oplus d, d \leftarrow \mathsf {Ber}_{\tau } \} \in \mathbb {Z}_2^{k+1}. \end{aligned}$$

An \(\mathsf {LPN}\) oracle \(\mathcal {O}^{\mathsf {LPN}}_{s,\tau }\) is an oracle which outputs independent random samples according to \(D_{s,\tau }\).

Definition 2

(Search \(\mathsf {LPN}\) problem). Given access to an \(\mathsf {LPN}\) oracle \(\mathcal {O}^{\mathsf {LPN}}_{s,\tau }\), find the vector s. We denote by \(\mathsf {LPN}_{k,\tau }\) the \(\mathsf {LPN}\) instance where the secret has size k and the noise parameter is \(\tau \). Let \(k' \le k\). We say that an algorithm \(\mathcal {M}\) \((n,t,m,\uptheta ,k')\)-solves the search \(\mathsf {LPN}_{k,\tau }\) problem if

$$\begin{aligned} \Pr [ \mathcal {M}^{\mathcal {O}^{\mathsf {LPN}}_{s,\tau }}(1^k) = (s_1 \ldots s_{k'}) \mid s \xleftarrow {U} \mathbb {Z}_2^k ] \ge \uptheta , \end{aligned}$$

and \(\mathcal {M}\) runs in time t, uses memory m and asks at most n queries from the \(\mathsf {LPN}\) oracle.

Remark that we consider here the problem of recovering only a part of the secret. Throughout the literature this is how the \(\mathsf {LPN}\) problem is formulated. The reason for doing so is that the recovery of the first \(k'\) bits dominates the overall complexity. Once we recover part of the secret, the new problem of recovering a shorter secret of \(k-k'\) bits is easier.

The \(\mathsf {LPN}\) problem has a decisional form where one has to distinguish between random vectors of size \(k+1\) and the samples from the \(\mathsf {LPN}\) oracle. In this paper we are interested only in finding algorithms for the search version.

We define \(\delta = 1 - 2\tau \). We call \(\delta \) the bias of the error bit d. We have \(\delta = E((-1)^d)\), with \(E(\cdot )\) the expected value. We denote the bias of the secret bits by \(\delta _s\). As s is a uniformly distributed random vector, at the beginning we have \(\delta _s = 0\).
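To make the definitions concrete, here is a minimal Python sketch of an \(\mathsf {LPN}\) oracle, together with an empirical check that the error bit indeed has bias \(\delta = 1-2\tau = E((-1)^d)\). The function name and the toy parameters are ours, not part of the paper.

```python
import random

def lpn_oracle(s, tau):
    """One sample (v, c) from D_{s,tau}: v uniform in Z_2^k, c = <v,s> xor d."""
    k = len(s)
    v = [random.randrange(2) for _ in range(k)]
    d = 1 if random.random() < tau else 0
    c = (sum(vi & si for vi, si in zip(v, s)) & 1) ^ d
    return v, c

# toy check that E((-1)^d) is close to delta = 1 - 2*tau
tau, k = 0.125, 16
s = [random.randrange(2) for _ in range(k)]
samples = [lpn_oracle(s, tau) for _ in range(100_000)]
emp = sum((-1) ** (c ^ (sum(v[i] & s[i] for i in range(k)) & 1))
          for v, c in samples) / len(samples)
print(emp, 1 - 2 * tau)   # both close to 0.75
```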

2.2 Reduction and Solving Techniques

Depending on how many queries are given by the \(\mathsf {LPN}\) oracle, the \(\mathsf {LPN}\) solving algorithms are split into three categories. With a linear number of queries, the best algorithms are exponential, i.e. with \(n = \Theta (k)\) the secret is recovered in \(2^{\Theta (k)}\) time  [31, 35]. Given a polynomial number of queries \(n = k^{1+\eta }\), with \(\eta > 0\), one can solve \(\mathsf {LPN}\) with a sub-exponential time complexity of \(2^{\mathcal {O}(\frac{k}{\log \log k})}\) [29]. When \(\tau = \frac{1}{\sqrt{k}}\) we can improve this result and obtain a complexity of \(e^{\frac{1}{2}\sqrt{k}(\ln k)^2+\mathcal {O}(\sqrt{k}\ln k)}\) [8]. The complexity improves but remains in the sub-exponential range with a sub-exponential number of queries. For this category, we have the \(\mathsf {BKW}\) [6], \(\mathsf {LF1}\), \(\mathsf {LF2}\) [28], \(\mathsf {FMICM}\) [18] and the covering code algorithm [22, 36]. All these algorithms solve \(\mathsf {LPN}\) with a time complexity of \(2^{\mathcal {O}(\frac{k}{\log k})}\) and require \(2^{\mathcal {O}(\frac{k}{\log k})}\) queries. In the special case when the noise is sparse, a simple Gaussian elimination can be used for the recovery of the secret  [7, 11]. \(\mathsf {LF2}\), the covering code algorithm or the Gaussian elimination proves to be the best, depending on the noise level  [7].

All these algorithms have a common structure: given an \(\mathsf {LPN}_{k,\tau }\) instance with a secret s, they reduce the original \(\mathsf {LPN}\) problem to a new \(\mathsf {LPN}\) problem where the secret \(s'\) is of size \(k' \le k\) by applying several reduction techniques. Then, they recover \(s'\) using a solving method. The queries are updated and the process is repeated until the whole secret s is recovered. We present here the list of reduction and solving techniques used in the existing \(\mathsf {LPN}\) solving algorithms. In the next section, we combine the reduction techniques such that we find the optimal reduction phases for solving different \(\mathsf {LPN}\) instances.

We assume for all the reduction steps that we start with n queries, that the size of the secret is k, the bias of the secret bits is \(\delta _s\) and the bias of the noise bits is \(\delta \). After applying a reduction step, we will end up with \(n'\) queries, size \(k'\) and biases \(\delta '\) and \(\delta _s'\). Note that \(\delta _s\) averages over all secrets although the algorithm runs with one target secret. As will be clear below, the complexity of all reduction steps only depends on k, n, and the parameters of the steps, but not on the biases. Actually, only the probability of success is concerned with biases. We see in Sect. 3 that the probability of success of the overall algorithm is not affected by this approach. Actually, we will give a formula to compute a value which approximates the average probability of success over the key based on the average bias.

We have the following reduction steps:

  • sparse-secret changes the secret distribution. In the formal definition of \(\mathsf {LPN}\), we take the secret s to be a random row vector of size k. When other reduction steps or the solving phase depends on the distribution of s, one can transform an \(\mathsf {LPN}\) instance with a random s into a new one where s has the same distribution as the initial noise, i.e. \(s \leftarrow \mathsf {Ber}_{\tau }^k\). The reduction performs the following steps: from the n queries, select k of them: \((v_{i_1},c_{i_1}), \ldots , (v_{i_k},c_{i_k})\) where the row vectors \(v_{i_j}\), with \(1 \le j \le k\), are linearly independent. Construct the matrix M as \(M = [v_{i_1}^T \cdots v_{i_k}^T]\) and rewrite the k queries as \(sM+d'=c'\), where \(d' = (d_{i_1}, \ldots , d_{i_k})\). With the remaining \(n-k\) queries we do the following:

    $$\begin{aligned} c'_j = \langle v_j (M^T)^{-1} , c' \rangle \oplus c_j = \langle v_j (M^T)^{-1} , d' \rangle \oplus d_j = \langle v'_j , d' \rangle \oplus d_j \end{aligned}$$

    We have \(n-k\) new queries \((v'_j,c'_j)\) where the secret is now \(d'\). In Guo et al. [22], the authors use an algorithm which is inappropriately called “the four Russians algorithm” [2]. This way, the complexity should be of \(\mathcal {O} \left( \min _{\chi \in \mathbb {N}} \left( k n' \lceil \frac{k}{\chi } \rceil + k^3 + k \chi 2^{\chi } \right) \right) \).Footnote 2 Instead, the Bernstein algorithm [4] works in \(\mathcal {O} \left( \frac{n'k^2}{\log _2k-\log _2\log _2k}+k^2 \right) \). We use the best of the two, depending on the parameters. Thus, we have:

    (Summary of the resulting parameters and complexity for sparse-secret.)
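To illustrate the sparse-secret rewriting, here is a toy Python sketch: it inverts \(M^T\) by Gaussian elimination over \(\mathbb {Z}_2\) (rows stored as bit masks) and checks the identity \(c'_j = \langle v'_j , d' \rangle \oplus d_j\) above. The helper names (gf2_inv, vecmat, dot) and the tiny parameters are our own, and no attempt is made to reach the four-Russians or Bernstein complexities.

```python
import random

def gf2_inv(rows):
    """Invert a k x k binary matrix over GF(2); rows are k-bit ints, bit t = column t.
    Returns the inverse as a list of row ints, or None if the matrix is singular."""
    k = len(rows)
    A, inv = list(rows), [1 << i for i in range(k)]   # eliminate on [A | I]
    for col in range(k):
        piv = next((r for r in range(col, k) if (A[r] >> col) & 1), None)
        if piv is None:
            return None
        A[col], A[piv] = A[piv], A[col]
        inv[col], inv[piv] = inv[piv], inv[col]
        for r in range(k):
            if r != col and (A[r] >> col) & 1:
                A[r] ^= A[col]
                inv[r] ^= inv[col]
    return inv

def vecmat(v, rows):
    """Row vector v (bit mask) times a matrix given as row ints, over GF(2)."""
    out = 0
    for u in range(len(rows)):
        if (v >> u) & 1:
            out ^= rows[u]
    return out

def dot(a, b):
    """Inner product of two bit vectors packed into ints."""
    return bin(a & b).count("1") & 1

k, tau = 8, 0.125
s = random.getrandbits(k)

def oracle():
    v = random.getrandbits(k)
    return v, dot(v, s) ^ (random.random() < tau)

# keep sampling until the first k queries have linearly independent v's;
# these v's are the rows of M^T, so MTinv is (M^T)^{-1}
while True:
    qs = [oracle() for _ in range(3 * k)]
    MTinv = gf2_inv([v for v, _ in qs[:k]])
    if MTinv is not None:
        break

cprime = sum(qs[i][1] << i for i in range(k))                       # c' = sM + d'
dprime = sum((qs[i][1] ^ dot(qs[i][0], s)) << i for i in range(k))  # new secret d'

# rewrite every remaining query: c'_j = <v_j (M^T)^{-1}, c'> xor c_j = <v'_j, d'> xor d_j
for v, c in qs[k:]:
    vp = vecmat(v, MTinv)
    assert dot(vp, cprime) ^ c == dot(vp, dprime) ^ (c ^ dot(v, s))
```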
  • xor-reduce(b) was first used by the \(\mathsf {LF2}\) algorithm. The queries are grouped in equivalence classes according to the values on b random positions. In each equivalence class, we perform the xoring of every pair of queries. The size of the secret is reduced by b bits and the new bias is \(\delta ^2\). The expected new number of queries is \(E(\sum _{i<j} 1_{v_i \text{ matches } v_j \text{ on } \text{ the } b\text{-bit } \text{ block } }) = \frac{n(n-1)}{2^{b+1}}\), which improves previous resultsFootnote 3. When \(n \approx 1 + 2^{b+1}\), the number of queries is maintained. For \(n > 1 + 2^{b+1}\), the number of queries will increase.

    (Summary of the resulting parameters and complexity for xor-reduce.)
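A short sketch of xor-reduce is given below; for simplicity the b matched positions are taken to be the low-order bits of \(v\) instead of random positions, and the function names are ours.

```python
import itertools
from collections import defaultdict

def xor_reduce(queries, b):
    """LF2-style xor-reduce: group queries by the value of v on b positions
    (here: the low-order bits) and xor every pair inside each class."""
    classes = defaultdict(list)
    mask = (1 << b) - 1
    for v, c in queries:
        classes[v & mask].append((v, c))
    out = []
    for group in classes.values():
        for (v1, c1), (v2, c2) in itertools.combinations(group, 2):
            out.append(((v1 ^ v2) >> b, c1 ^ c2))   # the b matched bits are dropped
    return out
```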
  • drop-reduce(b) is a reduction used only by the \(\mathsf {BKW}\) algorithm. It consists in dropping all the queries that are not 0 on a window of b bits. Again, these b positions are chosen randomly. On average, we expect that half of the queries are 0 on a given position. For b bits, we expect to have \(\frac{n}{2^b}\) queries that are 0 on this window. The bias is unaffected and the secret is reduced by b bits.

    (Summary of the resulting parameters and complexity for drop-reduce.)

    The complexity of \(n (1 + \frac{1}{2} + \ldots + \frac{1}{2^{b-1}})=\mathcal {O}(n)\) comes from the fact that we don’t need to check all the b bits: once we find a 1 we don’t need to continue and just drop the corresponding query.
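The corresponding sketch of drop-reduce (again, matching on the low-order b positions for simplicity):

```python
def drop_reduce(queries, b):
    """BKW-style drop-reduce: keep only the queries whose v is 0 on b positions
    (here: the low-order bits); those positions then carry no information."""
    mask = (1 << b) - 1
    return [(v >> b, c) for v, c in queries if v & mask == 0]
```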

  • code-reduce(params) is the method used by the covering code algorithm presented at ASIACRYPT’14. In order to reduce the size of the secret, one uses a linear code \([k,k']\) (which is defined by \(\mathsf {params}\)) and approximates each vector \(v_i\) by its nearest codeword \(g_i\). We assume that decoding is done in linear time for the code considered. (For the considered codes, decoding is indeed based on table look-ups.) The noisy inner product becomes:

    $$\begin{aligned} \begin{aligned} \langle v_i,s \rangle \oplus d_i&= \langle g_i'G,s \rangle \oplus \langle v_i -g_i ,s \rangle \oplus d_i \\&= \langle g_i', sG^T \rangle \oplus \langle v_i-g_i,s \rangle \oplus d_i \\&= \langle g_i' , s' \rangle \oplus d_i', \end{aligned} \end{aligned}$$

    where G is the generator matrix of the code, \(g_i = g_i' G\), \(s' = s G^T \in \{0,1\}^{k'}\) and \(d_i' = \langle v_i -g_i,s \rangle \oplus d_i\). We denote \(\mathsf {bc}=E((-1)^{\langle v_i-g_i,s\rangle })\) the bias of \(\langle v_i-g_i,s\rangle \). We will see in Sect. 4 how to construct a \([k,k']\) linear code making \(\mathsf {bc}\) as large as possible.

    Here, \(\mathsf {bc}\) averages the bias over the secret although s is fixed by sparse-secret. It gives the correct average bias \(\delta \) over the distribution of the key. We will see that it allows us to approximate the expected probability of success of the algorithm.

    By this transform, no query is lost.

    (Summary of the resulting parameters and complexity for code-reduce.)

    The way \(\delta _s'\) is computed is a bit more complicated than for the other types of reductions. However, \(\delta _s\) only plays a role in the code-reduce reduction, and we will not consider algorithms that use more than one code-reduce reduction.
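As an illustration, the sketch below applies code-reduce with a concatenation of [3, 1] repetition codes decoded by majority. The label of a query is unchanged; only the vector shrinks from k to k / 3 bits, and the new secret is \(s' = sG^T\). The chunking on low-order positions and the function names are our own choices.

```python
def majority3(chunk):
    """Decode a 3-bit chunk with the [3, 1] repetition code: return the
    information bit of the nearest codeword (majority vote)."""
    return 1 if bin(chunk).count("1") >= 2 else 0

def code_reduce_rep3(queries, k):
    """code-reduce with k/3 copies of the [3, 1] repetition code: v_i is replaced
    by the information bits g'_i of its nearest codeword; the decoding error
    <v_i - g_i, s> is absorbed into the noise, and the label c_i is unchanged."""
    assert k % 3 == 0
    out = []
    for v, c in queries:
        gprime = 0
        for j in range(k // 3):
            gprime |= majority3((v >> (3 * j)) & 0b111) << j
        out.append((gprime, c))
    return out
```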

It is easy to notice that with each reduction operation the number of queries decreases or the bias is getting smaller. In general, for solving \(\mathsf {LPN}\), one tries to lose as few queries as possible while maintaining a large bias. We will study in the next section what is a good combination of using these reductions.

After applying the reduction steps, we assume we are left with an \(\mathsf {LPN}_{k',\delta '}\) instance where we have \(n'\) queries. The original \(\mathsf {BKW}\) algorithm was using a final solving technique based on majority decoding. Since the \(\mathsf {LF2}\) algorithm, we use a better solving technique based on the Walsh Hadamard Transform (\(\mathsf {WHT}\)).

\(\mathsf {WHT}\) recovers a block of the secret by computing the fast Walsh Hadamard transform on the function \(f(x) = \sum _{i} 1_{v_i=x}(-1)^{\langle v_i, s \rangle \oplus d_i}\). The Walsh-Hadamard transform is

$$\begin{aligned} \hat{f}(\nu )= \sum _x (-1)^{\langle \nu , x \rangle } f(x)= \sum _i (-1)^{\langle v_i, s+\nu \rangle \oplus d_i} \end{aligned}$$

For \(\nu = s\), we have \(\hat{f}(s) = \sum _i (-1)^{d_i} \). For a positive bias, we know that most of the noise bits are set to 0. It is the opposite when the bias is negative. So, \(|\hat{f}(s)|\) is large and we suppose it is the largest value in the table of \(\hat{f}\). Using again the Chernoff bounds, we need to have \(n' = 8 \ln (\frac{2^{k'}}{\uptheta }) \delta '^{-2}\) [7] queries in order to bound the probability of guessing wrongly the \(k'\)-bit secret by \(\uptheta \). We can improve further by applying directly the Central Limit Theorem and obtain a heuristic bound \(\varphi (- \sqrt{\frac{n'}{2 \delta '^{-2} -1}}) \le 1 - (1 - \uptheta )^{\frac{1}{2^{k'}-1}}\), where \(\varphi (x) = \frac{1}{2} + \frac{1}{2} \mathsf {erf}(\frac{x}{\sqrt{2}})\) and \(\mathsf {erf}\) is the Gauss error function. We obtain that

$$\begin{aligned} \sqrt{n'} \ge -\sqrt{2\delta '^{-2} -1} \cdot \varphi ^{-1} \left( 1 - (1 - \uptheta )^{\frac{1}{2^{k'}-1}} \right) . \end{aligned}$$
(1)

We can derive the approximation of Selçuk [34] that \(n' \ge 4 \ln (\frac{2^{k'}}{\uptheta }) \delta '^{-2}\). We give the details of our results in Sect. 3. Complexity of the \(\mathsf {WHT}(k')\) is \(\mathcal {O}(k' 2^{k'}\frac{\log _2n'+1}{2} + k'n' )\) as we use the fast Walsh-Hadamard Transform (Footnote 4, Footnote 5).

(Summary of the parameters and complexity of the \(\mathsf {WHT}\) solving method.)
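The solving phase can be sketched as follows: build the table of f, apply the fast Walsh-Hadamard transform in place, and return the \(\nu \) maximizing \(|\hat{f}(\nu )|\). The butterfly below is the standard radix-2 transform; the function names are ours.

```python
def fwht(table):
    """In-place fast Walsh-Hadamard transform of a length-2^{k'} table."""
    n, h = len(table), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = table[j], table[j + h]
                table[j], table[j + h] = x + y, x - y
        h *= 2
    return table

def wht_solve(queries, kprime):
    """Recover the k'-bit secret as argmax |f_hat| for f(x) = sum_i 1_{v_i=x} (-1)^{c_i}."""
    f = [0] * (1 << kprime)
    for v, c in queries:
        f[v] += 1 if c == 0 else -1
    fhat = fwht(f)
    return max(range(1 << kprime), key=lambda nu: abs(fhat[nu]))
```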

Given the reduction and the solving techniques, an \(\mathsf {LPN}_{k,\tau }\) solving algorithm runs like this: we start with a k-bit secret and with n queries from the \(\mathsf {LPN}\) oracle. We reduce the size of the secret by applying several reduction steps and we end up with \(n'\) queries where the secret has size \(k'\). We use one solving method, e.g. the \(\mathsf {WHT}\), and recover the \(k'\)-bit secret with a probability of failure bounded by \(\uptheta \). We chose \(\uptheta = \frac{1}{3}\). We have recovered a part of the secret. To fully recover the whole secret, we update the queries and start another chain to recover more bits, and so on until the remaining \(k-k'\) bits are found. For the second part of the secret we will require the failure probability to be \(\uptheta ^2\) and for the \(i^{th}\) part it will be \(\uptheta ^i\). Thus, if we recover the whole secret in i iterations, the total failure probability will be bounded by \(\uptheta + \uptheta ^2 + \cdots + \uptheta ^i\). Given that we take \(\uptheta = \frac{1}{3}\), we recover the whole secret with a success probability larger than \(50\,\%\). Experience shows that the time complexity for the first iteration dominates the total complexity.

As we can see in the formulas of each possible step, the computations of \(k'\), \(n'\), and of the complexity do not depend on the secret weight. Furthermore, the computation of biases is always linear. So, the correct average bias (over the distribution of the key made by the sparse-secret transform) is computed. Only the computation of the success probability is non-linear, but we discuss this in the next section. As it only matters in \(\mathsf {WHT}\), we will see in Sect. 3 that the approximation is justified.

3 On Approximating the Probability of Success

Approximating n by using the Central Limit Theorem. In order to approximate the number of queries needed to solve the \(\mathsf {LPN}\) instance, we consider the event that the Walsh-Hadamard transform fails to give the correct secret. We first assume that the bias is positive. We have a failure when for another \(\bar{s} \not = s\), we have that \(\hat{f}(\bar{s}) > \hat{f}(s)\). Following the analysis from [7], we let \(y = A'{\bar{s}}^T + {c'}^T\) and \(d' = A's^T + {c'}^T\). We have \(\hat{f}(\bar{s})=\sum _i(-1)^{y_i}=n'-2\,\mathsf {HW}(y)\) and, similarly, \(\hat{f}(s)=n'-2\,\mathsf {HW}(d')\). So, \(\hat{f}(\bar{s}) > \hat{f}(s)\) translates to \(\mathsf {HW}(y) \le \mathsf {HW}(d')\). Therefore

$$\begin{aligned} \Pr [\hat{f}(\bar{s}) > \hat{f}(s)] = \Pr \left[ \sum _{i=1}^{n'} (y_i -d_i') \le 0 \right] . \end{aligned}$$

For each \(\bar{s}\), we take y as a uniformly distributed random vector and we let \(\delta '(s)\) be the bias introduced on the \(d_i'\) for a fixed s (we recall that our analysis computes \(\delta '=E(\delta '(s))\) over the distribution of s). Let \(X_1, \ldots ,X_{n'}\) be the random variables \(X_i = y_i -d_i'\). Since \(E(y_i) = \frac{1}{2}\), \(E(d_i') = \frac{1}{2}-\frac{\delta '(s)}{2}\) and \(y_i\) and \(d_i'\) are independent, we have that \(E(X_i) = \frac{\delta '(s)}{2}\) and \(\mathsf {Var}(X_i) = \frac{2-\delta '(s)^2}{4}\). By using the Central Limit Theorem we obtain that

$$\begin{aligned} \Pr [ X_1 + \ldots + X_{n'} \le 0 ] \approx \varphi \left( Z(s) \right) \text { with } Z(s)=- \frac{ \delta '(s) }{\sqrt{2- \delta '(s)^{2}} }\sqrt{n'} \end{aligned}$$

where \(\varphi \) can be calculated by \(\varphi (x) = \frac{1}{2} + \frac{1}{2} \mathsf {erf}(\frac{x}{\sqrt{2}})\) and \(\mathsf {erf}\) is the Gauss error function. For \(\delta '(s)<0\), the same analysis with \(\hat{f}(\bar{s})<\hat{f}(s)\) gives the same result. Applying the reasoning for any \(s' \not = s\) we obtain that the failure probability is

$$\begin{aligned} p(s) = 1 - \left( 1 - \varphi (Z(s)) \right) ^{2^{k'}-1} \text{ if } \delta '(s) > 0, \quad \text{ and } \quad p(s) = 1 - \frac{1}{2^{k'}} \text{ if } \delta '(s) \le 0. \end{aligned}$$

We deduce the following (for \(\uptheta < \frac{1}{2}\))

$$\begin{aligned} p(s) \le \uptheta \Leftrightarrow \sqrt{n'} \ge - \sqrt{2\delta '(s)^{-2}-1} \varphi ^{-1} \left( 1 - (1-\uptheta )^{\frac{1}{2^{k'}-1}} \right) \text{ and } \delta '(s)>0 \end{aligned}$$

As a condition for our \(\mathsf {WHT}\) step, we adopt the inequality in which we replace \(\delta '(s)\) by \(\delta '\). We give a heuristic argument below to show that it implies \(E(p(s))\le \uptheta \), which is what we want.

Note that if we use the approximation \(\varphi \left( Z \right) \approx -\frac{1}{Z\sqrt{2\pi }} e^{-\frac{Z^2}{2}}\) for \(Z\rightarrow -\infty \), we obtain the condition \(n' \ge 2 (2 \delta '^{-2} -1) \ln (\frac{2^{k'}-1}{\uptheta })\). So, our analysis brings an improvement of factor two over the Hoeffding bound method used by Bogos et al. [7] that requires \(n' \ge 8 \delta '^{-2} \ln (\frac{2^{k'}}{\uptheta })\).
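For comparison, here is a small sketch that evaluates both query requirements. The quantile \(\varphi ^{-1}\) comes from Python's statistics.NormalDist, and \(1-(1-\uptheta )^{1/(2^{k'}-1)}\) is computed via expm1 to avoid underflow when \(k'\) is large; the function names are ours.

```python
from math import ceil, expm1, log
from statistics import NormalDist

def queries_clt(kprime, delta, theta):
    """n' from condition (1), based on the Central Limit Theorem."""
    p = -expm1(log(1 - theta) / (2 ** kprime - 1))   # 1 - (1-theta)^(1/(2^k'-1))
    q = NormalDist().inv_cdf(p)                      # phi^{-1}(p), negative
    return ceil((2 * delta ** -2 - 1) * q ** 2)

def queries_hoeffding(kprime, delta, theta):
    """Earlier requirement n' >= 8 ln(2^k'/theta) delta'^-2 from Bogos et al. [7]."""
    return ceil(8 * log(2 ** kprime / theta) * delta ** -2)

# e.g. for k' = 64 and theta = 1/3, queries_clt is roughly half of
# queries_hoeffding, in line with the factor-two improvement noted above.
```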

On the validity of using the bias average. The above computation is correct when using \(\delta '(s)\) but we use \(\delta '=E(\delta '(s))\) instead. If no code-reduce step is used, \(\delta '(s)\) does not depend on s and we do have \(\delta '(s)=\delta '\). However, when a code-reduce is used, the bias depends on the secret which is obtained after the sparse-secret step. For simplicity, we let s denote this secret. The bias \(\delta '(s)\) is actually of the form \(\delta '(s)=\delta ^{2^x}\mathsf {bc}(s)\), where x is the number of xor-reduce steps and \(\mathsf {bc}(s)\) is the bias introduced by code-reduce, depending on s. The values of \(\delta '(s)\), Z(s), and p(s) are already defined above. We define \(Z=-\frac{\delta '}{\sqrt{2-\delta '^{2}}} \sqrt{n'} \) and \(p=1-(1-\varphi (Z))^{2^{k'}-1}\). Clearly, E(p(s)) is the average failure probability over the distribution of the secret obtained after sparse-secret.

Our method ensures that \(\delta '=E(\delta '(s))\) over the distribution of s. Since \(\delta '\) is typically small (after a few xor-reduce steps, \(\delta ^{2^x}\) is indeed very small), we can consider Z(s) as a linear function of \(\delta '(s)\) and have \(E(Z(s))\approx Z\). This is confirmed by experiment. We make the heuristic approximation that

$$\begin{aligned} E\left( 1-\left( 1-\varphi (Z(s))\right) ^{2^{k'}-1}\right) \approx 1-\left( 1-\varphi (E(Z(s)))\right) ^{2^{k'}-1} \approx 1-(1-\varphi (Z))^{2^{k'}-1} \end{aligned}$$

So, \(E(p(s))\approx p\).Footnote 6

We did some experiments based on some examples in order to validate our heuristic assumption. Our results show indeed that \(E(Z(s))\approx Z\). There is a small gap between E(p(s)) and p but this does not affect our results. Actually, we are in a phase transition region so any tiny change in the value of \(n'\) makes E(p(s)) change a lot. We include our results in the additional material. Thus, ensuring that \(p\le \uptheta \) with the above analysis based on the average bias ensures that the expected failure probability is bounded by \(\uptheta \).

We also observed that the code-reduce reduction can introduce problems. More precisely, what can go wrong is that s can have, with a given probability, a negative \(\delta '(s)\) bias or a component in one of the concatenated codes giving a zero bias, making \(\mathsf {WHT}\) fail miserably.

4 Bias of the Code Reduction

In this section we present how to compute the bias introduced by a code-reduce step. Recall that this reduction introduces a new noise term:

$$\begin{aligned} \langle v_i,s \rangle \oplus d_i = \langle g'_i , s' \rangle \oplus \langle v_i -g_i,s \rangle \oplus d_i, \end{aligned}$$

where \(g_i=g'_iG\) is the nearest codeword of \(v_i\) and \(s' = s G^T\). Note that \(g_i\) is not necessarily unique, especially if the code is not perfect. We take \(g_i=\mathsf {Decode}(v_i)\), obtained from an arbitrary decoding algorithm. Then the bias \(\mathsf {bc}\) of the new noise can be computed by the following formula:

$$\begin{aligned} \begin{aligned} \mathsf {bc}&= E((-1)^{\langle v_i-g_i,s \rangle }) = \sum _{e \in \{0,1\}^k} \Pr [v_i-g_i = e] E ((-1)^{\langle e,s \rangle }) \\&= \sum _{w=0}^{k} \sum _{\begin{array}{c} e \in \{0,1\}^k, \\ \mathsf {HW}(e)=w \end{array}} \Pr [v_i-g_i = e] \delta _s^w = E \left( \delta _s^{\mathsf {HW}(v_i-g_i)} \right) \end{aligned} \end{aligned}$$

for a \(\delta _s\)-sparse secret. (We recall that the sparse-secret reduction step randomizes the secret.) So, the probability space is over the distribution of \(v_i\) and the distribution of s. Later, we consider \(\mathsf {bc}(s)=E((-1)^{\langle v_i-g_i,s \rangle })\) over the distribution of \(v_i\) only. (In the work of Guo et al. [22], only \(\mathsf {bc}(s)\) is considered. In Zhang et al. [36], our \(\mathsf {bc}\) was also considered.) In the last expression of \(\mathsf {bc}\), we see that the ambiguity in decoding does not affect \(\mathsf {bc}\) as long as the Hamming distance \(\mathsf {HW}(v_i-\mathsf {Decode}(v_i))\) is not ambiguous. This is a big advantage of averaging in \(\mathsf {bc}\) as it allows the use of non-perfect codes. From this formula, we can see that the decoding algorithm \(v_i \rightarrow g_i\) making \(\mathsf {HW}(v_i-g_i)\) minimal makes \(\mathsf {bc}\) maximal. In this case, we obtain

$$\begin{aligned} \mathsf {bc} = E \left( \delta _s^{d(v_i,C)}\right) , \end{aligned}$$
(2)

where C is the code and \(d(v_i,C)\) denotes the Hamming distance of \(v_i\) from C.

For a code C, the covering radius is \(\rho =\max _vd(v,C)\). The packing radius is the largest radius R such that the balls of this radius centered on all codewords are non-overlapping. So, the packing radius is \(R=\left\lfloor \frac{D-1}{2}\right\rfloor \) where D is the minimal distance. We further have \(\rho \ge \left\lfloor \frac{D-1}{2}\right\rfloor \). A perfect code is characterized by \(\rho =\left\lfloor \frac{D-1}{2}\right\rfloor \). A quasi-perfect code is characterized by \(\rho =\left\lfloor \frac{D-1}{2}\right\rfloor +1\).

Theorem 1

We consider a \([k,k',D]\) linear code C, where k is the length, \(k'\) is the dimension, and D is the minimal distance. For any integer r and any positive bias \(\delta _s\), we have

$$\begin{aligned} \mathsf {bc}\le 2^{k'-k}\sum _{w=0}^r\left( {k\atop w}\right) (\delta _s^w-\delta _s^{r+1}) +\delta _s^{r+1} \end{aligned}$$

where \(\mathsf {bc}\) is a function of \(\delta _s\) defined by (2). Equality for any \(\delta _s\) such that \(0<\delta _s<1\) implies that C is perfect or quasi-perfect. In that case, the equality is reached when taking the packing radius \(r=R=\left\lfloor \frac{D-1}{2}\right\rfloor \).

By taking r as the largest integer such that \(\sum _{w=0}^r\left( {k\atop w}\right) \le 2^{k-k'}\) (which is the packing radius \(R=\left\lfloor \frac{D-1}{2}\right\rfloor \) for perfect and quasi-perfect codes), we can see that if a perfect \([k,k']\) code exists, it makes \(\mathsf {bc}\) maximal. Otherwise, if a quasi-perfect \([k,k']\) code exists, it makes \(\mathsf {bc}\) maximal.

Proof

Let \(\mathsf {decode}\) be an optimal deterministic decoding algorithm. The formula gives us that

$$\begin{aligned} \mathsf {bc}= 2^{-k}\sum _{g\in C}\sum _{v\in \mathsf {decode}^{-1}(g)} \delta _s^{\mathsf {HW}(v-g)} \end{aligned}$$

We define \(\mathsf {decode}_w^{-1}(g)=\{v\in \mathsf {decode}^{-1}(g);\mathsf {HW}(v-g)=w\}\) and \(\mathsf {decode}_{>r}^{-1}(g)\) the union of all \(\mathsf {decode}_w^{-1}(g)\) for \(w>r\). For all r, we have

$$\begin{aligned}&\sum _{v\in \mathsf {decode}^{-1}(g)}\delta _s^{\mathsf {HW}(v-g)} \\ =&\sum _{w=0}^r\left( {k\atop w}\right) \delta _s^w +\sum _{w=0}^r\left( \#\mathsf {decode}_w^{-1}(g)-\left( {k\atop w}\right) \right) \delta _s^w +\sum _{w>r}\delta _s^w\#\mathsf {decode}_w^{-1}(g) \\ \le&\sum _{w=0}^r\left( {k\atop w}\right) \delta _s^w +\sum _{w=0}^r\left( \#\mathsf {decode}_w^{-1}(g)-\left( {k\atop w}\right) \right) \delta _s^w +\delta _s^{r+1}\#\mathsf {decode}_{>r}^{-1}(g) \\ \le&\sum _{w=0}^r\left( {k\atop w}\right) \delta _s^w +\delta _s^{r+1} \left( \#\mathsf {decode}^{-1}(g)-\sum _{w=0}^r\left( {k\atop w}\right) \right) \end{aligned}$$

where we used \(\delta _s^w\le \delta _s^{r+1}\) for \(w>r\), \(\#\mathsf {decode}_w^{-1}(g)\le \left( {k\atop w}\right) \) and \(\delta _s^w\ge \delta _s^{r+1}\) for \(w\le r\). We further have equality if and only if the ball centered on g of radius r is included in \(\mathsf {decode}^{-1}(g)\) and the ball of radius \(r+1\) contains \(\mathsf {decode}^{-1}(g)\). By summing over all \(g\in C\), we obtain the result.

So, the equality case implies that the packing radius is at least r and the covering radius is at most \(r+1\). Hence, the code is perfect or quasi-perfect. Conversely, if the code is perfect or quasi-perfect and r is the packing radius, we do have equality.     \(\square \)

So, for quasi-perfect codes, we can compute

$$\begin{aligned} \mathsf {bc}= 2^{k'-k}\sum _{w=0}^R\left( {k\atop w}\right) (\delta _s^w-\delta _s^{R+1}) +\delta _s^{R+1} \end{aligned}$$
(3)

with \(R=\left\lfloor \frac{D-1}{2}\right\rfloor \). For perfect codes, the formula simplifies to

$$\begin{aligned} \mathsf {bc}= 2^{k'-k}\sum _{w=0}^R\left( {k\atop w}\right) \delta _s^w \end{aligned}$$
(4)
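Formula (3), and (4) as its special case, is straightforward to evaluate; a small sketch with our own function name:

```python
from math import comb

def bc_quasi_perfect(k, kprime, D, delta_s):
    """Bias bc of a [k, k', D] perfect or quasi-perfect code, formula (3) with
    R = floor((D-1)/2); for a perfect code this coincides with formula (4)."""
    R = (D - 1) // 2
    tail = delta_s ** (R + 1)
    return 2 ** (kprime - k) * sum(comb(k, w) * (delta_s ** w - tail)
                                   for w in range(R + 1)) + tail

# e.g. the Hamming code [7, 4, 3]: bc_quasi_perfect(7, 4, 3, x) == (1 + 7*x) / 8
```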

4.1 Bias of a Repetition Code

Given a [k, 1] repetition code, the optimal decoding algorithm is the majority decoding. We have \(D=k\), \(k'=1\), \(R=\left\lfloor \frac{k-1}{2}\right\rfloor \). For k odd, the code is perfect so \(\rho =R\). For k even, the code is quasi-perfect so \(\rho =R+1\). Using (3) we obtain

$$ \mathsf {bc} = \left\{ \begin{array}{l l} \sum _{w=0}^{\frac{k-1}{2}} \frac{1}{2^{k-1}} \left( {\begin{array}{c}k\\ w\end{array}}\right) \delta _s^w &{} \quad \text {if}\ k\ \text {is odd }\\ \\ \sum _{w=0}^{\frac{k}{2}-1} \frac{1}{2^{k-1}} \left( {\begin{array}{c}k\\ w\end{array}}\right) \delta _s^w + \frac{1}{2^k} \left( {\begin{array}{c}k\\ k/2\end{array}}\right) \delta _s^{\frac{k}{2}}&\quad \text {if}\ k\ \text {is even} \end{array} \right. $$

We give below the biases obtained for some [k, 1] repetition codes.

[2, 1]: \(\frac{1}{2} \delta _s + \frac{1}{2}\)
[3, 1]: \(\frac{3}{4} \delta _s + \frac{1}{4}\)
[4, 1]: \(\frac{3}{8} \delta _s^2 + \frac{1}{2}\delta _s + \frac{1}{8}\)
[5, 1]: \(\frac{5}{8}\delta _s^2 + \frac{5}{16} \delta _s + \frac{1}{16}\)
[6, 1]: \(\frac{5}{16} \delta _s^3 + \frac{15}{32} \delta _s^2 + \frac{3}{16} \delta _s + \frac{1}{32}\)
[7, 1]: \(\frac{35}{64} \delta _s^3 + \frac{21}{64} \delta _s^2 + \frac{7}{64} \delta _s + \frac{1}{64} \)
[8, 1]: \(\frac{35}{128} \delta _s^4 + \frac{7}{16} \delta _s^3 + \frac{7}{32} \delta _s^2 + \frac{1}{16} \delta _s + \frac{1}{128}\)
[9, 1]: \(\frac{63}{128} \delta _s^4 + \frac{21}{64} \delta _s^3 + \frac{9}{64} \delta _s^2 + \frac{9}{256} \delta _s + \frac{1}{256}\)
[10, 1]: \(\frac{63}{256} \delta _s^5 + \frac{105}{256} \delta _s^4 + \frac{15}{64} \delta _s^3 + \frac{45}{512} \delta _s^2 + \frac{5}{256} \delta _s + \frac{1}{512}\)
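The entries above can be reproduced from the odd/even formula; a small sketch returning the coefficients as exact fractions (function name ours):

```python
from fractions import Fraction
from math import comb

def bc_repetition(k):
    """Coefficients c[w] of delta_s^w in the bias of the [k, 1] repetition code."""
    coeffs = [Fraction(0)] * (k + 1)
    half = (k - 1) // 2 if k % 2 == 1 else k // 2 - 1
    for w in range(half + 1):
        coeffs[w] = Fraction(comb(k, w), 2 ** (k - 1))
    if k % 2 == 0:
        coeffs[k // 2] = Fraction(comb(k, k // 2), 2 ** k)
    return coeffs

# bc_repetition(3) -> [1/4, 3/4, 0, 0] and bc_repetition(4) -> [1/8, 1/2, 3/8, 0, 0],
# matching the [3, 1] and [4, 1] rows above
```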

4.2 Bias of a Perfect Code

In previous work [22, 36], the authors assume a perfect code. In this case, \(\sum _{w=0}^R\left( {\begin{array}{c}k\\ w\end{array}}\right) =2^{k-k'}\) and we can use (4) to compute \(\mathsf {bc}\). There are not so many binary linear codes which are perfect. Except for the repetition codes with odd length, the only ones are the trivial codes \([k, k, 1]\) with \(R=\rho =0\) and \(\mathsf {bc}=1\), the Hamming codes \([2^{\ell }-1, 2^{\ell } - \ell -1,3]\) for \(\ell \ge 2\) with \(R=\rho =1\), and the Golay code [23, 12, 7] with \(R=\rho =3\).

For the Hamming codes, we have

$$\begin{aligned} \mathsf {bc} = 2^{-\ell }\sum _{w=0}^1\left( {\begin{array}{c}2^\ell -1\\ w\end{array}}\right) \delta _s^w = \frac{1 + (2^{\ell } -1) \delta _s}{2^{\ell }} \end{aligned}$$

For the Golay code, we obtain

$$\begin{aligned} \mathsf {bc} = 2^{-11}\sum _{w=0}^3\left( {\begin{array}{c}23\\ w\end{array}}\right) \delta _s^w = \frac{1 + 23\delta _s + 253 \delta _s^2 + 1771 \delta _s^3}{2^{11}} \end{aligned}$$

Formulae (2), (3) and (4) for \(\mathsf {bc}\) are new. Previously [7, 22], the value \(\mathsf {bc}_w\) of \(\mathsf {bc}(s)\) for any s of Hamming weight w was approximated to

$$\begin{aligned} \mathsf {bc}_w= 1 - 2 \frac{1}{S(k,\rho )} \sum _{\begin{array}{c} i\le \rho , \\ i~\text{ odd } \end{array}} \left( {\begin{array}{c}w\\ i\end{array}}\right) S(k-w,\rho -i), \end{aligned}$$

where w is the Hamming weight of the k-bit secret and \(S(k',\rho )\) is the number of \(k'\)-bit strings with weight at most \(\rho \). Intuitively, the formula counts the number of \(v_i-g_i\) that produce an odd number of xors with the 1’s of the secret. (See [7, 22].) So, Guo et al. [22] assume a fixed value for the weight w of the secret and consider the probability that w is not correct. If the actual weight is lower, the actual bias is larger; but if it is larger, the computed bias is overestimated and the algorithm fails.

For instance, with a [3, 1] repetition code, the correct bias is \(\mathsf {bc}=\frac{3}{4}\delta _s+\frac{1}{4}\) following our formula. With a fixed w, it is of \(\mathsf {bc}_w=1-\frac{w}{2}\) [7, 22]. The probability of w to be correct is \(\left( {\begin{array}{c}k\\ w\end{array}}\right) \tau ^w(1-\tau )^{k-w}\). We take the example of \(\tau =\frac{1}{3}\) so that \(\delta _s=\frac{1}{3}\).

w = 0: \(\mathsf {bc}_w = 1\), \(\Pr [w] = (1-\tau )^3\) (\(= 0.2963\) for \(\tau = \frac{1}{3}\))
w = 1: \(\mathsf {bc}_w = \frac{1}{2}\), \(\Pr [w] = 3\tau (1-\tau )^2\) (\(= 0.4444\))
w = 2: \(\mathsf {bc}_w = 0\), \(\Pr [w] = 3\tau ^2(1-\tau )\) (\(= 0.2222\))
w = 3: \(\mathsf {bc}_w = -\frac{1}{2}\), \(\Pr [w] = \tau ^3\) (\(= 0.0370\))

So, by taking \(w=1\), we have \(\delta =\mathsf {bc}_w=\frac{1}{2}\) but the probability of failure is about \(\frac{1}{4}\). Our approach uses the average bias \(\delta =\mathsf {bc}=\frac{1}{2}\).
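A three-line check of this example: the average of the fixed-weight biases \(\mathsf {bc}_w\) weighted by \(\Pr [w]\) equals the bias \(\mathsf {bc}\) given by our formula (both are \(\frac{1}{2}\) for \(\tau =\frac{1}{3}\)).

```python
from math import comb

tau = 1 / 3
delta_s = 1 - 2 * tau                                       # = 1/3
bc_w = {0: 1.0, 1: 0.5, 2: 0.0, 3: -0.5}                    # bc_w = 1 - w/2
pr_w = {w: comb(3, w) * tau ** w * (1 - tau) ** (3 - w) for w in range(4)}
print(sum(bc_w[w] * pr_w[w] for w in range(4)))             # 0.5
print(0.75 * delta_s + 0.25)                                # 0.5, our average bias
```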

4.3 Using Quasi-perfect Codes

If \(C'\) is a \([k-1,k',D]\) perfect code with \(k'>1\) and if there exist some codewords of odd Hamming weight, we can extend \(C'\), i.e., add a parity bit, and obtain a \([k,k']\) code C. Clearly, the packing radius of C is at least \(\left\lfloor \frac{D-1}{2}\right\rfloor \) and the covering radius is at most \(\left\lfloor \frac{D-1}{2}\right\rfloor +1\). For \(k'>1\), there is at most one possible length for making a perfect code of dimension \(k'\). So, C is quasi-perfect, its packing radius is \(\left\lfloor \frac{D-1}{2}\right\rfloor \) and its covering radius is \(\left\lfloor \frac{D-1}{2}\right\rfloor +1\).

If \(C'\) is a \([k+1,k',D]\) perfect code with \(k'>1\), we can puncture it, i.e., remove one coordinate by removing one column from the generating matrix. If we choose to remove a column which does not modify the rank \(k'\), we obtain a \([k,k']\) code C. Clearly, the packing radius of C is at least \(\left\lfloor \frac{D-1}{2}\right\rfloor -1\) and the covering radius is at most \(\left\lfloor \frac{D-1}{2}\right\rfloor \). For \(k'>1\), there is at most one possible length for making a perfect code of dimension \(k'\). So, C is quasi-perfect, its packing radius is \(\left\lfloor \frac{D-1}{2}\right\rfloor -1\) and its covering radius is \(\left\lfloor \frac{D-1}{2}\right\rfloor \).

Hence, we can use extended Hamming codes \([2^\ell ,2^\ell -\ell -1]\) with packing radius 1 for \(\ell \ge 3\), punctured Hamming codes \([2^\ell -2,2^\ell -\ell -1]\) with packing radius 0 for \(\ell \ge 3\), the extended Golay code [24, 12] with packing radius 3, and the punctured Golay code [22, 12] with packing radius 2.

There actually exist many constructions for quasi-perfect linear binary codes. We list a few in Table 2. We took codes listed in the existing literature [13, Table 1], [32, p. 122], [21, p. 47], [17, Table 1], [12, p. 313], and [3, Table 1]. In Table 2, k, \(k'\), D, and R denote the length, the dimension, the minimal distance, and the packing radius, respectively.

Table 2. Perfect and quasi-perfect binary linear codes

4.4 Finding the Optimal Concatenated Code

The linear code \([k,k']\) is typically instantiated by a concatenation of elementary codes for practical purposes. By “concatenation” of m codes \(C_1,\ldots ,C_m\), we mean the code formed by all \(g_{i,1}\Vert \cdots \Vert g_{i,m}\) obtained by concatenating any set of \(g_{i,j}\in C_j\). Decoding \(v_1\Vert \cdots \Vert v_m\) is based on decoding each \(v_{i,j}\) in \(C_j\) independently. If all \(C_j\) are small, this is done by a table lookup. So, concatenated codes are easy to implement and to decode. For \([k,k']\) we have the concatenation of \([k_1,k_1'], \ldots , [k_{m},k_m']\) codes, where \(k_1 + \cdots + k_{m} = k\) and \(k_1' + \cdots + k_{m}' = k'\). Let \(v_{ij}, g_{ij}, s'_j\) denote the \(j^{th}\) part of \(v_i,g_i,s'\) respectively, corresponding to the concatenated \([k_j,k_j']\) code. The bias of \(\langle v_{ij}-g_{ij},s_j \rangle \) in the code \([k_j,k_j']\) is denoted by \(\mathsf {bc}_j\). As \(\langle v_i-g_i,s \rangle \) is the xor of all \(\langle v_{ij}-g_{ij},s_j \rangle \), the total bias introduced by this operation is computed as \(\mathsf {bc} = \prod _{j=1}^{m} \mathsf {bc}_j\) and the combination \(\mathsf {params}=( [k_1,k_1'], \ldots ,[k_{m},k_m'])\) is chosen such that it gives the highest bias.

The way these \(\mathsf {params}\) are computed is the following: we start by computing the biases for all elementary codes, i.e. we compute the biases for all codes from Table 2. We may add random codes that we found interesting. (For these, we use (2) to compute \(\mathsf {bc}\).)Footnote 7 Next, for each [i, j] code we check whether there is a combination of \([i-n,j-m]\), [n, m] codes that gives a better bias, where [n, m] is either a repetition code, a Golay code or a Hamming code. We illustrate below the algorithm to find the optimal concatenated code. This algorithm was independently proposed by Zhang et al. [36] (with perfect codes only).

(Algorithm 1: finding the optimal concatenated code.)

Using \(\mathcal {O}(k)\) elementary codes, this procedure takes \(\mathcal {O}(k^3)\) time and we can store all \(\mathsf {params}\) for any combination [ij], \(1 \le j < i \le k\) with \(\mathcal {O}(k^2)\) memory.
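The search can be sketched as a simple dynamic programming over pairs [i, j], in the spirit of Algorithm 1. The elementary codes and their biases (computed, e.g., with the formulas of Sect. 4 for a given \(\delta _s\)) are assumed to be given; the function name is ours.

```python
def best_concatenation(k_max, elementary, bias):
    """bc[i][j]: best bias of a concatenation of total length i and dimension j;
    `elementary` lists (length, dimension) codes and `bias` maps them to bc_j."""
    bc = [[0.0] * (k_max + 1) for _ in range(k_max + 1)]
    choice = [[None] * (k_max + 1) for _ in range(k_max + 1)]
    bc[0][0] = 1.0                                   # empty concatenation
    for i in range(1, k_max + 1):
        for j in range(1, i + 1):
            for (n, m) in elementary:                # try [i-n, j-m] + [n, m]
                if n <= i and m <= j and bc[i - n][j - m] > 0:
                    cand = bc[i - n][j - m] * bias[(n, m)]
                    if cand > bc[i][j]:
                        bc[i][j], choice[i][j] = cand, (n, m)
    return bc, choice
```

Following the choice pointers from [k, k'] back to [0, 0] then yields the combination \(\mathsf {params}\).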

5 The Graph of Reduction Steps

Having in mind the reduction methods described in Sect. 2, we formalize an \(\mathsf {LPN}\) solving algorithm in terms of finding the best chain in a graph. The intuition is the following: in an \(\mathsf {LPN}\) solving algorithm we can see each reduction step as an edge from a \((k,\log _2{n})\) instance to a new instance \((k',\log _2{n'})\) where the secret is smaller, \(k' \le k\), we have a different number of queries and the noise has a different bias. For example, an xor-reduce(b) reduction turns a \((k,\log _2{n})\) instance with bias \(\delta \) into \((k',\log _2{n'})\) with bias \(\delta '\), where \(k' = k -b\), \(n' = \frac{n(n-1)}{2^{b+1}}\) and \(\delta ' = \delta ^2\). By this representation, the reduction phase is a chain in which each edge is a reduction step moving from \(\mathsf {LPN}\) with parameters (k, n) to \(\mathsf {LPN}\) with parameters \((k',n')\), and the chain ends with an instance \((k_i,n_i)\) used to recover the \(k_i\)-bit secret by a solving method. The chain terminates with the fast Walsh-Hadamard solving method.

We formalize the reduction phase as a chain of reduction steps in a graph \(G = (V,E)\). The set of vertices V is composed of \(V = \{1,\ldots ,k\} \times L\) where L is a set of real numbers. For instance, we could take \(L=\mathbb {R}\) or \(L=\mathbb {N}\). For efficiency reasons, we could even take \(L=\{0, \ldots , \eta \}\) for some bound \(\eta \). Every vertex saves the size of the secret and the logarithmic number of queries; i.e. a vertex \((k,\log _2{n})\) means that we are in an instance where the size of the secret is k and the number of queries available is n. An edge from one vertex to another is given by a reduction step. An edge from \((k,\log _2{n})\) to a \((k',\log _2{n'})\) has a label indicating the type of reduction and its parameters (e.g. xor-reduce(b) or code-reduce(params)). This reduction defines some \(\alpha \) and \(\beta \) coefficients such that the bias \(\delta '\) after reduction is obtained from the bias \(\delta \) before the reduction by

$$\begin{aligned} \log _2{\delta '^2} = \alpha \log _2{\delta ^2} + \beta \end{aligned}$$

where \(\alpha ,\beta \in \mathbb {R}\).

We denote by \(\lceil \lambda \rceil _L\) the smallest element of L which is at least equal to \(\lambda \) and by \(\lfloor \lambda \rfloor _L\) the largest element of L which is not larger than \(\lambda \). In general, we could use a rounding function \(\mathsf {Round}_L(\lambda )\) such that \(\mathsf {Round}_L(\lambda )\) is in L and approximates \(\lambda \).

The reduction steps described in Subsect. 2.2 can be formalized as follows:

  • sparse-secret: \((k,\log _2{n}) \rightarrow (k,\mathsf {Round}_L(\log _2{(n -k)}))\) and \(\alpha = 1, \beta =0\)

  • xor-reduce(b): \((k,\log _2{n}) \rightarrow (k-b,\mathsf {Round}_L(\log _2 \left( \frac{n(n-1)}{2^{b+1}} \right) ))\) and \(\alpha = 2, \beta =0\)

  • drop-reduce(b): \((k,\log _2{n}) \rightarrow (k-b, \mathsf {Round}_L(\log _2{(\frac{n}{2^b})}))\) and \(\alpha = 1, \beta = 0\)

  • code-reduce(params): \((k,\log _2{n}) \rightarrow (k',\log _2{n})\) and \(\alpha =1, \beta =\log _2{\mathsf {bc}^2}\), where \(\mathsf {bc}\) is the bias introduced by the covering code reduction using a \([k,k']\) linear code defined by \(\mathsf {params}\).

Below, we give the formal definition of a reduction chain.

Definition 3 (Reduction chain)

Let

$$\begin{aligned} \mathcal {R} = \{ \textsf {sparse-secret}, \textsf {xor-reduce}(b), \textsf {drop-reduce}(b), \textsf {code-reduce}(k, k', \mathsf {params}) \} \end{aligned}$$

for \(k,k',b \in \mathbb {N}\). A reduction chain is a sequence

$$\begin{aligned} (k_0, \log _2{n_0}) \xrightarrow {e_1} (k_1,\log _2{n_1}) \xrightarrow {e_2} \ldots \xrightarrow {e_i} (k_i,\log _2{n_i}), \end{aligned}$$

where the change \((k_{j-1},\log _2{n_{j-1}}) \rightarrow (k_j,\log _2{n_j})\) is performed by one reduction from \(\mathcal {R}\), for all \(0 < j \le i\).

A chain is simple if it is accepted by the automaton from Fig. 1.

Fig. 1. Automaton accepting simple chains

Remark: Restrictions for simple chains are modelled by the automaton in Fig. 1. We restrict to simple chains as they are easier to analyze. Indeed, sparse-secret is only used to raise \(\delta _s\) to make code-reduce more effective. And, so far, it is hard to analyze sequences of code-reduce steps as the first one may destroy the uniform and high \(\delta _s\) for the next ones. This is why we exclude multiple code-reduce reductions in a simple chain. So, we use up to one code-reduce reduction, always with one sparse-secret before code-reduce. And sparse-secret occurs before \(\delta \) decreases. For convenience, we will add a state of the automaton to the vertex in V.

Definition 4 (Exact chain)

An exact chain is a simple reduction chain for \(L = \mathbb {R}\). I.e. \(\mathsf {Round}_L\) is the identity function.

A chain which is not exact is called rounded.

For solving \(\mathsf {LPN}\) we are interested in those chains that end with a vertex \((k_i,\log _2{n_i})\) which allows to call a \(\mathsf {WHT}\) solving algorithm to recover the \(k_i\)-bit secret. We call these chains valid chains and we define them below.

Definition 5 (Valid reduction chain)

Let

$$\begin{aligned} (k_0, \log _2{n_0}) \xrightarrow {e_1} (k_1,\log _2{n_1}) \xrightarrow {e_2} \cdots \xrightarrow {e_i} (k_i,\log _2{n_i}) \end{aligned}$$

be a reduction chain with \(e_j=(\alpha _j,\beta _j,.)\). Let \(\delta _{j}\) be the bias corresponding to the vertex \((k_j,\log _2{n_j})\), iteratively defined by \(\delta _0=\delta \) and \(\log _2\delta _j^2=\alpha _j\log _2\delta _{j-1}^2+\beta _j\) for \(j=1,\ldots ,i\). We say the chain is a \(\varvec{\uptheta }\)-valid reduction chain if \(n_i\) satisfies (1) for \(\delta '=\delta _i\) and \(n'=n_i\).

The time complexity of a chain \((e_1, \ldots ,e_i)\) is simply the sum of the complexity of each reduction step \(e_1, e_2, \ldots ,e_i\) and \(\mathsf {WHT}\). We further define the max-complexity of a chain which is the maximum of the complexity of each reduction step and \(\mathsf {WHT}\). The max-complexity is a good approximation of the complexity. Our goal is to find a chain with optimal complexity. What we achieve is that, given a set L, we find a rounded chain with optimal max-complexity up to some given precision.

5.1 Towards Finding the Best \(\mathsf {LPN}\) Reduction Chain

In this section we present the algorithm that helps finding the optimal valid chains for solving \(\mathsf {LPN}\). As aforementioned, we try to find the valid chain with optimal max-complexity for solving an \(\mathsf {LPN}_{k,\tau }\) instance in our graph G.

The first step of the algorithm is to construct the directed graph \(G=(V,E)\). We take the set of vertices \(V = \{1,\ldots ,k\} \times L \times \{1,2,3,4\}\) which indicate the size of the secret, the logarithmic number of queries and the state in the automaton in Fig. 1. Each edge \(e \in E\) represents a reduction step and is labelled with the following information: \((k_1,\log _2{n_1},st) \mathop {\rightarrow }\limits ^{\alpha , \beta , t} (k_2,\log _2{n_2},st')\) where t is one of the reduction steps and \(\alpha \) and \(\beta \) save information about how the bias is affected by this reduction step.

The graph has \(\mathcal {O}(k\cdot |L|)\) vertices and each vertex has \(\mathcal {O}(k)\) edges. So, the size of the graph is \(\mathcal {O}(k^2\cdot |L|)\).

Thus, we construct the graph G with all possible reduction steps and from it we try to see what is the optimal simple rounded chain in terms of max-complexity. We present in Algorithm 2 the procedure to construct the graph G that contains all possible reduction steps with a time complexity bounded by \(2^{\eta }\) (As explained below, Algorithm 2 is not really used).

The procedure of finding the optimal valid chain is illustrated in Algorithm 3. The procedure of finding a chain with upper bounded max-complexity is illustrated in Algorithm 4.

(Algorithm 2: construction of the graph G of all reduction steps with complexity bounded by \(2^{\eta }\).)
(Algorithm 3: search for the optimal valid chain.)
(Algorithm 4: search for a valid chain with max-complexity bounded by \(2^{\eta }\).)

Algorithm 4 receives as input the parameters k and \(\tau \) for the \(\mathsf {LPN}\) instance, the parameter \(\uptheta \) which represents the bound on the failure probability in recovering the secret. Parameter \(\eta \) represents an upper bound for the logarithmic complexity of each reduction step. Given \(\eta \), we build the graph G which contains all possible reductions with time complexity smaller than \(2^{\eta }\) (Step 4). Note that we don’t really call Algorithm 2. Indeed, we don’t need to store the edges of the graph. We rather keep a way to enumerate all edges going to a given vertex (in Step 11) by using the rules described in Algorithm 2.

For each vertex, we iteratively define \(\Delta ^{st}\) and \(\mathsf {Best}^{st}\): the value of the best (largest) error bias reachable at that vertex and the reduction step achieving it. The best reduction step is the one that maximizes the bias. We define these values iteratively until we reach a vertex from which the \(\mathsf {WHT}\) solving algorithm succeeds with complexity bounded by \(2^\eta \). Once we have reached this vertex, we construct the chain by going backwards, following the \(\mathsf {Best}\) pointers.
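To convey the flavour of Algorithm 4, here is a heavily simplified Python sketch. It keeps only xor-reduce edges, uses an assumed per-step cost of \((k+1)\max (n,n')\) bit operations (the exact per-step costs are in the summary boxes of Sect. 2.2), and stops at the first vertex where the \(\mathsf {WHT}\) condition (1) holds within the complexity bound. All function names and defaults are ours.

```python
from math import log2, expm1, log
from statistics import NormalDist

def wht_ok(kp, logn, logd2, theta):
    """Condition (1): 2^logn queries suffice to recover k' bits by WHT with
    failure probability at most theta, given log2(delta'^2) = logd2."""
    p = -expm1(log(1 - theta) / (2 ** kp - 1))          # 1 - (1-theta)^(1/(2^k'-1))
    q = NormalDist().inv_cdf(p)
    return 2 ** logn >= (2 * 2 ** (-logd2) - 1) * q ** 2

def find_chain(k, tau, logn0, eta, theta=1/3, precision=0.25):
    """Simplified Algorithm 4: vertices (k', rounded log2 n'), xor-reduce edges only."""
    grid = lambda x: round(x / precision) * precision   # Round_L
    best = {(k, grid(logn0)): (2 * log2(1 - 2 * tau), None)}   # Delta and Best
    for kp in range(k, 0, -1):                          # k only decreases: topological order
        for key in [key for key in best if key[0] == kp]:
            logd2, _ = best[key]
            ln = key[1]
            # stop if the WHT step is affordable and condition (1) is satisfied
            if kp <= eta and kp * 2.0 ** kp * (ln + 1) / 2 + kp * 2.0 ** ln <= 2.0 ** eta \
                    and wht_ok(kp, ln, logd2, theta):
                return backtrack(best, key)
            for b in range(1, kp):                      # xor-reduce(b) edges
                ln2 = grid(2 * ln - b - 1)              # n' = n(n-1)/2^{b+1}
                if ln2 < 1 or log2(kp + 1) + max(ln, ln2) > eta:
                    continue                            # assumed cost (k+1)max(n,n') too big
                nxt, cand = (kp - b, ln2), 2 * logd2    # delta' = delta^2
                if nxt not in best or cand > best[nxt][0]:
                    best[nxt] = (cand, (key, b))
    return None

def backtrack(best, node):
    """Follow the Best pointers backwards to list the xor-reduce steps of the chain."""
    chain = []
    while best[node][1] is not None:
        prev, b = best[node][1]
        chain.append(("xor-reduce", b, node))
        node = prev
    return list(reversed(chain))
```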

We easily prove what follows by induction.

Lemma 1

At the end of the iteration of Algorithm 4 for \((j,\eta _2,st')\), \(\Delta ^{st'}_{j,\eta _2}\) is the maximum of \(\log _2\delta ^2\), where \(\delta \) is the bias obtained by a \(\mathsf {Round}_L\)-rounded simple chain from a vertex of the form \((k,\eta _1,0)\) to \((j,\eta _2,st')\) with max-complexity bounded by \(2^{\eta }\) (\(\Delta _{j,\eta _2}^{st'} = - \infty \) if there is no such chain).

Lemma 2

If there exists a simple \(\mathsf {Round}_L\)-rounded chain c ending on state \((k_j,\eta _j,st_j)\) and max-complexity bounded by \(2^\eta \), there exists one \(c'\) such that \(\Delta _{i,\eta _i}^{st_i} = \log _2{\delta _i^2}\) at each step.

Proof

Let \(c''\) be a simple chain ending on \((k_j,\eta _j,st_j)\) with \(\Delta _{j,\eta _j}^{st_j} = \log _2{\delta _j^2}\). Let \((k_{j-1},\eta _{j-1},st_{j-1})\) be the preceding vertex in \(c''\). We apply Lemma 2 on this vertex by induction to obtain a chain \(c'''\). Since the complexity of the last edge does not depend on the bias and \(\alpha \ge 0\) in the last edge, we construct the chain \(c'\) by concatenating \(c'''\) with the last edge of \(c''\).     \(\square \)

Theorem 2

Algorithm 4 finds a \(\uptheta \)-valid simple \(\mathsf {Round}_L\)-rounded chain for \(\mathsf {LPN}_{k, \tau }\) with max-complexity bounded by \( 2^{\eta }\) if there exists one.

Proof

We use Lemma 2 and the fact that increasing \(\delta ^2\) keeps constraint (1) valid.     \(\square \)

If we used \(L=\mathbb {R}\), Algorithm 4 would always find a valid simple chain with bounded max-complexity when it exists. Instead, we use rounded chains and hope that rounding still makes us find the optimal chain.

So, we build Algorithm 3. In this algorithm, we look for the minimal \(\eta \) for which Algorithm 4 returns something by a divide and conquer algorithm. First, we set \(\eta \) as being in the interval [0, k] where the solution for \(\eta =k\) corresponds to a brute-force search. Then, we cut the interval in two pieces and see if the lower interval has a solution. If it does, we iterate in this interval. Otherwise, we iterate in the other interval. We stop once the amplitude of the interval is lower than the requested precision. The complexity of Algorithm 3 is of \(\log _2\frac{k}{\mathsf {precision}}\) calls to Algorithm 4.
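The divide-and-conquer wrapper is then a plain binary search over \(\eta \), reusing the find_chain sketch given above:

```python
def minimize_eta(k, tau, logn0, precision=0.5):
    """Algorithm 3 in miniature: smallest eta (up to `precision`) for which the
    bounded search still finds a valid chain; eta = k corresponds to brute force."""
    lo, hi = 0.0, float(k)
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if find_chain(k, tau, logn0, mid) is not None:
            hi = mid
        else:
            lo = mid
    return hi
```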

Theorem 3

Algorithm 3 finds a \(\uptheta \)-valid simple \(\mathsf {Round}_L\)-rounded chain for \(\mathsf {LPN}_{k, \tau }\) with parameter \(\mathsf {precision}\), with optimal rounded max-complexity, where the rounding function approximates \(\log _2\) up to \(\mathsf {precision}\) if there exists one.

Proof

Algorithm 3 is a divide-and-conquer algorithm to find the smallest \(\eta \) such that Algorithm 4 finds a valid simple \(\mathsf {Round}_L\)-rounded chain of max-complexity bounded by \(2^{\eta }\).     \(\square \)

We can see that the complexity of Algorithm 4 is of \(\mathcal {O}\left( k^2\cdot |L|\right) \) iterations as vertices have k possible values for the secret length and |L| possible values for the logarithmic number of equations. So, it is linear in the size of the graph. Furthermore, each type of edge to a fixed vertex has \(\mathcal {O}(k)\) possible origins. The memory complexity is \(\mathcal {O}\left( k\cdot |L|\right) \), mainly to store the \(\Delta _{k,\eta }\) and \(\mathsf {Best}_{k,\eta }\) tables. We also use Algorithm 1 which has a complexity \(\mathcal {O}(k^3)\) but we run it only once during precomputation. Algorithm 3 sets \(|L|\sim \frac{k}{\mathsf {precision}}\). So, the complexity of Algorithm 3 is \(\mathcal {O} \left( k^3+\frac{k^3}{\mathsf {precision}} \times \log {\frac{k}{\mathsf {precision}}} \right) \).

6 Chains with a Guessing Step

In order to further improve our valid chain we introduce a new reduction step to our algorithm. As it is done in previous works [5, 22], we guess part of the bits of the secret. More precisely, we assume that b bits of the secret have a Hamming weight smaller than or equal to w. The influence on the whole algorithm is more complicated: it requires to iterate the \(\mathsf {WHT}\) step \(\sum _{i=0}^w\left( {\begin{array}{c}b\\ i\end{array}}\right) \) times. The overall complexity must further be divided by \(\sum _{i=0}^w\left( {\begin{array}{c}b\\ i\end{array}}\right) \left( \frac{1-\delta _s}{2}\right) ^i\left( \frac{1+\delta _s}{2}\right) ^{b-i}\), the probability that the guess is correct. Note that this generalized step was used in Guo et al. [22].

We formalize this step as follows:

  • guess-secret(b, w) guesses that b bits of the secret have a Hamming weight smaller than or equal to w. The b positions are chosen randomly. The number of queries remains the same, the noise is the same and the size of the secret is reduced by b bits. Thus, for this step we have

    (Summary of the resulting parameters and complexity for guess-secret.)

This step may be useful for a sparse secret, i.e. \(\tau \) is small, as then we reduce the size of the secret with a very small cost. In order to accommodate this new step we would have to add a transition from state 3 to state 3 in the automaton that accepts the simple chains (See Fig. 1).
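The two correction factors of this step are easy to compute; a small sketch with our own function name:

```python
from math import comb

def guess_secret_overhead(b, w, delta_s):
    """Number of WHT repetitions and the success-probability divisor when
    guessing that b bits of the secret have Hamming weight at most w."""
    repeats = sum(comb(b, i) for i in range(w + 1))
    p_guess = sum(comb(b, i) * ((1 - delta_s) / 2) ** i * ((1 + delta_s) / 2) ** (b - i)
                  for i in range(w + 1))
    return repeats, p_guess
```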

To find the optimal chain using guess-secret, we have to make a loop over all possible b and all possible w. We run the full search \(\mathcal {O}(k^2)\) times. The total complexity is thus \(\mathcal {O}\left( \frac{k^5}{\mathsf {precision}} \times \log {\frac{k}{\mathsf {precision}}} \right) \).

7 Results

We illustrate in this section the results obtained by running Algorithm 4 for different \(\mathsf {LPN}\) instances taken from Bogos et al. [7]. They vary from \(k=32\) to \(k=768\), with the noise levels 0.05, 0.1, 0.125, 0.2 and 0.25. In Table 3 we display the logarithmic time complexity we found for solving \(\mathsf {LPN}\) without using guess-secret.Footnote 8

Table 3. Logarithmic time complexity on solving \(\mathsf {LPN}\) without guess-secret
Table 4. Logarithmic time complexity on solving \(\mathsf {LPN}\) with guess-secret

Sequence of chains. If we analyze in more detail one of the chains that we obtained, e.g. the chain for \(\mathsf {LPN}_{512,0.125}\), we can see that it first uses a sparse-secret. Afterwards, the secret is reduced by applying 5 times the xor-reduce and one code-reduce at the end of the chain. With a total complexity of \(2^{79.46}\) and \(\uptheta < 33\,\%\) it recovers 64 bits of the secret.

The code used is a [189, 64] concatenation made of ten random codes: one instance of a [18, 6] code, five instances of a [19, 6] code, and four instances of a [19, 7] code. By manually tuning the number of equations without rounding, we can obtain with \(n=2^{63.299}\) a complexity of \(2^{78.84}\). This is the value from Table 1.

On the guess-secret reduction. Our results show that the guess-secret step does not bring any significant improvement. If we compare Table 3 with Table 4 we can see that only in a few cases the guess step improves the total complexity. For \(k \ge 512\), some results are not better than in Table 3. This is most likely due to the lower precision used in Table 4.

We can see several cases where, at the end of a chain with guess-secret, only one bit of the secret is recovered by \(\mathsf {WHT}\). If only 1 bit of the secret is recovered by non-bruteforce methods, the next chain for \(\mathsf {LPN}_{k-1,\tau }\) will have to be run several times, given the guess-secret step used in the chain for \(\mathsf {LPN}_{k,\tau }\). Thus, it might happen that the first chain does not dominate the total complexity. So, our strategy to use sequences of chains has to be revised, but most likely, the final result will not be better than sequences of chains without guess-secret. So, we should rather avoid these chains ending with a 1-bit recovery.

There is no case where a guess-secret outside such a chain ending with a 1-bit recovery brings any improvement.

Comparing the results. For practical values we compare our results with the previous work [7, 22, 28, 36].

From the works of ASIACRYPT’14 [22] and EUROCRYPT’16 [36] we have that \(\mathsf {LPN}_{512,0.125}\) can be solved with a time complexity of \(2^{79.9}\) (with more precise complexity estimates). The comparison was shown in Table 1 in the Introduction. We do better, provide concrete codes, and we even remove the guess-secret step thanks to an optimized use of a code-reduce. Thus, the results of Algorithm 4 improve all the existing results on solving \(\mathsf {LPN}\).

8 Conclusion

In this article we have proposed an algorithm for creating reduction chains with optimal max-complexity. The results we obtain improve on the existing work and, to our knowledge, we have the best algorithm for solving \(\mathsf {LPN}_{512,0.125}\). We believe that our algorithm could be further adapted and automated if new reduction techniques were introduced.

As future work, we could look at applications to the \(\mathsf {LWE}\) problem. Kirchner and Fouque [27] improve the \(\mathsf {LWE}\) solving algorithms by refining the modulus switching. We could also look at ways to keep track of the biases of secret bits bitwise, in order to allow cascades of code-reduce steps.