1 Introduction

Pseudorandom functions (PRF) over variable length inputs are keyed functions that take as input a bit string of arbitrary length and output a fixed length bit string that should be indistinguishable from uniformly random bits. This primitive is useful in practice as it can serve as a Message Authentication Code (MAC) in order to provide integrity and authenticity of messages. Moreover, when adequately combined with an encryption scheme (e.g. using the generic SIV structure [1]), it can also provide authenticated encryption. Unfortunately, barring a few examples like SURF [2], SipHash [3] and AES-PRF [4], building a concrete secure PRF from scratch has remained elusive.

Block Cipher-Based PRF: Given the ubiquity of block ciphers (BC), building a provably secure PRF from block ciphers has been a widely studied problem in symmetric cryptography. As far as fixed input length (FIL) is concerned, the problem is essentially solved as several highly secure constructions already exist. For example, given two n-bit permutations \(\mathsf {\Pi }_1\) and \(\mathsf {\Pi }_2\), the following PRP-to-PRF constructions offer security up to (roughly) \(2^n\) adversarial queries:

  • the sum \(x\mapsto \mathsf {\Pi }_1(x)\oplus \mathsf {\Pi }_2(x)\) of both permutations and its single-keyed variant the TWIN construction \(x\mapsto \mathsf {\Pi }_1(0||x)\oplus \mathsf {\Pi }_1(1||x)\): after their introduction by Bellare et al. [5], their security has been the subject of a long line of research [5,6,7], culminating with [8, 9] and [10] where optimal security has been proven;

  • the Encrypted Davies-Meyer (EDM) construction \(x\mapsto \mathsf {\Pi }_2(\mathsf {\Pi }_1(x)\oplus x)\) and its dual (EDMD) \(x\mapsto \mathsf {\Pi }_2(\mathsf {\Pi }_1(x))\oplus \mathsf {\Pi }_1(x)\): EDM has been introduced in [11], and security up to roughly \(2^n/n\) queries has been proven in [12], while EDMD has been designed and proven optimally secure in [12].
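The four PRP-to-PRF conversions listed above can be sketched on toy parameters as follows. This is only an illustration: the 8-bit block size, the table-based permutations, and the convention that the prefix bit in \(0||x\) and \(1||x\) is the most significant bit are all assumptions made for the example, not choices taken from the cited works.

```python
import random

N_BITS = 8  # toy block size (assumption); real block ciphers use n = 64 or 128


def random_permutation(n_bits, rng):
    """Sample a uniformly random permutation of {0,1}^n as a lookup table."""
    table = list(range(1 << n_bits))
    rng.shuffle(table)
    return table


def sop(p1, p2, x):
    """Sum of permutations: x -> P1(x) xor P2(x)."""
    return p1[x] ^ p2[x]


def twin(p1, x, n_bits=N_BITS):
    """TWIN: x -> P1(0||x) xor P1(1||x), for an (n-1)-bit input x."""
    assert 0 <= x < 1 << (n_bits - 1)
    return p1[x] ^ p1[(1 << (n_bits - 1)) | x]


def edm(p1, p2, x):
    """Encrypted Davies-Meyer: x -> P2(P1(x) xor x)."""
    return p2[p1[x] ^ x]


def edmd(p1, p2, x):
    """Dual of EDM: x -> P2(P1(x)) xor P1(x)."""
    return p2[p1[x]] ^ p1[x]


rng = random.Random(2020)  # fixed seed, for reproducibility of the sketch only
P1 = random_permutation(N_BITS, rng)
P2 = random_permutation(N_BITS, rng)
```

Note that TWIN can never output the all-zero block, since a permutation maps the distinct inputs \(0||x\) and \(1||x\) to distinct values; a tiny block size makes such deviations from a random function easy to observe.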

However, for the case of variable input length (VIL), very few constructions actually provide security beyond the birthday bound. The most notable exceptions are the SUM-ECBC construction [13], the PMAC+ construction [14] and its single-key variant 1k-PMAC+ [15], 3kf9 [16], and LightMAC+ [17], which offer beyond-the-birthday-bound (but still suboptimal) security. These modes of operation follow the relatively new Double-block Hash-then-Sum or DbHtS paradigm [18], which applies n-bit block cipher calls to the two n-bit halves of the output of a 2n-bit hash function and then sums the encrypted outputs. Although the DbHtS paradigm is known to achieve very high security [19, 20], it is not yet known whether it can achieve optimal security. A more traditional approach towards PRF construction is the classical Hash-then-PRF paradigm [21], which relies on an n-bit block cipher along with two other components:

  • a hash function with 2n-bit output; and

  • a 2n-bit to n-bit PRF.

Designing the latter primitive is deeply linked to the problem of domain extension for PRFs, which has also been the subject of a long line of research. Since we focus on the problem of designing an optimally secure construction from a block cipher, this restricts the set of possible finalization constructions to the Benes construction and its variants [22], and Feistel networks with at least four rounds [23]. Unfortunately, optimal security for Feistel networks when round functions are instantiated with PRPs still remains to be proven. Hence, using Feistel networks as a finalization function would require implementing the round PRFs as the xor of two permutations, thus increasing the number of block cipher calls to 8. As we will see, considering other structures will allow the design of more efficient schemes.

Benes and Modified Benes: In [22], Aiello and Venkatesan introduced the Benes and modified Benes (or mBenes) constructions that build a 2n-bit to n-bit PRF from respectively 6 and 4 independent n-bit PRFs, where each underlying PRF is called once for each call to the construction. Patarin showed that the Benes transformation is n-bit secure [24]. For mBenes, although Aiello and Venkatesan conjectured n-bit security, so far only a high-level proof idea has been given [24, 25] for security up to (roughly) \( 2^{n(1-\epsilon )} \) queries for all \( \epsilon > 0 \). In order to use PRPs as the underlying primitive in Benes and mBenes while keeping optimal security, the most obvious solution would be to rely on an optimally secure PRP-to-PRF conversion method. However, this would increase the number of PRP calls to 12 for the Benes construction, and to 8 for the mBenes construction. Current proof techniques are unfortunately not sufficient to prove optimal security for PRP-based Benes and mBenes constructions using a smaller number of permutation calls. Indeed, the current best result by Jha and Nandi shows that mBenes using 4 block ciphers is secure up to \( 2^{3n/4} \) queries [19].

1.1 Our Contributions

Table 1. Summary of beyond-the-birthday bound secure variable input length pseudorandom functions. Here \(\ell \) denotes the length of the input message after padding.

Our contribution is twofold. First, we introduce a novel construction dubbed HtmB for Hash-then-modified-Benes. This construction captures the design of a VIL-PRF based on a FIL primitive where the input is first hashed, then given as input to mBenes. This hashing step is what allows us to avoid the main difficulties that are encountered when one tries to prove optimal security for the mBenes construction. In more detail, we introduce a new statistical property for hash functions with 2n-bit outputs: Diblock Almost q-Collision-free Universality or \( \text {DbACU}_{q} \) (see Sect. 2.2). We then show that the composition of a \( \text {DbACU}_{q} \) hash function and the mBenes construction is n-bit secure (see Sect. 4), and propose several extensions:

  • HtmB-f: the standard HtmB construction based on 4 functions;

  • HtmB-p1: the HtmB construction where two functions are replaced with permutations, and the remaining ones are replaced with the sum of two permutations;

  • HtmB-p2: the standard HtmB based on 4 permutations.

It is worth noting that the security proofs for the first two constructions are straightforward and rely on the same technique as Patarin’s classical proofs for Benes  [24]. The security proof for the last construction relies on the fundamental result of Mirror Theory [9, Theorem 6]. Note that \( \text {DbACU}_{q} \) can be easily achieved by concatenation of two independent almost universal (AU) hash functions. Moreover, we will show two instances where this property is also achieved for concatenation of dependent AU hash functions.

Second, we define two families of block cipher modes of operation dubbed mLightMAC+ and mPMAC+ (see Sect. 5). Both are concrete instantiations of HtmB where the hashing algorithm is based respectively on the LightMAC+ and PMAC+ algorithms. In more detail, both schemes are provably secure PRFs with n-bit output and have the following properties:

  • mPMAC+ processes n bits of (padded) input per block cipher call during the hashing phase and is secure as long as the number of (padded) queried blocks is small compared to \(2^n\) and no query is longer than \(2^{n/2}\) blocks;

  • for any fixed integer \(m\in \{1,\ldots ,n-1\}\), mLightMAC+ processes \(n-m\) bits of input per block cipher call during the hashing phase and is secure as long as the number of adversarial queries is small compared to \(2^n\) and no query is longer than \(2^m\) blocks.

Table 1 summarizes this information and compares our modes with the original LightMAC+ and PMAC+ constructions, while Fig. 1 highlights the changes between mPMAC+-p2, our mPMAC+ instantiation based on HtmB-p2, and the original PMAC+ construction.

In [26], Naito proposed a PMAC variant based on PMAC+-like masking and claimed length-independent bounds on the collision probability of the underlying hash layer. However, the proof is incorrect owing to a flaw identified in [27], and apparently it cannot be fixed within the proof setup developed in [26] (see [27] for further details). Consequently, in Sect. 6.2, we first discuss this flaw and then derive a slightly worse bound which is still sufficient to prove optimal security of mPMAC+.

The key sizes in HtmB could be an issue in some memory-constrained environments. In Sect. 7, we address this problem and present some variants of HtmB that require less key material. Finally, we conclude in Sect. 8 with some open problems.

Fig. 1.

Schematic of \( \textsf {mPMAC+}\text {-}\textsf {p2} \), operating over a padded message of length \( \ell n \) bits. \( \mathsf {\Pi }_0,\ldots ,\mathsf {\Pi }_4 \) are independent random permutations, and , where \( \odot \) denotes the multiplication operator of \( \mathrm {GF}(2^{n})\). Components drawn in blue dashed lines represent the addition over the original PMAC+ construction. Components drawn in red dotted lines represent the deletion over the original PMAC+ construction. Note that the modified hash layer saves one block cipher call as compared to the one in PMAC+. (Color figure online)

2 Preliminaries

Notational Setup: For \( n \in \mathbb {N}\), [n] denotes the set \( \{1,2,\ldots ,n\} \), and \( \{0,1\}^n\) denotes the set of bit strings of length n. Let \(\textsf {GF}(2^n)\) be the field of order \(2^n\). We identify bit string and finite field element of \(\textsf {GF}(2^n)\) by representing the string \(a=a_{n-1}\ldots a_0 \in \{0,1\}^n\) as polynomial \(a(x) =a_{n-1}x^{n-1}+\ldots +a_0\in \textsf {GF}(2^n)\) and vice versa. As usual, we define field addition \(\oplus \) as polynomial addition, and multiplication \(\odot \) as polynomial multiplication modulo the irreducible polynomial f(x) used to represent \(\textsf {GF}(2^n)\). Therefore, we can view \(\{0,1\}^n\) as the finite field \(\textsf {GF}(2^n)\) with \(\oplus \) as field addition and \(\odot \) as field multiplication. When the context is clear, we will denote by 2 the primitive element of \(\textsf {GF}(2^n)\). The set of all bit strings (including the empty string) is denoted \( \{0,1\}^* \), and |X| denotes the number of bits in \( X \in \{0,1\}^* \). For any integer m, \(\{0,1\}^{\le m}\) denotes the set of all bit strings of bit length at most m. For \( n \in \mathbb {N}\) and any two bit strings M and \(M'\), we denote by \(M||M'\) the concatenation of M and \(M'\), and we define as \(M||10\cdots 0\), such that is the smallest multiple of n that is greater than |M|. For \(i,m \in \mathbb {N}\) such that \(i < 2^{m}\), we define \({<}i{>}_m\) as the m-bit little endian encoding of the integer i. For \( n,r \in \mathbb {N}\), such that \( 0 \le r \le n \), we define the falling factorial \( (n)_r := n!/(n-r)! = n(n-1)\cdots (n-r+1) \). The set of all functions from \( \mathcal {X}\) to \( \mathcal {Y}\) is denoted \( \mathcal {F}(\mathcal {X},\mathcal {Y}) \), and the set of all permutations of \( \mathcal {X}\) is denoted . We simply write \( \mathcal {F}(a,b) \) and , whenever \( \mathcal {X}= \{0,1\}^a \) and \( \mathcal {Y}= \{0,1\}^b \). 
For a finite set \( \mathcal {X}\), denotes the uniform at random sampling of X from \( \mathcal {X}\). For any property P of some random variable X, \( {\Pr _{ }\left[ P[X]\right] } \) denotes the probability that P[X] is satisfied.

For \( q \in \mathbb {N}\), \( X^q \) denotes the q-tuple \( (X_1,X_2,\ldots ,X_q) \). By an abuse of notation we also use \( X^q \) to denote the multiset \( \{X_i : i \in [q]\} \). For \( q \in \mathbb {N}\), for any set \( \mathcal {X}\), \( (\mathcal {X})_q \) denotes the set of all q-tuples with distinct elements from \( \mathcal {X}\). For a pair of tuples \( X^q \) and \( Y^q \), \( (X^q,Y^q) \) denotes the 2-ary q-tuple \( ((X_1,Y_1),\ldots ,(X_q,Y_q)) \). An n-ary q-tuple is defined analogously. For any tuple \( X^q \in \mathcal {X}^q \), and for any function \( f: \mathcal {X}\rightarrow \mathcal {Y}\), \( f(X^q) \) denotes the tuple \( (f(X_1),\ldots ,f(X_q)) \).
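The field arithmetic, the \(10\cdots 0\) padding, and the falling factorial from the notational setup can be sketched as follows. The sketch fixes \(n = 8\) and the irreducible polynomial \(x^8+x^4+x^3+x+1\) purely for illustration (the text leaves \(f(x)\) unspecified), and the function names are hypothetical.

```python
def gf_mul(a, b, n=8, poly=0x11B):
    """Multiply a and b in GF(2^n).

    `poly` encodes the irreducible polynomial f(x), including its x^n term.
    The default (n = 8, f = x^8 + x^4 + x^3 + x + 1) is an illustrative choice.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a  # field addition is xor
        b >>= 1
        a <<= 1
        if a >> n:  # degree reached n: reduce modulo f(x)
            a ^= poly
    return result


def pad(message_bits, n):
    """Append a 1 and then zeros, so that the padded length is the
    smallest multiple of n that is greater than |M|."""
    padded = message_bits + "1"
    return padded + "0" * (-len(padded) % n)


def falling_factorial(n, r):
    """(n)_r = n (n-1) ... (n-r+1)."""
    out = 1
    for i in range(r):
        out *= n - i
    return out
```

For instance, a message whose length is already a multiple of n still receives a full extra block of padding, which is what makes the padding injective.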

2.1 Keyed Functions and Block Ciphers

Keyed Function: A \( (\mathcal {K},\mathcal {X},\mathcal {Y}) \)-keyed function with key space \( \mathcal {K}\), domain \( \mathcal {X}\), and range \( \mathcal {Y}\) is a function \( F : \mathcal {K}\times \mathcal {X}\rightarrow \mathcal {Y}\). We write \( F_K(X) \) for \( F(K,X) \).

Block Cipher: A \( (\mathcal {K},\{0,1\}^n) \)-block cipher \( {E}\) with key space \( \mathcal {K}\) and block space \( \{0,1\}^n\) is a \( (\mathcal {K},\{0,1\}^n,\{0,1\}^n) \)-keyed function, such that for any key \( K \in \mathcal {K}\), \( X \mapsto {E}(K,X) \) is a permutation of \( \{0,1\}^n\). We write \( {E}_K(X) \) for \( {E}(K,X) \).

Security Definitions: A \( (q,t) \)-distinguisher is an interactive algorithm with access to an oracle, that makes at most q oracle queries, runs in time at most t, and outputs a single bit. By convention, \( t = \infty \) denotes computationally unbounded (information-theoretic) and deterministic distinguishers. In this paper, we assume that the distinguisher never makes a duplicate query.

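Pseudorandom Function: The pseudorandom function or PRF advantage of any distinguisher \( \mathscr {A}\) against a \( (\mathcal {K},\mathcal {X},\mathcal {Y}) \)-keyed function \( F \) is defined as

$$\begin{aligned} \mathbf {Adv}^{\mathsf {prf}}_{F}(\mathscr {A}) := \left| {\Pr _{ }\left[ \mathscr {A}^{F_K} = 1\right] } - {\Pr _{ }\left[ \mathscr {A}^{\mathsf {\Gamma }} = 1\right] } \right| , \end{aligned}$$
(1)

where \( K \) is sampled uniformly at random from \( \mathcal {K}\), and \( \mathsf {\Gamma }\) is sampled uniformly at random from \( \mathcal {F}(\mathcal {X},\mathcal {Y}) \).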

Deterministic message authentication codes (or MACs) are keyed functions which provide both integrity and authenticity of data. It is a well-known fact [28] that a secure PRF is a good candidate for a deterministic MAC.

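Pseudorandom Permutation: The pseudorandom permutation or PRP advantage of any distinguisher \( \mathscr {A}\) against a \( (\mathcal {K},\{0,1\}^n) \)-block cipher \( {E}\) is defined as

$$\begin{aligned} \mathbf {Adv}^{\mathsf {prp}}_{{E}}(\mathscr {A}) := \left| {\Pr _{ }\left[ \mathscr {A}^{{E}_K} = 1\right] } - {\Pr _{ }\left[ \mathscr {A}^{\mathsf {\Pi }} = 1\right] } \right| , \end{aligned}$$
(2)

where \( K \) is sampled uniformly at random from \( \mathcal {K}\), and \( \mathsf {\Pi }\) is sampled uniformly at random from the set of all permutations of \( \{0,1\}^n\).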

Remark 2.1

All our results will be given in the information-theoretic setting, and their computational counterparts can be easily obtained via a boilerplate hybrid argument. In other words, instead of first starting with block ciphers (or PRFs), we will directly work with random permutations (or functions) as the underlying primitives.

Sum of Permutations: In 1998, two independent works [5, 29] on building PRFs from PRPs proposed the Sum of Permutations (SoP) construction. For two independent random permutations \( \mathsf {\Pi }_1,\mathsf {\Pi }_2 \) of \( \{0,1\}^n\), the SoP, denoted \( \textsf {SoP}[\mathsf {\Pi }_1,\mathsf {\Pi }_2] \), is defined as the mapping \( x\mapsto \mathsf {\Pi }_1(x)\oplus \mathsf {\Pi }_2(x) \). After several attempts [6, 7, 9], Dai et al. [10] finally showed that SoP is a secure PRF up to \( 2^n \) queries. In Proposition 2.1, we restate the well-known and celebrated result of [10]. A proof of Proposition 2.1 is available in [10].

Proposition 2.1

For \( n \ge 4 \), \( q \le 2^{n-4} \), and all \( (q,\infty ) \)-distinguisher \( \mathscr {A}\) we have

2.2 Universal Hash Functions

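We recall the usual definition of a universal hash function. A \( (\mathcal {K},\mathcal {X},\mathcal {Y}) \)-keyed function H is said to be an \(\epsilon \)-almost universal (AU) hash function if for any distinct \( X,X' \in \mathcal {X}\), we have

$$\begin{aligned} {\Pr _{ }\left[ H_K(X) = H_K(X')\right] } \le \epsilon , \end{aligned}$$
(3)

where the probability is taken over a key \( K \) sampled uniformly at random from \( \mathcal {K}\).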

Let us fix a non-empty set \(\mathcal {X}\subset \{0,1\}^{*}\). In this article, we are going to consider a slightly more general notion of universality. Namely, let H be a \((\mathcal {K}, \mathcal {X}, \mathcal {Y})\)-keyed function that processes its inputs in n-bit blocks. H is said to be \((q, \sigma , \epsilon )\)-Almost \( \theta \)-Collision-free Universal (or \( \text {ACU}_{\theta } \)) if, for every \(X^q\in (\mathcal {X})_q\) such that \(X^q\) contains at most \(\sigma \) blocks, one has \( {\Pr _{ }\left[ C \ge \theta \right] } \le \epsilon \), where

$$ C:=|\{(i,j)\,:\,1\le i < j \le q,\, H_K(X_i)=H_K(X_j)\}|. $$

In the case of a \((q, \sigma , \epsilon )\)-\( \text {ACU}_{1} \) hash function H, we simply say that H is \((q, \sigma , \epsilon )\)-AU. Note that if \(q=2\), we recover the standard AU notion. Moreover, the following proposition is a simple application of Markov’s inequality.

Proposition 2.2

For \( q,\theta \in \mathbb {N}\) and \( 0 \le \epsilon \le 1 \), let H be an \( \epsilon \)-AU hash function. Then H is \((q,\infty ,\frac{q^2\epsilon }{\theta })- \text {ACU}_{\theta } \).

The proof of Proposition 2.2 follows from Markov’s inequality and is thus skipped here.
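For completeness, the omitted Markov step can be sketched as follows, for \( \theta \ge 1 \). Each of the \( \binom{q}{2} \) pairs \( (i,j) \) collides with probability at most \( \epsilon \), so linearity of expectation gives \( \mathbb {E}[C] \le \binom{q}{2}\epsilon \), and therefore

$$\begin{aligned} {\Pr _{ }\left[ C \ge \theta \right] } \le \frac{\mathbb {E}[C]}{\theta } \le \binom{q}{2}\frac{\epsilon }{\theta } \le \frac{q^2\epsilon }{\theta }. \end{aligned}$$

The argument does not depend on the number of blocks in the queries, which is why the block parameter in the statement is \( \infty \).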

We also define a new combined notion for the concatenation of two hash functions. Namely, we say that a pair \(H = (H_1, H_2)\) of two \( (\mathcal {K},\mathcal {X},\mathcal {Y}) \)-keyed hash functions \( H_1, H_2 \) is \((q, \sigma , \epsilon _2, \epsilon _1)\)-Diblock \( \text {ACU}_{q} \) (or \( \text {DbACU}_{q} \)) if H is \((q, \sigma , \epsilon _2)\)-AU and \(H_1\), \(H_2\) are \((q, \sigma , \epsilon _1)\)-\( \text {ACU}_{q} \). A simple example of a \( \text {DbACU}_{q} \) hash function is the concatenation of two independent AU hash functions. In Sect. 5, we present two other \( \text {DbACU}_{q} \) hash functions, LightHash and PHash, based respectively on the LightMAC+ and PMAC+ constructions.

Concatenation of Two Independent AU Hash Functions: Let \(H_1\) and \(H_2\) be two \(\epsilon \)-AU hash functions with key space \(\mathcal {K}\), message space \(\mathcal {X}\) and range \(\mathcal {Y}\). We define the concatenation \(H=(H_1,H_2)\) of \(H_1\) and \(H_2\) as a \((\mathcal {K}^2,\mathcal {X},\mathcal {Y}^2)\)-keyed function defined as \(H_{(K_1,K_2)}(X)=(H_{1,K_1}(X),H_{2,K_2}(X))\) for every \(X\in \mathcal {X}\), \((K_1,K_2)\in \mathcal {K}^2\). The following result holds.

Proposition 2.3

Let \(H_1,H_2\) be two \(\epsilon \)-AU hash functions keyed independently and \(H=(H_1,H_2)\). For \(q,\sigma \in \mathbb {N}\), H is \((q,\sigma ,q^2\epsilon ^2,q\epsilon )\)-\( \text {DbACU}_{q} \).

A proof of Proposition 2.3 relies on the independence of both components and on Proposition 2.2.
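The role of independence can be checked exhaustively on a toy example. The sketch below uses the hypothetical hash \( H_K(x) = K\cdot x \bmod p \) over \( \mathbb {Z}_p \) with \( p = 7 \) (an assumption made only for illustration, not a hash from this paper): a single copy is \( \epsilon \)-AU with \( \epsilon = 1/p \), and the concatenation of two independently keyed copies collides on a fixed pair with probability exactly \( \epsilon ^2 \). A union bound over at most \( q^2 \) pairs then gives the \( q^2\epsilon ^2 \) term of Proposition 2.3.

```python
from fractions import Fraction
from itertools import combinations, product

P = 7  # toy prime modulus (assumption for the example)


def h(key, x):
    """Toy hash H_K(x) = K * x mod P; it collides on x != y only when K = 0."""
    return (key * x) % P


def collision_prob_single(x, y):
    """Exact collision probability of one copy, over a uniform key."""
    hits = sum(h(k, x) == h(k, y) for k in range(P))
    return Fraction(hits, P)


def collision_prob_pair(x, y):
    """Exact collision probability of (H_1, H_2) with independent keys."""
    hits = sum(
        h(k1, x) == h(k1, y) and h(k2, x) == h(k2, y)
        for k1, k2 in product(range(P), repeat=2)
    )
    return Fraction(hits, P * P)


pairs = list(combinations(range(P), 2))
eps = max(collision_prob_single(x, y) for x, y in pairs)
eps_pair = max(collision_prob_pair(x, y) for x, y in pairs)
```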

2.3 Coefficient-H Technique

The coefficient-H technique by Patarin [30, 31] is a tool to upper bound the distinguishing advantage of any deterministic and computationally unbounded distinguisher \( \mathscr {A}\) in distinguishing the real oracle \( \mathcal {R}\) from the ideal oracle \( \mathcal {I}\). The collection of all queries and responses that \( \mathscr {A}\) made and received to and from the oracle, is called the transcript of \( \mathscr {A}\), denoted as \( \tau \).

Let \( \mathsf {\mathbb {T}_{re}}\) and \( \mathsf {\mathbb {T}_{id}}\) denote the transcript random variable induced by \( \mathscr {A}\)’s interaction with \( \mathcal {R}\) and \( \mathcal {I}\), respectively. Let \( \mathcal {T}\) be the set of all transcripts. A transcript \( \tau \in \mathcal {T}\) is said to be attainable if \( {\Pr _{ }\left[ \mathsf {\mathbb {T}_{id}}= \tau \right] } > 0 \), i.e., it can be realized by \( \mathscr {A}\)’s interaction with \( \mathcal {I}\). Following these notations, we state the main result of coefficient-H technique in Theorem 2.1. A proof of this theorem is available in [4, 32], among others.

Theorem 2.1

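For \( \epsilon _1,\epsilon _2 \ge 0 \), suppose there is a set \( \mathcal {T}_{\mathsf {bad}} \subseteq \mathcal {T}\), that we call the set of bad transcripts, such that the following conditions hold:

  • \( {\Pr _{ }\left[ \mathsf {\mathbb {T}_{id}}\in \mathcal {T}_{\mathsf {bad}}\right] } \le \epsilon _1 \); and

  • For any \( \tau \in \mathcal {T}\setminus \mathcal {T}_{\mathsf {bad}} \), \( \tau \) is attainable and \( \displaystyle \frac{{\Pr _{ }\left[ \mathsf {\mathbb {T}_{re}}=\tau \right] }}{{\Pr _{ }\left[ \mathsf {\mathbb {T}_{id}}=\tau \right] }} \ge 1-\epsilon _2 \).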

Then, for any computationally unbounded and deterministic distinguisher \( \mathscr {A}\), we have

$$\begin{aligned} \mathbf {Adv}_{\mathcal {R};\mathcal {I}}(\mathscr {A}) \le \epsilon _1 + \epsilon _2. \end{aligned}$$

3 Benes and mBenes Transformations

Butterfly transformation: Given four functions \( f_1,\ldots ,f_4 \in \mathcal {F}(n,n) \), the Butterfly transformation (illustrated in Fig. 2) is a function from \( \{0,1\}^{2n} \) to \( \{0,1\}^{2n} \), which is defined as \( \textsf {Butterfly}[f_1,\ldots ,f_4](L,R) := (X,Y) \), where

$$\begin{aligned} X := f_1(L) \oplus f_2(R), \qquad Y := f_3(L) \oplus f_4(R). \end{aligned}$$

Benes transformation: Given eight functions \( f_1,\ldots ,f_8 \in \mathcal {F}(n,n) \), the Benes transformation (illustrated in Fig. 2) is a function from \( \{0,1\}^{2n} \) to \( \{0,1\}^{2n} \), which is defined as the composition of two Butterfly transformations, i.e. \( \textsf {Benes} [f_1,\ldots ,f_8](L,R) := (S,T) \), where

$$\begin{aligned} (X,Y) := \textsf {Butterfly}[f_1,\ldots ,f_4](L,R), \qquad S := f_5(X) \oplus f_6(Y), \qquad T := f_7(X) \oplus f_8(Y). \end{aligned}$$

Modified Benes transformation: The modified Benes or mBenes transformation (illustrated in Fig. 2) is a simplification of the Benes transformation, where \( f_2 \) and \( f_3 \) are identity functions. So, we have \( X := f_1(L) \oplus R \), \( Y := L \oplus f_4(R) \), and \( (S,T) = \textsf {mBenes} [f_1,f_4,f_5,\ldots ,f_8](L,R) \), such that \( S := f_5(X) \oplus f_6(Y) \) and \( T := f_7(X) \oplus f_8(Y) \).

For brevity we drop the parameters \( f_1,\ldots ,f_8 \), whenever they are understood from the context.
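The three transformations can be sketched on toy parameters as follows. The sketch assumes the usual definitions, namely \( X = f_1(L)\oplus f_2(R) \), \( Y = f_3(L)\oplus f_4(R) \) for Butterfly, Benes as two chained Butterfly layers, and mBenes as Benes with \( f_2 \) and \( f_3 \) set to the identity; the 8-bit block size and the table-based random functions are illustrative assumptions.

```python
import random

N_BITS = 8  # toy block size (assumption)


def random_function(rng):
    """Uniformly random function {0,1}^n -> {0,1}^n as a lookup table."""
    return [rng.randrange(1 << N_BITS) for _ in range(1 << N_BITS)]


IDENTITY = list(range(1 << N_BITS))


def butterfly(f1, f2, f3, f4, left, right):
    """(L, R) -> (f1(L) xor f2(R), f3(L) xor f4(R))."""
    return f1[left] ^ f2[right], f3[left] ^ f4[right]


def benes(fs, left, right):
    """Composition of two Butterfly layers over f_1, ..., f_8."""
    f1, f2, f3, f4, f5, f6, f7, f8 = fs
    x, y = butterfly(f1, f2, f3, f4, left, right)
    return butterfly(f5, f6, f7, f8, x, y)


def mbenes(f1, f4, f5, f6, f7, f8, left, right):
    """Benes with f_2 and f_3 replaced by the identity."""
    x = f1[left] ^ right
    y = left ^ f4[right]
    return f5[x] ^ f6[y], f7[x] ^ f8[y]


rng = random.Random(1)  # fixed seed, for reproducibility of the sketch only
F = [random_function(rng) for _ in range(8)]
```

Only the first output \( S \) is needed for a 2n-bit to n-bit PRF, which is why Benes and mBenes are counted as using 6 and 4 functions respectively in that setting.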

Fig. 2.

Left to right: Butterfly, Benes and mBenes transformations. An edge \( (u,v) \) with label g denotes the mapping \( v = g(u) \). Unlabelled edges are identity mappings.

3.1 Revisiting the Security Analysis of Benes and mBenes

Let \( (L^q,R^q) \) denote a q-tuple of inputs. Given \( f_1,\ldots ,f_4 \in \mathcal {F}(n,n) \), we can define \( (X^q,Y^q) \) by the definition of Benes or mBenes, as applicable.

Dependency Graph: To \( (L^q,R^q) \) and any \( f_1,\ldots ,f_4 \in \mathcal {F}(n,n) \), we associate the dependency graph \( \mathcal {G}[L^q,R^q;f_{1,\ldots ,4}] = ([q],\mathcal {E}) \), over the set of all query indices [q], where \( \{i,j\} \in \mathcal {E}\) if and only if \( X_i = X_j \) (the edge is colored red) or \( Y_i = Y_j \) (the edge is colored blue). \( \mathcal {G}[L^q,R^q;f_{1,\ldots ,4}] \) may contain parallel edges, but their coloring will be different. Figure 3 is a possible dependency graph for \( q=12 \).

Fig. 3.

A possible dependency graph for some 12-tuple of inputs. (Color figure online)

Definition 3.1

(Alternating cycle). An alternating cycle or circle of length \( k \ge 2 \), k even, is simply a cycle denoted by a sequence of \( k+1 \) indices, \( v^{k+1} = (v_1,\ldots ,v_{k},v_{k+1}) \) such that

  • \( v_{k+1} = v_1 \),

  • \( \{v_i,v_{i+1}\} \in \mathcal {E}\) for all \( i \in [k] \),

  • \( \{v_1,v_2\} \) is colored red, and

  • \( \{v_i,v_{i+1}\} \) and \( \{v_{i+1},v_{i+2}\} \) do not share the same color, for all \( i \in [k-1] \).

Example 3.1

Any parallel edge is an example of alternating cycle. In Fig. 3, (1, 2, 3, 4, 1) and (11, 12, 11) are two possible alternating cycles.
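One way to test a tuple of intermediate values \( (X^q,Y^q) \) for an alternating cycle is sketched below. It relies on a reformulation that is ours and is given only for illustration: since \( X_i = X_j \) and \( Y_i = Y_j \) are equivalence relations, each query i can be viewed as an edge joining the class of \( X_i \) to the class of \( Y_i \) in a bipartite multigraph, and an alternating cycle exists exactly when this multigraph contains a cycle (a parallel edge counts as a cycle of length 2). A union-find structure detects this in a single pass.

```python
def has_alternating_cycle(xs, ys):
    """Return True iff the dependency graph of (xs, ys) has an alternating cycle.

    Query i is treated as an edge between the node ('X', xs[i]) and the
    node ('Y', ys[i]); the first edge whose endpoints are already
    connected closes a cycle.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for x, y in zip(xs, ys):
        root_x, root_y = find(("X", x)), find(("Y", y))
        if root_x == root_y:
            return True
        parent[root_x] = root_y
    return False
```

For example, the values `xs = [1, 1, 2, 2]` and `ys = [5, 6, 6, 5]` realize the pattern \( X_1=X_2,~Y_2=Y_3,~X_3=X_4,~Y_4=Y_1 \) of the cycle (1, 2, 3, 4, 1).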

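Let \( \mathrm {AC}(L^q,R^q;f_{1,\ldots ,4}) \) denote the property that \( \mathcal {G}[L^q,R^q;f_{1,\ldots ,4}] \) contains an alternating cycle. We will drop the parameters \( (L^q,R^q;f_{1,\ldots ,4}) \), whenever they are understood from the context.

For independent and uniformly random \( f_1,\ldots ,f_8 \in \mathcal {F}(n,n) \), Aiello and Venkatesan [22] showed that the PRF advantage of any distinguisher against Benes and mBenes is at most the probability that \( \mathrm {AC} \) is satisfied. Similar results were later also shown in [24, 25]. Theorem 3.1 is a reformulation of [22, Lemma 2] (also [25, Theorem 5.2] and [24, Theorem 1]) in our notations.

Theorem 3.1

For independent and uniformly random \( f_1,\ldots ,f_8 \in \mathcal {F}(n,n) \), \( \mathrm {ACP}(q) := \max _{(L^q,R^q)} {\Pr _{ }\left[ \mathrm {AC}(L^q,R^q;f_{1,\ldots ,4})\right] } \), and any \( (q,\infty ) \)-distinguisher \( \mathscr {A}\), we have

$$\begin{aligned} \mathbf {Adv}^{\mathsf {prf}}_{\textsf {Benes}}(\mathscr {A}) \le \mathrm {ACP}(q) \quad \text {and} \quad \mathbf {Adv}^{\mathsf {prf}}_{\textsf {mBenes}}(\mathscr {A}) \le \mathrm {ACP}(q). \end{aligned}$$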

A proof of Theorem 3.1 is available in [25] among others. For the sake of completeness, we reproduce it in the full version of this paper.

Aiello and Venkatesan [22] claimed that \( \mathrm {ACP}(q) \le q^2/2^{2n} \). Later, Patarin and Montreuil [25] showed that the initial analysis of \( \mathrm {ACP}(q) \) by Aiello and Venkatesan was overly optimistic, and subsequently gave a non-tight estimate for Benes. The main idea of their analysis was to consider each equation in the alternating cycle, one by one, distinguishing whether or not the equation depends on the previous equations. If the i-th equation is independent, then they freely choose the new index, i.e., the \( (i+1) \)-th index, in \( q-i \) ways. However, when the equation is dependent, there exist \( j,j' < i \) such that \( L_i = L_j \) and \( R_i = R_{j'} \), hence we only have \( i(i-1) \) ways to choose the \( (i+1) \)-th index. By continuing in this way and making some algebraic simplifications, they derive the upper bound

$$\begin{aligned} \mathrm {ACP}(q) \le d(k)\frac{q^2}{2^{2n+1}} + \frac{q^4}{2^{4n+2}} + \frac{q^{k+1}}{2^{nk}}, \end{aligned}$$

for all \( k \ge 1 \), where \( d(k) = 6.5 + \sum _{j=6}^{k}j^{2j}+k^{2k} \). So, for any k and sufficiently large n, we can claim security up to \( q \le \min \{2^{nk/(k+1)},\sqrt{2^{2n}/d(k)}\} \). However, the bound becomes increasingly impractical as we increase the value of k. Suppose we aim for security up to \( 2^{kn/(k+1)} \) queries. Then, for \( k = 6 \) we need \( n > 112 \), for \( k = 7 \) we need \( n > 161 \), and for \( k = 9 \) we need \( n > 290 \), where n denotes the output size of the underlying functions. Clearly, very high security (close to 0.9n) is only possible for a large output size (\( n > 290 \)). In practice, with such a large output size, even a birthday bound security guarantee might suffice.
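The thresholds on n quoted above can be reproduced numerically. The sketch below assumes that the requirement is read as "the first term of the minimum does not exceed the second", i.e. \( 2^{nk/(k+1)} \le \sqrt{2^{2n}/d(k)} \), which is equivalent to \( n \ge \frac{k+1}{2}\log _2 d(k) \); the function names are hypothetical.

```python
import math


def d(k):
    """d(k) = 6.5 + sum_{j=6}^{k} j^(2j) + k^(2k)."""
    return 6.5 + sum(j ** (2 * j) for j in range(6, k + 1)) + k ** (2 * k)


def block_size_threshold(k):
    """Integer part of (k+1)/2 * log2 d(k).

    Security up to 2^(nk/(k+1)) queries requires n to exceed this value.
    """
    return math.floor((k + 1) / 2 * math.log2(d(k)))
```

Under this reading, `block_size_threshold` returns 112, 161 and 290 for k = 6, 7 and 9, matching the figures in the text.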

Patarin and Montreuil also claimed similar security bounds for mBenes [25]. However, they only gave a very high level and terse sketch of the proof. We refer the readers to [25] for details.

First Dependency and Tight Bound for Benes: Patarin [24] devised an elegant way to derive a tighter estimate for \( \mathrm {ACP}(q) \) in the case of Benes.

Definition 3.2

(Alternating trail). An alternating trail or line of length \( k \ge 2 \) is simply a trail denoted by a sequence of \( k+1 \) vertices, \( v^{k+1} = (v_1,\ldots ,v_{k},v_{k+1}) \) such that

  • \( \{v_i,v_{i+1}\} \in \mathcal {E}\), for all \( i \in [k] \).

  • \( \{v_i,v_{i+1}\} \) and \( \{v_{i+1},v_{i+2}\} \) do not share the same color, for all \( i \in [k-1] \).

In addition, we say that \( v^{k+1} \) is a red (resp. blue) trail if \( \{v_1,v_2\} \) is colored red (resp. blue).

Example 3.2

An alternating cycle is in fact a special type of alternating red trail with even length. In Fig. 3, (1, 2, 3, 4, 1), (5, 6, 7, 8, 9, 10), and (11, 12, 11) are some of the possible alternating trails. Note that all these trails are red trails. On the other hand, (2, 3, 4, 1, 2) is a blue trail.

Associated System of Equations: By definition, each edge in the dependency graph \( \mathcal {G}\) corresponds to an equation. For example, if we have an edge \( \{u,v\} \) colored red, then the associated equation is \( X_{u} = X_{v} \). By extension, each connected component corresponds to a system of equations. In particular, any alternating trail (or cycle) \( v^{k+1} \) can be uniquely associated with a system of k equations. For example, suppose \( v^{k+1} \) is an alternating red trail of even length. Then, the associated system of equations is \( X_{v_1} = X_{v_2},\ldots ,~Y_{v_{k}} = Y_{v_{k+1}} \).

Example 3.3

In Fig. 3, we can have the following associated system of equations:

  • For alternating cycle (1, 2, 3, 4, 1): \( X_1=X_2,~Y_2=Y_3,~X_3=X_4,~Y_4=Y_1 \).

  • For alternating trail (5, 6, 7, 8, 9, 10): \( X_5=X_6,~Y_6=Y_7,~X_7=X_8,~Y_8=Y_9,~X_9=X_{10} \).

  • For parallel edge (11, 12, 11): \( X_{11}=X_{12},~Y_{12}=Y_{11} \).
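The correspondence between a trail and its system of equations is mechanical, as the following short sketch shows. The function name and the string output format are hypothetical, chosen only to mirror Example 3.3.

```python
def trail_equations(trail, first_color="red"):
    """List the equations of an alternating trail (v_1, ..., v_{k+1}).

    Red edges correspond to X-collisions and blue edges to Y-collisions;
    colors alternate along the trail, starting with `first_color`.
    """
    symbols = ("X", "Y") if first_color == "red" else ("Y", "X")
    return [
        f"{symbols[i % 2]}_{u}={symbols[i % 2]}_{v}"
        for i, (u, v) in enumerate(zip(trail, trail[1:]))
    ]
```

Calling it on the three red trails of Example 3.3 reproduces the listed systems, and passing `first_color="blue"` handles blue trails such as (2, 3, 4, 1, 2).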

Definition 3.3

(First dependency [24]). An alternating trail of length \( k \ge 2 \) is said to have first dependency if all the equations in the associated system of equations, except the last one are independent of others, and the last equation is a consequence of the previous equations.

An alternating cycle of length \( k \ge 2 \) is said to have first dependency if all the equations in the associated system of equations, except one are independent of others, and exactly one is a consequence of the other equations.

Example 3.4

In Fig. 3, suppose \( L_5 = L_9 \), \( L_6 = L_{10} \), \( R_5=R_6 \), \( R_9=R_{10} \). Then, \( X_5=X_6 \) holds if \( f_1(L_5) = f_1(L_6) \) (as \( R_5=R_6 \)). Similarly, \( X_9=X_{10} \) holds if \( f_1(L_9) = f_1(L_{10}) \) (as \( R_9 = R_{10} \)). But, \( L_9 = L_5 \) and \( L_{10} = L_6 \). Thus, \( X_9 = X_{10} \) is a consequence of \( X_5 = X_6 \). Hence, \( X_5=X_6, ~Y_6=Y_7, ~X_7=X_8, ~Y_8=Y_9, ~X_9=X_{10} \) is an alternating trail of length 5 with first dependency.

Any alternating cycle of length k must have one of the following:

  1. All the equations in the associated system of equations are independent.

  2. The cycle has first dependency, i.e., all equations are independent except one.

  3. The cycle contains an alternating trail of length \( < k \) which has first dependency.

The first case is easy to bound as we have to choose k indices and we have k independent equations, which gives \( O(q^k/2^{nk}) \) bound. The second case is similar to the last one, which is more general. Patarin argued that whenever an alternating trail has first dependency, then among the \( k+1 \) indices at least two are fixed once the other \( k-1 \) indices are chosen. Indices 6 and 9, for instance, are fixed once we choose indices 5 and 10 in Example 3.4. This observation immediately gives a bound of the form \( O(q^{k-1}/2^{n(k-1)}) \), since the first \( k-1 \) equations are independent. On combining the three cases, Patarin obtained the following bound on \( \mathrm {ACP}(q) \) in case of Benes.

$$\begin{aligned} \mathrm {ACP}(q) \le \frac{8590q^2}{2^{2n}} \end{aligned}$$
(4)

Notice the large constant in the bound, which compels large n to get appreciable security in practice. The main component of this constant is the infinite sum \( \sum _{k=3}^{\infty }\left( \frac{k^5}{2^{k-3}}\right) \), which converges to 8588. In the same paper, Patarin also gave another improved bound [24, Theorem 9] using a more involved analysis which can be approximated to \( 26q^2/2^{2n}+200076q^3/2^{4n} \) for large k.
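The value of the infinite sum can be checked with exact rational arithmetic. The sketch below (with a hypothetical function name) computes partial sums; since the terms decay geometrically, a cutoff of a few hundred terms is far more than enough to see the limit 8588.

```python
from fractions import Fraction


def partial_sum(upper):
    """Exact value of sum_{k=3}^{upper} k^5 / 2^(k-3)."""
    return sum(Fraction(k ** 5, 2 ** (k - 3)) for k in range(3, upper + 1))
```

The limit can also be obtained in closed form: \( \sum _{k\ge 0} k^5/2^k = 1082 \), so removing the terms for \( k = 1,2 \) and multiplying by 8 gives \( 8\,(1082 - 1/2 - 8) = 8588 \).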

First Dependency in mBenes: While the first dependency idea is quite useful for deriving a tight security bound for Benes, Patarin noted that the same is not true in the case of mBenes. In fact, a crucial argument (among the \( k+1 \) indices, two are fixed once we fix the other \( k-1 \)) fails for mBenes. For example, suppose \( X_1=X_2,~Y_2=Y_3,~X_3=X_4 \) is an alternating trail with first dependency, such that \( L_1=L_3 \), \( L_2=L_4 \), and \( R_1 \oplus R_2 = R_3 \oplus R_4 \). Since \( X_i = f_1(L_i) \oplus R_i \) in mBenes, these relations make \( X_3 = X_4 \) a consequence of \( X_1 = X_2 \). It is easy to see that here only one index is fixed given the other three (\( L_4=L_2 \) and \( R_4 = R_1 \oplus R_2 \oplus R_3 \)). Consequently, Patarin speculates:

Therefore, a proof of security in \( O(2^n) \) for the Modified Benes will be different, and probably more complex than our proof of security on \( O(2^n) \) for the regular Benes.

4 HtmB: Hash Then Modified Benes

Section 3 gives a clear indication that the exact security of mBenes is a difficult problem. The main difficulty in the analysis is the simple fact that the distinguisher has complete control over the inputs to mBenes. However, in practice PRFs are mostly required to work over arbitrary domains, which requires an additional preprocessing phase before the application of a fixed input length PRF. This preprocessing is often done via a universal hash function, following the so-called Hash-then-PRF paradigm [21]. This added layer of preprocessing somewhat curtails the distinguisher’s ability to control the inputs to mBenes. Indeed, we now show that the composition of a universal hash function with mBenes leads to optimal security, with domain extension as a byproduct.

Fig. 4.

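The three instantiations of the Hash-then-modified-Benes or HtmB transformation. \( H = (H_1,H_2) \) is a \( \text {DbACU}_{q} \) hash function. From left to right: \( \textsf {HtmB}\text {-}\textsf {f}[H,\mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4] = \textsf {HtmB} [H,\mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4] \) based on independent random functions \( \mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4 \in \mathcal {F}(n,n) \); \( \textsf {HtmB\text {-}p1} [H,\mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_6] = \textsf {HtmB} [H,\mathsf {\Pi }_1,\mathsf {\Pi }_2,\mathsf {F},\mathsf {G}] \) based on independent random permutations \( \mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_6 \) of \( \{0,1\}^n\), where \( \mathsf {F}:= \mathsf {\Pi }_3 \oplus \mathsf {\Pi }_4 \) and \( \mathsf {G}:= \mathsf {\Pi }_5 \oplus \mathsf {\Pi }_6 \); and \( \textsf {HtmB\text {-}p2} [H,\mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_4] = \textsf {HtmB} [H,\mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_4] \) based on independent random permutations \( \mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_4 \) of \( \{0,1\}^n\). An edge \( (u,v) \) with label g denotes the mapping \( v = g(u) \). Unlabelled edges are identity mappings.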

Hash-Then-Modified-Benes: Let \( \mathcal {M}\subseteq \{0,1\}^* \). Given a pair \( H = (H_1,H_2) \) of two \( (\mathcal {K},\mathcal {M},\{0,1\}^n) \)-keyed hash functions (\( H_1 \) and \( H_2 \) may share the same key), and \( f_1,\ldots ,f_4 \in \mathcal {F}(n,n) \), the Hash-then-modified-Benes or HtmB transformation is a function from \( \mathcal {M}\) to \( \{0,1\}^{n} \), which is defined as \( \textsf {HtmB} [H,f_1,\ldots ,f_4](M) := S \), where

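$$\begin{aligned} (L,R) := (H_1(M),H_2(M)),\qquad X := f_1(L) \oplus R,\qquad Y := L \oplus f_2(R),\qquad S := f_3(X) \oplus f_4(Y). \end{aligned}$$
(5)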

Remark 4.1

Note that we reduced the output length of HtmB from 2n bits to n bits. This is mainly because n bits of output of a VIL PRF are sufficient to achieve \( 2^n \) query deterministic MAC security (a major inspiration for this work). In any case, another n-bit block can easily be generated by setting \( T := f_5(X) \oplus f_6(Y) \) for some \( f_5,f_6 \in \mathcal {F}(n,n) \).
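To make the data flow concrete, the following is a minimal Python sketch of HtmB on a toy block size. It assumes the wiring \( X = f_1(L) \oplus R \), \( Y = L \oplus f_2(R) \), \( S = f_3(X) \oplus f_4(Y) \), which is our reading of the construction and is consistent with how X and Y are used in the proofs below. The names `LazyRandomFunction`, `toy_hash` and `htmb` are hypothetical, and the SHA-256 based `toy_hash` is only a placeholder for H: it is not claimed to be a \( \text {DbACU}_{q} \) hash function.

```python
import hashlib
import random

N = 16  # toy block size in bits (illustrative only; block ciphers use e.g. n = 128)


class LazyRandomFunction:
    """Lazily sampled uniform random function from N bits to N bits."""

    def __init__(self, rng):
        self._rng = rng
        self._table = {}

    def __call__(self, x):
        # Sample a fresh output the first time x is queried, then reuse it.
        if x not in self._table:
            self._table[x] = self._rng.getrandbits(N)
        return self._table[x]


def toy_hash(key, message):
    """Hypothetical stand-in for H = (H_1, H_2); NOT a DbACU hash function."""
    digest = hashlib.sha256(key + message).digest()
    left = int.from_bytes(digest[:2], "big")    # plays the role of L = H_1(M)
    right = int.from_bytes(digest[2:4], "big")  # plays the role of R = H_2(M)
    return left, right


def htmb(key, f1, f2, f3, f4, message):
    """Assumed HtmB wiring: hash, one modified-Benes layer, then sum."""
    left, right = toy_hash(key, message)
    x = f1(left) ^ right
    y = left ^ f2(right)
    return f3(x) ^ f4(y)
```

In this sketch the four functions are sampled independently, which matches the HtmB-f setting; replacing them with permutation tables gives the shape of the permutation-based variants.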

We extend the dependency graph of Sect. 3.1 to incorporate the hash function H. To any input \( M^q \in (\mathcal {M})_q \), \( K \in \mathcal {K}\), and \( f_1,f_2 \in \mathcal {F}(n,n) \), we associate the dependency graph \( \mathcal {G}[M^q;K,f_{1,2}]=([q],\mathcal {E}) \), where \( \mathcal {E}\) is defined as before. Thus, \( \mathcal {G}\) is again a bichromatic graph. We define , alternating trails, cycles, and the first dependency property analogously as in Sect. 3.1.

In the following subsections we present three security results on HtmB based on the choice of \( f_1,\ldots ,f_4 \).

4.1 HtmB-f: Random Function Based Construction

Given independent uniform random functions \( \mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4 \in \mathcal {F}(n,n) \), we obtain the hash-then-PRF instance where the PRF is instantiated with \( \textsf {mBenes} [\mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4] \) (truncated to its first n bits). Formally, we define \( \textsf {HtmB}\text {-}\textsf {f}[H,\mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4] \) (see Fig. 4) as \( \textsf {HtmB} [H,\mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4] \).

Recall that \( \mathrm {ACP}(q) \) denotes the maximum probability of getting an alternating cycle in the dependency graph \( \mathcal {G}\), where the probability is maximized over all choices of message tuple \( M^q \). Lemma 4.1 gives a bound on \( \mathrm {ACP}(q) \).

Lemma 4.1

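For \( \epsilon _1,\epsilon _2,\sigma \ge 0 \), \( q \le 2^{n-1} \), a \( (q,\sigma ,\epsilon _2,\epsilon _1) \)-\( \text {DbACU}_{q} \) hash function \( H_{K} \) instantiated with a uniformly random key \( K \in \mathcal {K}\), and independent uniform random functions \( \mathsf {\Gamma }_1,\mathsf {\Gamma }_2 \in \mathcal {F}(n,n) \), we have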

$$\begin{aligned} \mathrm {ACP}(q) \le \frac{4q^2}{2^{2n}} + \frac{2q^2}{2^{3n}} + \epsilon _2 + 2\epsilon _1. \end{aligned}$$

Proof

Fix a q-tuple \( M^q \in (\mathcal {M})_q \) that maximizes \( \mathrm {ACP}(q) \). Recall that \( (L^q,R^q) = H_K(M^q) \), and . We bound the probability of conditioned on the following events:

  • .

  • .

  • .

Let .

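First, consider the probability of getting an alternating cycle of length 2 (parallel edge). Suppose the alternating cycle is \( X_{i_1} = X_{i_2},~Y_{i_1} = Y_{i_2} \), which can be rewritten as

$$\begin{aligned} \mathsf {\Gamma }_1(L_{i_1}) \oplus \mathsf {\Gamma }_1(L_{i_2})&= R_{i_1} \oplus R_{i_2},\\ \mathsf {\Gamma }_2(R_{i_1}) \oplus \mathsf {\Gamma }_2(R_{i_2})&= L_{i_1} \oplus L_{i_2}. \end{aligned}$$

Suppose \( L_{i_1} = L_{i_2} \). Then, since holds, \( R_{i_1} \ne R_{i_2} \), whence the first equation is not satisfied. Therefore, \( L_{i_1} \ne L_{i_2} \). A similar argument implies \( R_{i_1} \ne R_{i_2} \). Then, the system of equations must have full rank, i.e. rank 2. Using the randomness of \( \mathsf {\Gamma }_{1} \) and \( \mathsf {\Gamma }_{2} \), and summing over at most \( q^2 \) choices of \( (i_1,i_2) \), we bound the probability of this case by \( q^2/2^{2n} \).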

For even \( k > 2 \), let \( X_{i_1} = X_{i_2},~Y_{i_2}=Y_{i_3},~\cdots ,~Y_{i_k} = Y_{i_1} \) be an alternating cycle of length k. Then, we can rewrite it as

Now, we must have one of the following three cases:

  1. Independent cycle: All k equations are independent, i.e., the rank is k. Then, we can bound the probability by \( q^k/2^{k n} \).

  2. Strict sub-trail with first dependency: The cycle contains an alternating sub-trail of length \( k' < k \), which has first dependency. Therefore, all the equations are independent except the last equation, which is a consequence of the previous equations. Without loss of generality, we assume that \( k' \) is odd. Then, we must have an associated system of equations

    Since the last equation is a consequence of previous equations, we must have some \( i_j,i_{j'} < i_{k'} \), such that \( L_{i_{k'}} = L_{i_{j}} \) and \( L_{i_{k'+1}} = L_{i_{j'}} \). Using the fact that holds, we can have at most q choices for \( (i_{k'},i_{j}) \) and at most q choices for \( (i_{k'+1},i_{j'}) \). Similarly, we can use when \( k' \) is even. The remaining \( k'-3 \) indices can be chosen in at most \( q^{k'-3} \) ways. Finally, we bound the probability by at most \( q^{k'-1}/2^{(k'-1)n} \) (as exactly \( k'-1 \) equations are independent).

  3. Cycle has first dependency: All the equations are independent except for the last one. This case can be handled in a similar manner as case 2. In fact, we get \( q^{k-2}/2^{(k-1)n} \), which is a better bound than that of case 2.

Combining the three cases we have

(6)

where the last inequality follows from \( q \le 2^{n-1} \). Finally, we have

At the last inequality, the third term on the right hand side follows from the \( (q,\sigma ,\epsilon _2) \)-AU property of H, and the fourth term follows from the \( (q,\sigma ,\epsilon _1) \)-\( \text {ACU}_{q} \) property of \( H_1 \) and \( H_2 \).   \(\square \)

Remark 4.2

The utility of the universal hash layer lies in the analysis of case 2 (and 3) in the proof. Specifically, we use the \( (q,\sigma ,\epsilon _1) \)-\( \text {ACU}_{q} \) property of \( H_1 \) and \( H_2 \) to reduce the count of pairs with the same L (or R) value from \( q^2 \) to q, which in turn reduces the number of choices for the \( k'+1 \) indices from \( q^{k'+1} \) to \( q^{k'-1} \).

Remark 4.3

This pair idea is not applicable to mBenes as the distinguisher has full control over the inputs \( (L_i,R_i) \). For instance, the distinguisher can fix a single L value across all q queries, so that we have exactly \( q(q-1) \) pairs.

By now, it should be clear that Lemma 4.1 resolves the main hurdle in a proof of security up to \( O(2^n) \) queries for HtmB-f. Theorem 4.1 quantifies the PRF security of HtmB-f.

Theorem 4.1

For \( \epsilon _1,\epsilon _2,\sigma \ge 0 \), \( q \le 2^{n-1} \), independent uniform random functions \( \mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4 \in \mathcal {F}(n,n) \), and a \( (q,\sigma ,\epsilon _2,\epsilon _1) \)-\( \text {DbACU}_{q} \) hash function \( H_K \) instantiated with a uniformly random key \( K \in \mathcal {K}\), the PRF advantage of any \( (q,\infty ) \)-distinguisher \( \mathscr {A}\) against \( {\textsf {HtmB}\text {-}\textsf {f}}[H,\mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4] \) is bounded by

$$ \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {HtmB}\text {-}\textsf {f}}[H,\mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4]}(\mathscr {A}) \le \frac{4q^2}{2^{2n}} + \frac{2q^2}{2^{3n}} + \epsilon _2 + 2\epsilon _1. $$

Proof

A proof of this theorem can be derived using arguments similar to those in the proof of Theorem 3.1, after substituting the bound on \( \mathrm {ACP}(q) \) from Lemma 4.1.
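To get a feeling for the magnitude of the bound in Theorem 4.1, one can evaluate its right-hand side exactly. The sketch below does this with rational arithmetic; the function name `htmb_f_bound` is hypothetical, and the hash parameters \( \epsilon _2 \) and \( \epsilon _1 \) are simply passed in as numbers.

```python
from fractions import Fraction


def htmb_f_bound(q, n, eps2, eps1):
    """Right-hand side of the Theorem 4.1 bound, as an exact rational."""
    if q > 2 ** (n - 1):
        raise ValueError("Theorem 4.1 requires q <= 2^(n-1)")
    return (
        Fraction(4 * q * q, 2 ** (2 * n))    # 4 q^2 / 2^(2n)
        + Fraction(2 * q * q, 2 ** (3 * n))  # 2 q^2 / 2^(3n)
        + Fraction(eps2)                     # full-collision term of the hash
        + 2 * Fraction(eps1)                 # ACU terms for H_1 and H_2
    )
```

For instance, with \( n = 128 \), \( q = 2^{64} \) and an idealized hash with \( \epsilon _1 = \epsilon _2 = 0 \), the dominant term \( 4q^2/2^{2n} \) equals \( 2^{-126} \), so in practice the bound is governed by the quality of the hash layer.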

4.2 HtmB-p1: Random Permutation Based Construction

In this subsection, we aim to give a random permutation based instantiation of HtmB, called HtmB-p1. The obvious inspiration behind this is the wide availability of block ciphers which can be used to instantiate HtmB-p1.

A trivial way to achieve this is to replace each random function with a sum of two independent random permutations. But this costs 8 random permutation calls (2 calls for each \( f_i \), \( i \in [4] \)). Instead, we observe that \( f_1 \) and \( f_2 \) can each be instantiated with a single random permutation without any appreciable drop in security. This reduces the number of random permutation calls to 6.

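Given random permutations \( \mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_6 \) of \( \{0,1\}^n \), we define the mappings \( \mathsf {F},\mathsf {G} \in \mathcal {F}(n,n) \) as

$$\begin{aligned} \mathsf {F}(x) := \mathsf {\Pi }_3(x) \oplus \mathsf {\Pi }_4(x) \qquad \text {and}\qquad \mathsf {G}(x) := \mathsf {\Pi }_5(x) \oplus \mathsf {\Pi }_6(x), \end{aligned}$$

and \( \textsf {HtmB\text {-}p1} [H,\mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_6] \) (see Fig. 4) is defined as \( \textsf {HtmB} [H,\mathsf {\Pi }_1,\mathsf {\Pi }_2,\mathsf {F},\mathsf {G}] \). Theorem 4.2 gives the PRF security of HtmB-p1.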
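The six-permutation layout can be sketched on a toy domain as follows. This is an illustration under two assumptions: that \( \mathsf {F} \) and \( \mathsf {G} \) are each the XOR of two independent permutations (\( \mathsf {\Pi }_3 \oplus \mathsf {\Pi }_4 \) and \( \mathsf {\Pi }_5 \oplus \mathsf {\Pi }_6 \)), as suggested by the count of six permutation calls, and that the upper layer computes \( X = \mathsf {\Pi }_1(L) \oplus R \) and \( Y = L \oplus \mathsf {\Pi }_2(R) \). The helper names are hypothetical.

```python
import random

N = 8  # toy block size in bits


def random_permutation(rng):
    """Uniform random permutation of {0, ..., 2^N - 1} as a lookup table."""
    table = list(range(2 ** N))
    rng.shuffle(table)
    return table


def sum_of_permutations(p, q):
    """Return the function x -> p(x) XOR q(x)."""
    return lambda x: p[x] ^ q[x]


def htmb_p1_core(perms, left, right):
    """Map a hash output (left, right) to an N-bit tag using six permutations."""
    pi1, pi2, pi3, pi4, pi5, pi6 = perms
    f = sum_of_permutations(pi3, pi4)  # assumed definition of F
    g = sum_of_permutations(pi5, pi6)  # assumed definition of G
    x = pi1[left] ^ right
    y = left ^ pi2[right]
    return f(x) ^ g(y)
```

Note that the XOR of two permutations is in general not a permutation, which is the point of using it as a stand-in for a random function in the lower layer.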

Theorem 4.2

For \( n \ge 4 \), \( \epsilon _1,\epsilon _2,\sigma \ge 0 \), \( q \le 2^{n-4} \), independent uniform random permutations \( \mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_6 \) of \( \{0,1\}^n \), and a \( (q,\sigma ,\epsilon _2,\epsilon _1) \)-\( \text {DbACU}_{q} \) hash function \( H_K \) instantiated with a uniformly random key \( K \in \mathcal {K}\), the PRF advantage of any \( (q,\infty ) \)-distinguisher \( \mathscr {A}\) against \( {\textsf {HtmB\text {-}p1}}[H,\mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_6] \) is bounded by

$$ \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {HtmB\text {-}p1}}[H,\mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_6]}(\mathscr {A}) \le \frac{2q^{1.5}}{2^{1.5n}} + \frac{16q^2}{2^{2n}} + \frac{16q^2}{2^{3n}} + \epsilon _2 + 2\epsilon _1. $$

Proof

Using a hybrid argument, we replace the functions \( \mathsf {F} \) and \( \mathsf {G} \) in the lower layer with independent random functions. This incurs a cost of \( 2q^{1.5}/2^{1.5n} \) (using Proposition 2.1). We denote the resulting construction by \( \textsf {HtmB} ^\star \). Then there exists a \( (q,\infty ) \)-distinguisher \( \mathscr {B}\) against \( \textsf {HtmB} ^\star \), such that

$$\begin{aligned} \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {HtmB\text {-}p1}}[H,\mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_6]}(\mathscr {A}) \le \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {HtmB}}^\star }(\mathscr {B}) + \frac{2q^{1.5}}{2^{1.5n}}. \end{aligned}$$
(7)

Now, using a similar line of argument as used in Theorem 3.1, one can show that

$$\begin{aligned} \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {HtmB}}^\star }(\mathscr {B}) \le \mathrm {ACP}(q). \end{aligned}$$
(8)

Lemma 4.2 bounds \( \mathrm {ACP}(q) \) to \( \frac{16q^2}{2^{2n}} + \frac{16q^2}{2^{3n}} + \epsilon _2 + 2\epsilon _1 \), which in combination with Eq. (7) and (8) gives the result.   \(\square \)

Lemma 4.2

For \( q \le 2^{n-2} \), , and , we have

$$\begin{aligned} \mathrm {ACP}(q) \le \frac{16q^2}{2^{2n}} + \frac{16q^2}{2^{3n}} + \epsilon _2 + 2\epsilon _1. \end{aligned}$$

Proof

The proof idea is similar to the proof of Lemma 4.1 given in the previous subsection. So, we reuse the same set of notations and definitions.

Fix a q-tuple \( M^q \in (\mathcal {M})_q \) that maximizes \( \mathrm {ACP}(q) \). We bound the probability of conditioned on the following events:

  • .

  • .

  • .

The proof follows in exactly the same manner, except a minor change in the probability bound, due to a distributional change in the underlying randomness (random function to random permutation). It is easy to see that a system of k independent equations holds with probability less than \( 1/(2^n-k)^k \), when \( \mathsf {\Pi }_1 \) and \( \mathsf {\Pi }_2 \) are random permutations. We further simplify it to \( 2^k/2^{kn} \) using \( k< q < 2^{n-1} \).
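The simplification step relies on \( 2^n - k \ge 2^{n-1} \), which gives \( 1/(2^n-k)^k \le 1/2^{k(n-1)} = 2^k/2^{kn} \). The following small check confirms the inequality exactly on toy parameters; the function name `permutation_bound_ok` is hypothetical.

```python
from fractions import Fraction


def permutation_bound_ok(n, k):
    """Check 1/(2^n - k)^k <= 2^k / 2^(k n) with exact rational arithmetic."""
    exact = Fraction(1, (2 ** n - k) ** k)
    simplified = Fraction(2 ** k, 2 ** (k * n))
    return exact <= simplified
```

The inequality can fail once k exceeds \( 2^{n-1} \) (for example \( n = 2 \), \( k = 3 \)), which is why the restriction on k is needed.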

Using the above mentioned probability bound, along with the argumentation used in the proof of Lemma 4.1, we get

(9)

where the last inequality follows from \( q \le 2^{n-2} \). Finally, we have

$$\begin{aligned} \mathrm {ACP}(q)&\le \frac{16q^2}{2^{2n}} + \frac{16q^2}{2^{3n}} + \epsilon _2 + 2\epsilon _1. \end{aligned}$$

   \(\square \)

4.3 HtmB-p2: An Improvement over HtmB-p1

One can further reduce the number of permutation calls in HtmB-p1, if the generalized version of Mirror Theory [9, 33, 34] is correct. Specifically, we simply replace \( \mathsf {F} \) and \( \mathsf {G} \) in the definition of HtmB-p1 with the permutations \( \mathsf {\Pi }_3 \) and \( \mathsf {\Pi }_4 \) to get HtmB-p2. Formally, given \( \mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_4 \) we define \( \textsf {HtmB\text {-}p2} [H,\mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_4] \) (see Fig. 4) as \( \textsf {HtmB} [H,\mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_4] \).

For any \( M^q \in (\mathcal {M})_q \), \( K \in \mathcal {K}\), and , \( X^q \), \( Y^q \), and \( S^q \) are well-defined. In addition to , we define two more properties on \( \mathcal {G}[M^q;K,\pi _{1,2}] \):

  • : The largest component in \( \mathcal {G}[M^q;K,\pi _{1,2}] \) contains at least \( \xi +1 \) vertices.

  • : \( \mathcal {G}[M^q;K,\pi _{1,2}] \) contains an alternating trail \( v^{k+1} \), k odd, such that \( \bigoplus _{j = 1}^{k+1} S_{v_j} = 0 \).

Patarin’s Mirror Theory: Mirror theory [9, 33, 34] is a tool to obtain a lower bound on the number of solutions of a system of equalities and non-equalities in finite groups. We restrict ourselves to the binary field \( \mathrm {GF}(2^{n})\) with \( \oplus \) as the group operation. We use Mennink and Neves’ interpretation [12, 19, 35] of mirror theory, tailored to our needs and notational setup.

From \( X^q \) and \( Y^q \), we define the mappings \( \phi ,\psi \in \mathcal {F}([q],[q]) \) as \( \phi (i) = \min \{j:X_j=X_i\} \) and \( \psi (i) = \min \{k:Y_k=Y_i\} \). Let \( \phi ([q]) \) and \( \psi ([q]) \) denote the range of \( \phi \) and \( \psi \), respectively. Consider the set of equations \( \mathcal {L}= \{U_{\phi (i)} \oplus V_{\psi (i)} = S_i : i \in [q]\} \), where \( U_j \) and \( V_k \) denote the unknowns for all \( j \in \phi ([q]) \) and \( k \in \psi ([q]) \). We define three properties on \( \mathcal {L}\):

  • Circle-free: \( \mathcal {L}\) is called circle-free if is false.

  • Non-degenerate: \( \mathcal {L}\) is called non-degenerate if is false.

  • \(\xi \)-block-maximal: \( \mathcal {L}\) is called \( \xi \)-block-maximal if is false.

Whenever \( \mathcal {L}\) is circle-free, non-degenerate, and \( \xi \)-block-maximal, then we say that \( \mathcal {L}\) is mirror theory compatible till \( \xi \). The fundamental result of mirror theory [9, Theorem 6] is given in Theorem 4.3.
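The maps \( \phi \) and \( \psi \) and the block-size condition are easy to compute from \( X^q \) and \( Y^q \). The sketch below uses 1-indexed query indices as in the text, and assumes that two vertices i and j of the dependency graph are joined whenever \( X_i = X_j \) or \( Y_i = Y_j \) (our reading of the edge set from Sect. 3.1). The function names are hypothetical.

```python
def first_occurrence_map(values):
    """Return [phi(1), ..., phi(q)], where phi(i) = min{j : values_j = values_i}."""
    first_seen = {}
    result = []
    for index, value in enumerate(values, start=1):
        first_seen.setdefault(value, index)
        result.append(first_seen[value])
    return result


def largest_component(xs, ys):
    """Size of the largest component when i ~ j iff X_i = X_j or Y_i = Y_j."""
    phi = first_occurrence_map(xs)
    psi = first_occurrence_map(ys)
    q = len(xs)
    parent = list(range(q + 1))  # union-find forest over vertices 1..q

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i in range(1, q + 1):
        # Merge i with the first index sharing its X value and its Y value.
        for rep in (phi[i - 1], psi[i - 1]):
            parent[find(i)] = find(rep)

    sizes = {}
    for i in range(1, q + 1):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)
```

With this helper, the system is \( \xi \)-block-maximal exactly when `largest_component(xs, ys)` is at most \( \xi \), and the numbers of unknowns are `len(set(phi))` and `len(set(psi))`.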

Theorem 4.3

(Theorem 3 in [12]). Suppose \( \mathcal {L}\), as defined above, is mirror theory compatible till \( \xi \). Then, as long as \( \xi ^2\cdot \max \{|\phi ([q])|,|\psi ([q])|\} \le 2^n/67 \), the number of solutions for \( \mathcal {L}\), such that \( U_i \ne U_j \) for distinct \( i,j \in \phi ([q]) \) and \( V_k \ne V_\ell \) for distinct \( k,\ell \in \psi ([q]) \), is at least

$$\begin{aligned} \frac{(2^n)_{|\phi ([q])|}(2^n)_{|\psi ([q])|}}{2^{nq}}. \end{aligned}$$

In [9], Patarin gave a very high level sketch of the proof. Later, in [34] Nachef, Patarin and Volte gave a proof that works till \( q < 2^{n-3} \). In [12], Mennink and Neves gave a detailed exposition on mirror theory, and utilized the theory to get close to n-bit security bounds for EDM (and EWCDM [11], in the nonce-respecting setting). Jha and Nandi [19] developed a variant of mirror theory to derive tight security bounds for CLRW2 [36] and DbHtS. Independently, Kim et al. [20] used the theory to derive tight security bounds for several DbHtS MACs, including PMAC+ and LightMAC+. We use Theorem 4.3 in the security proof of HtmB-p2.

Theorem 4.4

For \( \epsilon _1,\epsilon _2,\sigma \ge 0 \), \( q \le \min \{2^{n-2},2^n/67n^2\} \), , and \( (q,\sigma ,\epsilon _2,\epsilon _1) \)-\( \text {DbACU}_{q} \) hash function H instantiated with key , the PRF advantage of any \( (q,\infty ) \)-distinguisher \( \mathscr {A}\) against \( {\textsf {HtmB\text {-}p2}}[H,\mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_4] \) is given by

Proof approach: The idea is quite similar to the proof of HtmB-f. However, just avoiding in \( \mathcal {G}\) is not enough. This is due to the switch from random functions to random permutations. For example, the system of equations should be non-degenerate. Otherwise, we might get a case where \( \mathsf {\Pi }_3(X_i) = \mathsf {\Pi }_3(X_j) \) for \( X_i \ne X_j \), which is clearly not possible. We show that the system is mirror theory compatible till n, except with very negligible probability as long as \( q \le 2^{n-2} \). Then, we apply the fundamental result of mirror theory to get the proof of security using coefficient-H technique.

Proof

\( \mathscr {A}\) tries to distinguish the real oracle \( \mathcal {R}:=(\textsf {HtmB\text {-}p2} [H,\mathsf {\Pi }_1,\ldots ,\mathsf {\Pi }_4]) \) from the ideal oracle \( \mathcal {I}:=(\mathsf {\Gamma }') \) for . Let [q] denote the set of all query indices, and \( (M^q,S^q) \) denote \( \mathscr {A}\)’s transcript, where \( M^q \) is the q-tuple of inputs and \( S^q \) is the q-tuple of outputs.

Consider a variant distinguishing game, where the oracle releases \( L^q \), \( R^q \), \( X^q \), and \( Y^q \), once the distinguisher has made all q queries. Note that this can only increase \( \mathscr {A}\)’s advantage, and not diminish it. In \( \mathcal {R}\), this is quite straightforward, as \( L^q \), \( R^q \), \( X^q \), and \( Y^q \), are already computed during the query phase. The ideal oracle \( \mathcal {I}\), samples dummy and , and sets \( (L^q,R^q) = H_K(M^q) \), and .

Bad Transcript: Let \( \mathcal {T}\) denote the set of all transcripts. Let denote the event that the system of equations is not mirror theory compatible till n, and good otherwise. So holds if at least one of , , or is satisfied. We say that a transcript \( (M^q,L^q,R^q,X^q,Y^q,S^q) \) is bad if happens, and good otherwise. Let denote the set of all bad transcripts. Then, we have .

We bound the probability of conditioned on the following events:

  • .

  • .

  • .

Let . Then, we have

(10)

where inequality \( (*) \) follows from the definition of , and inequality \( (**) \) follows from Lemma 4.2. Lemma 4.3 bounds the probability of and Lemma 4.4 bounds the probability of conditioned on .

Good Transcript: Fix a good transcript \( (M^q,L^q,R^q,X^q,Y^q,S^q) \). Since the ideal oracle faithfully (identical to the real oracle) simulates the computation of \( L^q \), \( R^q \), \( X^q \), and \( Y^q \), it is sufficient to concentrate on the ratio of the probabilities that \( (X^q,Y^q) \) maps to \( S^q \) in the real oracle and \( M^q \) maps to \( S^q \) in the ideal oracle.

(11)

where \( h_q \) denotes the number of solutions of the system of equations , such that \( \mathsf {\Pi }_3(X_i) \ne \mathsf {\Pi }_3(X_j) \) and \( \mathsf {\Pi }_4(Y_k) \ne \mathsf {\Pi }_4(Y_\ell ) \) for all \( X_i \ne X_j \) and \( Y_k \ne Y_\ell \). Further, each solution holds with exactly \( 1/(2^n)_{|\phi ([q])|}(2^n)_{|\psi ([q])|} \) probability, since \( \mathsf {\Pi }_3 \) and \( \mathsf {\Pi }_4 \) are invoked on exactly \( |\phi ([q])| \) and \( |\psi ([q])| \), respectively, distinct points. This justifies equality \( (*) \). Let \( U_{\phi (i)} = \mathsf {\Pi }_3(X_i) \) and \( V_{\psi (i)} = \mathsf {\Pi }_4(Y_i) \) for all \( i \in [q] \). Since the transcript is good, is mirror theory compatible till n. Hence, using Theorem 4.3, we have

$$\begin{aligned} h_q \ge \frac{(2^n)_{|\phi ([q])|}(2^n)_{|\psi ([q])|}}{2^{nq}}. \end{aligned}$$
(12)

This justifies the inequality \( (**) \). The result follows from Eq. (10), Lemmata 4.3 and 4.4, and Theorem 2.1.   \(\square \)

Remark 4.4

In Eq. (11) we have substituted \( h_q \) with the lower bound claimed in the fundamental result of mirror theory (see Theorem 4.3). However, as reported in multiple works [10, 19, 35, 37], a concrete proof of this result is still not available. Here, we discuss the impact of a weaker mirror theory result on Theorem 4.4. Suppose that, in the future, we obtain a mirror theory proof that holds for some \( \xi < n \) and gives the lower bound

$$\begin{aligned} (1-\delta ) \times \frac{(2^n)_{|\phi ([q])|}(2^n)_{|\psi ([q])|}}{2^{nq}}, \end{aligned}$$

for some \( \delta > 0 \). Here \( \delta \) can be viewed as the degree of deviation from the perfect bound. Then, the bound in Theorem 4.4 is revised asymptotically to

where the red colored terms are due to the degradation in mirror theory bound. Specifically, \( O(q^{\xi +1}/2^{n\xi }) \) arises in the bound of , and \( \delta \) appears on the right hand side of Eq. (11) by substituting the weaker bound for \( h_q \).

Lemma 4.3

For \( q \le 2^{n-2} \), , and , we have

Lemma 4.4

For \( q \le 2^{n-2} \), , , and , we have

Given the similarity of the proofs of Lemmata 4.3 and 4.4 with the proof of Lemma 4.2, they are deferred to the full version of this paper.

5 mLightMAC+ and mPMAC+

In this section, we define two families, mLightMAC+ and mPMAC+, of deterministic MAC candidates based on block ciphers. Both families are instances of the HtmB construction, where the \( \text {DbACU}_{q} \) hash functions (see Sect. 2) are instantiated with the LightHash and PHash hash functions, respectively. In particular, our schemes have the following properties:

  • they are secure VIL PRFs as long as the number of queried blocks is small compared to \(2^n\), where n denotes the block size;

  • the calls to the underlying permutation can be computed in parallel.

5.1 mLightMAC+

In this section, we define the mLightMAC+ construction and prove its security. We proceed in two steps: first, we define the LightHash family of permutation-based hash functions and upper bound the probability of getting colliding outputs in Lemma 5.1, and then we use Theorems 4.1, 4.2 and 4.4 to prove the actual security bound on mLightMAC+ in Corollary 5.1.

The LightHash Universal Hash Function: Given a permutation and a positive integer \(m\in [n-2]\), the LightHash universal hash function is a function from \(\{0,1\}^{\le (n-m)2^m-1}\) to \(\{0,1\}^{2n}\) defined as follows. For all messages \(M\in \{0,1\}^{\le (n-m)2^m-1}\), we let , \(l=|M'|/(n-m)\) and \(M'= M^1||\cdots ||M^l\), where \(|M^i|=n-m\) for all \(i\in [l]\). The hash of the message M is defined as \(\textsf {LightHash} [\pi ,m](M) = (\textsf {LightHash} _1[\pi ,m](M),\textsf {LightHash} _2[\pi ,m](M))\), where

$$\begin{aligned} \textsf {LightHash} _1[\pi ,m](M)&=(\langle l \rangle _m||M^l)\oplus \bigoplus _{i=1}^{l-1} \pi \left( \langle i \rangle _m||M^i\right) ,\\ \textsf {LightHash} _2[\pi ,m](M)&=(\langle l \rangle _m||M^l)\oplus \bigoplus _{i=1}^{l-1} 2^{l-i} \pi \left( \langle i \rangle _m||M^i\right) . \end{aligned}$$

Note that LightHash requires one fewer block cipher call than the hash layer in LightMAC+. The probability that two distinct messages generate colliding outputs in both components of LightHash can be upper bounded as follows.
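The two LightHash components can be sketched on toy parameters as follows. Several details here are assumptions made only for illustration: the block size \( n = 8 \) and counter size \( m = 3 \), the reduction polynomial \( x^8+x^4+x^3+x+1 \) used for multiplication by 2 in \( \mathrm {GF}(2^8) \), a \( 10^* \) padding rule, and the restriction to fewer than \( 2^m \) blocks so that the counter fits in m bits without any special encoding. Messages are represented as strings of '0' and '1' characters, and the names `double`, `pad` and `light_hash` are hypothetical.

```python
import random

N, M = 8, 3   # toy sizes: block size n and counter size m
POLY = 0x11B  # assumed reduction polynomial x^8 + x^4 + x^3 + x + 1


def double(v):
    """Multiply by 2 (the element x) in GF(2^N)."""
    v <<= 1
    return v ^ POLY if v >> N else v


def pad(bits):
    """Assumed injective 10* padding to a multiple of N - M bits."""
    bits += "1"
    return bits + "0" * (-len(bits) % (N - M))


def light_hash(perm, bits):
    """Return (LightHash_1, LightHash_2) of a bit string under the table perm."""
    chunk = N - M
    padded = pad(bits)
    blocks = [padded[i:i + chunk] for i in range(0, len(padded), chunk)]
    l = len(blocks)
    if l >= 2 ** M:
        raise ValueError("sketch only handles fewer than 2^M blocks")
    last = (l << chunk) | int(blocks[-1], 2)  # <l>_m || M^l, not encrypted
    acc1 = acc2 = 0
    for i in range(1, l):
        out = perm[(i << chunk) | int(blocks[i - 1], 2)]
        acc1 ^= out
        acc2 = double(acc2 ^ out)  # Horner rule: out_i ends up weighted by 2^(l-i)
    return last ^ acc1, last ^ acc2
```

For a single-block message no permutation call is made and both components equal \( \langle 1 \rangle _m||M^1 \), which illustrates the saved call on the last block.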

Lemma 5.1

Let \(n\in \mathbb {N}\), \(m\in [n-2]\). For any two distinct messages \( M_1, M_2 \) in \(\{0,1\}^{\le (n-m)2^m-1}\) and , one has

$$\begin{aligned} {\Pr _{ }\left[ {\textsf {LightHash}}[\mathsf {\Pi },m](M_1)={\textsf {LightHash}}[\mathsf {\Pi },m](M_2)\right] }&\le \frac{4}{2^{2n}},\\ {\Pr _{ }\left[ {\textsf {LightHash}}_b[\mathsf {\Pi },m](M_1)={\textsf {LightHash}}_b[\mathsf {\Pi },m](M_2)\right] }&\le \frac{2}{2^{n}}, \end{aligned}$$

for \( b \in \{1,2\}\). In particular, LightHash is \((q,\infty ,\frac{2q^2}{2^{2n}},\frac{q}{2^n})\)-\( \text {DbACU}_{q} \).

The proof of this Lemma can be found in Sect. 6.1.

The mLightMAC+ Family of PRFs: Given permutations \(\pi _0,\ldots ,\pi _6\) of \(\{0,1\}^n\), \( f_1,\ldots ,f_4 \in \mathcal {F}(n,n) \) and an integer \(m \in [n-2]\), the functions of the mLightMAC+ family are functions from \(\{0,1\}^{\le (n-m)2^m-1}\) to \(\{0,1\}^n\) that are formally defined as

$$\begin{aligned} \textsf {mLightMAC+\text {-}f} [\pi _0,f_1,\ldots ,f_4,m]&:= \textsf {HtmB}\text {-}\textsf {f}\left[ \textsf {LightHash} [\pi _0,m],f_1,\ldots ,f_4\right] ,\\ \textsf {mLightMAC+\text {-}p1} [\pi _0,\pi _1,\ldots ,\pi _6,m]&:= \textsf {HtmB\text {-}p1} \left[ \textsf {LightHash} [\pi _0,m],\pi _1,\ldots ,\pi _6\right] ,\\ \textsf {mLightMAC+\text {-}p2} [\pi _0,\pi _1,\ldots ,\pi _4,m]&:= \textsf {HtmB\text {-}p2} \left[ \textsf {LightHash} [\pi _0,m],\pi _1,\ldots ,\pi _4\right] . \end{aligned}$$

Corollary 5.1 gives the PRF security of mLightMAC+.

Corollary 5.1

For \(q < 2^{n-4}\), \(m \le n-2\), and , , the PRF advantage of any \((q,\infty )\)-distinguisher \(\mathscr {A}\) against mLightMAC+ is given by

$$\begin{aligned} \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {mLightMAC+\text {-}f}}[\mathsf {\Pi }_0,\mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4,m]}(\mathscr {A})&\le \frac{6q^2}{2^{2n}} + \frac{2q^2}{2^{3n}} +\frac{2q}{2^n},\\ \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {mLightMAC+\text {-}p1}}[\mathsf {\Pi }_0,\ldots ,\mathsf {\Pi }_6,m]}(\mathscr {A})&\le \frac{2q^{1.5}}{2^{1.5n}} + \frac{18q^2}{2^{2n}} + \frac{16q^2}{2^{3n}} +\frac{2q}{2^n},\\ \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {mLightMAC+\text {-}p2}}[\mathsf {\Pi }_0,\ldots ,\mathsf {\Pi }_4,m]}(\mathscr {A})&\le \frac{16q^{2}}{2^{3n}} + \frac{38q^2}{2^{2n}} +\frac{6q}{2^n}. \end{aligned}$$

For the second and third inequalities, we also assume \( n \ge 4 \) and \(q \le 2^n/67n^2\), respectively.

Proof

This result is a direct combination of Lemma 5.1 and Theorems 4.1, 4.2 and 4.4.   \(\square \)
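As a sanity check on the constants, the first two bounds can be recovered by substituting the LightHash parameters \( \epsilon _2 = 2q^2/2^{2n} \) and \( \epsilon _1 = q/2^n \) from Lemma 5.1 into Theorems 4.1 and 4.2. The worked substitution below uses the shorthand \( \mathbf {Adv}_{\textsf {f}} \) and \( \mathbf {Adv}_{\textsf {p1}} \) (our notation, for readability only) for the advantages against mLightMAC+-f and mLightMAC+-p1.

```latex
\begin{aligned}
\mathbf{Adv}_{\textsf{f}}
  &\le \frac{4q^2}{2^{2n}} + \frac{2q^2}{2^{3n}} + \epsilon_2 + 2\epsilon_1
   = \frac{4q^2}{2^{2n}} + \frac{2q^2}{2^{3n}} + \frac{2q^2}{2^{2n}} + \frac{2q}{2^n}
   = \frac{6q^2}{2^{2n}} + \frac{2q^2}{2^{3n}} + \frac{2q}{2^n},\\
\mathbf{Adv}_{\textsf{p1}}
  &\le \frac{2q^{1.5}}{2^{1.5n}} + \frac{16q^2}{2^{2n}} + \frac{16q^2}{2^{3n}} + \epsilon_2 + 2\epsilon_1
   = \frac{2q^{1.5}}{2^{1.5n}} + \frac{18q^2}{2^{2n}} + \frac{16q^2}{2^{3n}} + \frac{2q}{2^n}.
\end{aligned}
```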

5.2 mPMAC+

As in the previous section, we define the mPMAC+ construction and prove its security. We first define the PHash family of permutation-based hash functions and upper bound the probability of getting colliding outputs in Lemma 5.2, and then we use Theorems 4.1, 4.2 and 4.4 to prove the actual security bound on mPMAC+ in Corollary 5.2.

The PHash Universal Hash Function: Given a permutation , the PHash universal hash function is a function from \(\{0,1\}^{\le n2^{n/2}-1}\) to \(\{0,1\}^{2n}\) defined as follows. For all messages \(M\in \{0,1\}^{\le n2^{n/2}-1}\), we let , \(l=|M'|/n\) and \(M'=M^1||\cdots ||M^l\), where \(|M^i|=n\) for all \(i\in [l]\). The hash of the message M is then defined as \(\textsf {PHash} [\pi ](M) = (\textsf {PHash} _1[\pi ](M),\textsf {PHash} _2[\pi ](M))\), where

$$\begin{aligned} \textsf {PHash} _1[\pi ](M)&=M^{l}\oplus \bigoplus _{i=1}^{l-1} \pi \left( M^i \oplus 2^{i} \pi (0^n) \oplus 2^{2i}\pi (10^{n-1})\right) ,\\ \textsf {PHash} _2[\pi ](M)&=M^{l}\oplus \bigoplus _{i=1}^{l-1} 2^{l-i} \pi \left( M^i \oplus 2^{i} \pi (0^n) \oplus 2^{2i}\pi (10^{n-1})\right) . \end{aligned}$$

Again, note that PHash requires one fewer block cipher call than the hash layer in PMAC+. One has the following result on the \( \text {DbACU}_{q} \) bound of PHash.

Lemma 5.2

Let \(n \ge 6\). For , \(\sigma \in \mathbb {N}\), \({\textsf {PHash}}[\mathsf {\Pi }]\) is \((q,\sigma ,\epsilon _2,\epsilon _1)\)-\( \text {DbACU}_{q} \) where

$$\begin{aligned} \epsilon _2\le \frac{2\sigma ^2+28q\sigma +28q^2}{2^{2n}}+\frac{3q}{2^n-2}+3\frac{\sigma +q}{2^n-1}\qquad \text {and}\qquad \epsilon _1\le \frac{4\sigma +9q}{2^n}. \end{aligned}$$

The proof of this Lemma can be found in Sect. 6.2.

The mPMAC+ Family of PRFs: Given permutations \(\pi _0,\ldots ,\pi _6\) of \(\{0,1\}^n\) and \( f_1,\ldots ,f_4 \in \mathcal {F}(n,n) \), the functions of the mPMAC+ family are functions from \(\{0,1\}^{\le n2^{n/2}-1}\) to \(\{0,1\}^n\) that are formally defined as

$$\begin{aligned} \textsf {mPMAC+\text {-}f} [\pi _0,f_1,\ldots ,f_4]&:=\textsf {HtmB}\text {-}\textsf {f}\left[ \textsf {PHash} [\pi _0],f_1,\ldots ,f_4\right] ,\\ \textsf {mPMAC+\text {-}p1} [\pi _0,\pi _1,\ldots ,\pi _6]&:=\textsf {HtmB\text {-}p1} \left[ \textsf {PHash} [\pi _0],\pi _1,\ldots ,\pi _6\right] ,\\ \textsf {mPMAC+\text {-}p2}[\pi _0,\pi _1,\ldots ,\pi _4]&:=\textsf {HtmB\text {-}p2} \left[ \textsf {PHash} [\pi _0],\pi _1,\ldots ,\pi _4\right] . \end{aligned}$$

Corollary 5.2 gives the PRF security of mPMAC+.

Corollary 5.2

Let \(n \ge 6\). For \(q<2^{n-4}\) and , and , the PRF advantage of any \((q,\infty )\)-distinguisher \( \mathscr {A}\) against mPMAC+ is given by

$$\begin{aligned} \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {mPMAC+\text {-}f}}[\mathsf {\Pi }_0,\mathsf {\Gamma }_1,\ldots ,\mathsf {\Gamma }_4]}(\mathscr {A})&\le \frac{2q^2}{2^{3n}} + \frac{2\sigma ^2+28q\sigma +32q^2}{2^{2n}}+\frac{11\sigma +15q}{2^n-2},\\ \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {mPMAC+\text {-}p1}}[\mathsf {\Pi }_0,\ldots ,\mathsf {\Pi }_6]}(\mathscr {A})&\le \frac{2q^{1.5}}{2^{1.5n}} +\frac{16q^2}{2^{3n}} +\frac{2\sigma ^2+28q\sigma +44q^2}{2^{2n}}+\frac{11\sigma +15q}{2^n-2},\\ \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {mPMAC+\text {-}p2}}[\mathsf {\Pi }_0,\ldots ,\mathsf {\Pi }_4]}(\mathscr {A})&\le \frac{16q^2}{2^{3n}} +\frac{2\sigma ^2+28q\sigma +64q^2}{2^{2n}}+\frac{11\sigma +19q}{2^n-2},\\ \end{aligned}$$

where \(\sigma \) denotes an upper bound on the total number of n-bit blocks queried by \( \mathscr {A}\). For the last inequality, we also assume \(q \le 2^n/67n^2\).

Proof

This result is a direct combination of Lemma 5.2 and Theorems 4.1, 4.2 and 4.4.   \(\square \)

6 Proofs Related to LightHash and PHash

6.1 Proof of Lemma 5.1

Let \(q\in \mathbb {N}\), \(m\in [n-2]\), \(M^q\in \left( \{0,1\}^{\le (n-m)2^m-1}\right) _q\).

Let us now fix two distinct integers \(i_1,i_2\in [q]\), and let \(M_1=M_{i_1}\), \(M_2=M_{i_2}\).

The proof for the first inequality closely follows the proof of [17, Lemma 1] for the original \(\textsf {LightHash} \) construction, with slight changes to handle our variant. It is thus deferred to the full version of this paper for reasons of space.

We now consider the second inequality we have to prove, and denote by \(l_1\) (resp. \(l_2\)) the length of the padded message \(M'_1\) (resp. \(M'_2\)) in \((n-m)\)-bit blocks. Note that \(1\le l_1,l_2\le 2^m\le 2^{n-2}\). Then the event

$$ \textsf {LightHash} _1[\mathsf {\Pi },m](M_1)=\textsf {LightHash} _1[\mathsf {\Pi },m](M_2) $$

is equivalent to:

$$\begin{aligned} \left( \langle l_1 \rangle _m||M_1^{l_1}\right) \oplus \bigoplus _{i=1}^{l_1-1}\mathsf {\Pi }\left( \langle i \rangle _m||M_1^i\right) =\left( \langle l_2 \rangle _m||M_2^{l_2}\right) \oplus \bigoplus _{i=1}^{l_2-1}\mathsf {\Pi }\left( \langle i \rangle _m||M_2^i\right) . \end{aligned}$$
(13)

We consider two different cases: \(l_1 \ne l_2\) and \(l_1 = l_2\). Consider the first case, and assume without loss of generality that \(1 \le l_1 < l_2\). Thanks to domain separation of the inputs, and since at most \(l_1 + l_2 \le 2^{n-1}\) outputs appear in Eq. (13), fixing all the other outputs provides a unique solution for \(\mathsf {\Pi }(\langle l_2-1 \rangle _m|| M_2^{l_2-1})\), whose input appears only on the right hand side because \(l_2-1 \ge l_1\). Hence, the probability that (13) is satisfied is at most \( 1 / (2^n - l_1 - l_2 + 3)\). Now consider the second case. Since the adversary cannot repeat queries and our padding is injective, \(M'_1\) and \(M'_2\) must differ in at least one block. Let \(i_0\ge 1\) be the first such index. Then, even when eliminating the colliding outputs from Eq. (13), at least the outputs with index \(i_0\) will remain. If \(i_0\le l_1-1\), fixing all the other outputs provides a unique solution for \(\mathsf {\Pi }(\langle i_0 \rangle _m|| M_1^{i_0})\), and the probability that Eq. (13) is satisfied is also at most \( 1 / (2^n - l_1 - l_2 + 3)\). Otherwise, if \(i_0=l_1\), Eq. (13) reduces to \(M_1^{l_1}=M_2^{l_2}\), which cannot hold by definition of \(i_0\).

Overall, since \(l_1 + l_2 \le 2^{n - 1}\), one has

$$ {\Pr _{ }\left[ \textsf {LightHash} _1[\mathsf {\Pi },m](M_1)=\textsf {LightHash} _1[\mathsf {\Pi },m](M_2)\right] } \le \frac{2}{2^{n}}. $$

Similarly, one has

$$ {\Pr _{ }\left[ \textsf {LightHash} _2[\mathsf {\Pi },m](M_1)=\textsf {LightHash} _2[\mathsf {\Pi },m](M_2)\right] } \le \frac{2}{2^{n}}. $$

We conclude the proof of the second part of Lemma 5.1 by summing over the \(q(q-1)/2\) pairs of queries and using Markov’s inequality.
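The counting behind this last step can be spelled out as follows. Let \( C_b \) denote the number of pairs \( i<j \) with colliding \( \textsf {LightHash} _b \) outputs (a notation introduced here for illustration), and assume, as the parameters in Lemma 5.1 suggest, that the \( \text {ACU}_{q} \) property bounds the probability of seeing at least q colliding pairs. The full-collision parameter follows from a union bound over pairs, and the single-component parameter from Markov's inequality:

```latex
\begin{aligned}
\Pr\left[\exists\, i<j : \textsf{LightHash}[\mathsf{\Pi},m](M_i) = \textsf{LightHash}[\mathsf{\Pi},m](M_j)\right]
  &\le \frac{q(q-1)}{2}\cdot\frac{4}{2^{2n}} \le \frac{2q^2}{2^{2n}},\\
\mathsf{Ex}\left[C_b\right]
  &\le \frac{q(q-1)}{2}\cdot\frac{2}{2^{n}} \le \frac{q^2}{2^{n}},\\
\Pr\left[C_b \ge q\right]
  &\le \frac{\mathsf{Ex}\left[C_b\right]}{q} \le \frac{q}{2^{n}}.
\end{aligned}
```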

6.2 Proof of Lemma 5.2

A flaw in [26]: The probability of observing a full collision in PHash has already been considered in [26]. However, Chakraborty et al. [27] identified a flaw in the argument. In more detail, when considering what is referred to as Type-5 collisions, the author tries to upper bound the probability, over the random choice of two n-bit masks \(L_1\) and \(L_2\), that the following system is satisfied:

$$\begin{aligned} (2^{i_1}\oplus 2^{i_2})L_1 \oplus (2^{3i_1}\oplus 2^{3i_2})L_2&= X_1\\ (2^{i_3}\oplus 2^{i_4})L_1 \oplus (2^{3i_3}\oplus 2^{3i_4})L_2&= X_2 \end{aligned}$$

for some n-bit values \(X_1\), \(X_2\) and four integers \(i_1,i_2,i_3,i_4\) such that at least three of them are distinct. It is then argued that either the system is of rank two, and has exactly one solution, or both equations are equal. In the second case, the author shows that \(2^{i_1}\oplus 2^{i_2}=2^{i_3}\oplus 2^{i_4}\) and \(2^{3i_1}\oplus 2^{3i_2}=2^{3i_3}\oplus 2^{3i_4}\) imply \(i_1=i_2=i_3=i_4\) which is impossible. However, it seems that another case is possible: the second equation can be a multiple of the first one. In that case, there exists a non-zero value \(\alpha \) such that \(\alpha (2^{i_1}\oplus 2^{i_2})=2^{i_3}\oplus 2^{i_4}\), \(\alpha (2^{3i_1}\oplus 2^{3i_2})=2^{3i_3}\oplus 2^{3i_4}\) and \( \alpha X_1=X_2 \), and the previous impossibility argument does not apply anymore. With a more complex analysis, it may still be possible to prove a bound that is independent from the length of the queries. Another approach could be to use a different masking, as demonstrated in [27, 38], that avoids the above mentioned case. In our work, we leave this question as an interesting open problem and we use a slightly worse bound that depends on the number of queried message blocks, but is still sufficient to provide optimal security.

Proof of Lemma 5.2. Let \(n\ge 6\), \(q \le 2^n\) be two integers and let us fix a q-tuple of messages \(M^q\in \left( \{0,1\}^{n2^{n/2}-1}\right) _q\) whose total block length is \(\sigma \). We parse each padded message \(M_i\) as \(M^{1}_i||\cdots ||M^{l_i}_{i}\), where \(i \in [q]\), \(|M^{j}_i|=n\) for every \(j\in [l_i]\), and \(l_i \le 2^{n/2}\). Note that, because of our padding, \(\sum _{i=1}^q l_i \le \sigma +q\). We are going to introduce several new random variables that depend on the uniformly random draw of \(\mathsf {\Pi }\):

  • \(L_1 = \mathsf {\Pi }(0^n)\) and \(L_2 = \mathsf {\Pi }(10^{n-1})\);

  • for all \(i\in [q]\) and all \(j\in [l_{i}-1]\), \(X^{j}_i=M^{j}_i \oplus 2^{j} L_1 \oplus 2^{2j} L_2\) and \(Y^{j}_i = \mathsf {\Pi }( X^{j}_i )\);

  • for \(i\in [q]\),

    $$\begin{aligned} \varSigma _i&=\textsf {PHash} _1[\mathsf {\Pi }](M_i)=M^{l_i}_i\oplus \bigoplus _{j=1}^{l_i-1}Y^{j}_i \text { and } \\ \varTheta _i&=\textsf {PHash} _2[\mathsf {\Pi }](M_i)=M^{l_i}_i\oplus \bigoplus _{j=1}^{l_i-1}2^{l_i-j}Y^{j}_i. \end{aligned}$$
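The random variables above can be traced on a toy instance. The following is a minimal sketch under several assumptions of ours: blocks are 8 bits wide, field multiplication uses the modulus \(x^8+x^4+x^3+x^2+1\), the string \(10^{n-1}\) is read most-significant-bit first, the message is already padded and split into blocks, and \(\mathsf {\Pi }\) is a seeded pseudorandom table. The names `gf_mul`, `pow2` and `phash` are hypothetical.

```python
import random

N = 8         # toy block size in bits (assumption; the paper works with n-bit blocks)
POLY = 0x11D  # assumed reduction polynomial for GF(2^8)


def gf_mul(a, b):
    """Multiply two field elements (carry-less product reduced modulo POLY)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & (1 << N):
            a ^= POLY
        b >>= 1
    return r


def pow2(e):
    """Return the field element 2^e."""
    r = 1
    for _ in range(e):
        r = gf_mul(r, 2)
    return r


def phash(perm, blocks):
    """Return (Sigma, Theta) for a padded message given as a list of N-bit blocks."""
    l = len(blocks)
    L1 = perm[0]              # L_1 = Pi(0^n)
    L2 = perm[1 << (N - 1)]   # L_2 = Pi(10^{n-1}), read MSB first
    sigma = theta = blocks[-1]  # the last block M^{l} is absorbed directly
    for j in range(1, l):       # j = 1, ..., l - 1
        x = blocks[j - 1] ^ gf_mul(pow2(j), L1) ^ gf_mul(pow2(2 * j), L2)
        y = perm[x]             # Y^j = Pi(X^j)
        sigma ^= y
        theta ^= gf_mul(pow2(l - j), y)
    return sigma, theta


# A seeded shuffle stands in for the uniformly random permutation Pi.
rng = random.Random(2024)
perm = list(range(1 << N))
rng.shuffle(perm)

print(phash(perm, [0x01, 0x02, 0x03]))
```

For a single-block message the loop body never runs, so both halves equal the block itself, which reflects the fact that the last data block is absorbed without a permutation call.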

Let us fix two distinct integers \(i_1,i_2\) in [q], and assume w.l.o.g. that \(l_{i_1}\ge l_{i_2}\). The first step of our proof is to upper bound the probability of a collision in the output of \(\textsf {PHash} _1\), that is, the probability that \(\varSigma _{i_1}=\varSigma _{i_2}\).

Claim 6.1

One has

$$\begin{aligned} {\Pr _{ }\left[ \varSigma _{i_1}=\varSigma _{i_2}\right] }&\le 2\frac{l_{i_1}+l_{i_2}+4}{2^n},\\ {\Pr _{ }\left[ \varTheta _{i_1}=\varTheta _{i_2}\right] }&\le 2\frac{l_{i_1}+l_{i_2}+4}{2^n}. \end{aligned}$$

The proof of this claim is deferred to the full version of this paper for reasons of space.

Let \(C_1\) (resp. \(C_2\)) be the number of \(\varSigma \) (resp. \(\varTheta \)) collisions. Summing over every pair of queries yields

$$\begin{aligned} {\mathsf {Ex}_{ }\left[ C_1\right] }&\le \sum _{i_1<i_2}2\frac{l_{i_1}+l_{i_2}+4}{2^n}\le \frac{4q(\sigma +q)+4q^2}{2^n}\le \frac{4q\sigma +9q^2}{2^n}. \end{aligned}$$

Similarly, one has \( {\mathsf {Ex}_{ }\left[ C_2\right] } \le \frac{4q\sigma +9q^2}{2^n} \). Using Markov’s inequality ends the first part of the proof of this lemma.
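The middle inequality in the bound on \({\mathsf {Ex}_{ }\left[ C_1\right] }\) can be unpacked as follows. This is our own expansion of the step; it only uses the fact that each index \(i\) appears in exactly \(q-1\) pairs, together with \(\sum _{i=1}^q l_i \le \sigma +q\):

$$\begin{aligned} \sum _{i_1<i_2}(l_{i_1}+l_{i_2})&= (q-1)\sum _{i=1}^{q} l_i \le (q-1)(\sigma +q),\\ \sum _{i_1<i_2}2\frac{l_{i_1}+l_{i_2}+4}{2^n}&\le \frac{2(q-1)(\sigma +q)+4q(q-1)}{2^n}\le \frac{4q(\sigma +q)+4q^2}{2^n}. \end{aligned}$$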

Our goal is now to upper bound the probability of the following event (dubbed in the following): there exist two distinct indices \(i_1\) and \(i_2\) such that

$$ \textsf {PHash} [\mathsf {\Pi }](M_{i_1})=\textsf {PHash} [\mathsf {\Pi }](M_{i_2}). $$

We are going to break this event into several different events that will be easier to handle:

  • : there exist \(i\in [q]\) and \(j\in [l_i-1]\) such that \(X^{j}_i = 0^n\);

  • : there exist \(i\in [q]\) and \(j\in [l_i-1]\) such that \(X^{j}_i = 10^{n-1}\);

  • : there exist \(i\in [q]\) and three pairwise distinct integers \(j_1,j_2,j_3\in [l_i-1]\) such that \(X^{j_1}_i = X^{j_2}_i=X^{j_3}_i\);

  • : this event corresponds to .

Clearly, one has

(14)

It is also easy to see that

(15)

Let us now consider the event . Fix any \(i\in [q]\) and any pairwise distinct \(j_1,j_2,j_3\in [l_i-1]\). The system (S) of equations \(X^{j_1}_i = X^{j_2}_i=X^{j_3}_i\) can be rewritten as

$$\begin{aligned} (2^{j_1}\oplus 2^{j_2}) L_1 \oplus (2^{2j_1}\oplus 2^{2j_2}) L_2&= M^{j_1}_i \oplus M^{j_2}_i\\ (2^{j_1}\oplus 2^{j_3}) L_1 \oplus (2^{2j_1}\oplus 2^{2j_3}) L_2&= M^{j_1}_i \oplus M^{j_3}_i \end{aligned}$$

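Since \(j_1,j_2,j_3\) are pairwise distinct and smaller than \(2^n-1\), the values \(2^{j_1},2^{j_2},2^{j_3}\) are pairwise distinct. Moreover, squaring is additive in characteristic 2, so \(2^{2j}\oplus 2^{2j'}=(2^{j}\oplus 2^{j'})^2\) for all \(j,j'\). Dividing each equation by its non-zero coefficient of \(L_1\) shows that (S) is equivalent to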

$$\begin{aligned} L_1 \oplus (2^{j_1}\oplus 2^{j_2}) L_2&= (M^{j_1}_i \oplus M^{j_2}_i)/(2^{j_1}\oplus 2^{j_2})\\ L_1 \oplus (2^{j_1}\oplus 2^{j_3}) L_2&= (M^{j_1}_i \oplus M^{j_3}_i)/(2^{j_1}\oplus 2^{j_3}). \end{aligned}$$

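Since \(2^{j_2}\ne 2^{j_3}\), the system has a unique solution. As \(L_1\) and \(L_2\) are the images of two distinct inputs under the uniformly random permutation \(\mathsf {\Pi }\), the system is satisfied with probability at most \(1/(2^n(2^n-1))\).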
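The uniqueness of the solution can be checked exhaustively in a small field. The sketch below is our own sanity check, assuming 8-bit blocks and the modulus \(x^8+x^4+x^3+x^2+1\); it counts the pairs \((L_1,L_2)\) for which \(X^{j_1}_i = X^{j_2}_i=X^{j_3}_i\) holds for one arbitrary choice of indices and message blocks. The names `gf_mul`, `pow2` and `count_solutions` are hypothetical.

```python
POLY = 0x11D  # assumed reduction polynomial for the toy field GF(2^8)


def gf_mul(a, b):
    """Multiply two field elements (carry-less product reduced modulo POLY)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r


def pow2(e):
    """Return the field element 2^e."""
    r = 1
    for _ in range(e):
        r = gf_mul(r, 2)
    return r


def count_solutions(js, ms):
    """Count pairs (L1, L2) with X^{j1} = X^{j2} = X^{j3}.

    js holds three pairwise distinct indices and ms the matching blocks,
    where X^j = M^j xor 2^j * L1 xor 2^{2j} * L2.
    """
    # Precompute 2^j * L and 2^{2j} * L for every field element L.
    t1 = [[gf_mul(pow2(j), L) for L in range(256)] for j in js]
    t2 = [[gf_mul(pow2(2 * j), L) for L in range(256)] for j in js]
    count = 0
    for L1 in range(256):
        for L2 in range(256):
            x = [ms[k] ^ t1[k][L1] ^ t2[k][L2] for k in range(3)]
            if x[0] == x[1] == x[2]:
                count += 1
    return count


print(count_solutions((1, 2, 3), (0x11, 0x22, 0x33)))
```

The count is taken over all of \(\{0,1\}^8\times \{0,1\}^8\), including pairs with \(L_1=L_2\) that cannot occur as outputs of a permutation, so it only confirms that the linear system has rank two.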

Summing over every possible choice of \(i,j_1,j_2,j_3\) yields

(16)

where inequality \((*)\) comes from the fact that \(l_i\le 2^{n/2}\) for every \(i\in [q]\).

We now have to handle the event . We make the following claim.

Claim 6.2

The proof of this claim is deferred to the full version of this paper for reasons of space.

Combining Eqs. (14), (15), (16) and Claim 6.2 yields

which ends the proof.

7 Reducing the Number of Keys

HtmB-f, HtmB-p1, and HtmB-p2 need 4, 6, and 4 keys, respectively, apart from the hash key. This could be an issue in certain memory-restricted scenarios. In this section, we present some simple variants of these constructions that require less key material, albeit with a slight loss of security.

For any function \( F \in \mathcal {F}\) and \( b \in \{0,1\}^{< n} \), we define two mappings:

$$\begin{aligned} \widehat{F}^b := \lfloor F(b\Vert \cdot ) \rfloor _{n-|b|} \qquad \widetilde{F}^b := F(b\Vert \cdot ), \end{aligned}$$

where \( \lfloor Y \rfloor _{n-d} \) denotes the \( (n-d) \)-least significant bits of Y for all \( Y \in \{0,1\}^n\) and \( d < n \). In the following discussion \( \mathcal {M}\subseteq \{0,1\}^* \).
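The two mappings can be sketched as follows. This is an illustrative model under our own assumptions: n-bit strings are represented as integers, \(b\Vert x\) places the prefix \(b\) in the most significant bits, and \(F\) is replaced by a seeded lookup table over 8-bit inputs. The names `hat`, `tilde` and `table` are hypothetical.

```python
import random


def hat(F, b, b_len, n):
    """hat-F^b: prefix the input with b and keep the n - |b| least significant output bits."""
    free = n - b_len
    return lambda x: F((b << free) | x) & ((1 << free) - 1)


def tilde(F, b, b_len, n):
    """tilde-F^b: prefix the input with b and keep the full n-bit output."""
    free = n - b_len
    return lambda x: F((b << free) | x)


# Toy stand-in for an n-bit function F with n = 8 (assumption for this sketch).
n = 8
rng = random.Random(7)
table = [rng.randrange(1 << n) for _ in range(1 << n)]
F = table.__getitem__

# The four domain-separated functions used by the single-key variant of HtmB-f.
g00, g01 = hat(F, 0b00, 2, n), hat(F, 0b01, 2, n)
g10, g11 = tilde(F, 0b10, 2, n), tilde(F, 0b11, 2, n)
print(g00(5), g01(5), g10(5), g11(5))
```

Each derived function accepts \((n-|b|)\)-bit inputs, which is consistent with the hash functions of the reduced-key variants producing \((n-2)\)-bit or \((n-1)\)-bit halves.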

Single-key variant of HtmB-f: Given and a pair \( H = (H_1,H_2) \) of two \( (\mathcal {K},\mathcal {M},\{0,1\}^{n-2}) \)-keyed hash functions, we define the single-key variant of HtmB-f, denoted 1k-HtmB-f, as:

$$\begin{aligned} \textsf {1k\text {-}HtmB\text {-}f} [H,\mathsf {\Gamma }] := \textsf {HtmB}\text {-}\textsf {f}[H,\widehat{\mathsf {\Gamma }}^{00},\widehat{\mathsf {\Gamma }}^{01},\widetilde{\mathsf {\Gamma }}^{10},\widetilde{\mathsf {\Gamma }}^{11}]. \end{aligned}$$

Theorem 7.1

For \( \epsilon _1,\epsilon _2,\sigma \ge 0 \), \( q \le 2^{n-3} \), , and \( (q,\sigma ,\epsilon _2,\epsilon _1) \)-\( \text {DbACU}_{q} \) hash function \( H_K \) instantiated with , the PRF advantage of any \( (q,\infty ) \)-distinguisher \( \mathscr {A}\) against \( {\textsf {1k\text {-}HtmB\text {-}f}}[H,\mathsf {\Gamma }] \) is given by

$$ \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {1k\text {-}HtmB\text {-}f}}[H,\mathsf {\Gamma }]}(\mathscr {A}) \le \frac{64q^2}{2^{2n}} + \frac{128q^2}{2^{3n}} + \epsilon _2 + 2\epsilon _1. $$

Three-key variant of HtmB-p1: Given and a pair \( H = (H_1,H_2) \) of two \( (\mathcal {K},\mathcal {M},\{0,1\}^{n-1}) \)-keyed hash functions, we define the three-key variant of HtmB-p1, denoted 3k-HtmB-p1, as:

$$\begin{aligned} \textsf {3k\text {-}HtmB\text {-}p1} [H,\mathsf {\Pi }_1,\mathsf {\Pi }_2,\mathsf {\Pi }_3] := \textsf {HtmB\text {-}p1} [H,\widehat{\mathsf {\Pi }}_1^{0},\widehat{\mathsf {\Pi }}_1^{1},\widetilde{\mathsf {\Pi }}_2^{0},\widetilde{\mathsf {\Pi }}_2^{1},\widetilde{\mathsf {\Pi }}_3^{0},\widetilde{\mathsf {\Pi }}_3^{1}]. \end{aligned}$$

Theorem 7.2

For \( n \ge 8 \), \( \epsilon _1,\epsilon _2,\sigma \ge 0 \), \( q \le 2^{n-5} \), , and \( (q,\sigma ,\epsilon _2,\epsilon _1) \)-\( \text {DbACU}_{q} \) hash function \( H_K \) instantiated with key , the PRF advantage of any \( (q,\infty ) \)-distinguisher \( \mathscr {A}\) against \( {\textsf {3k\text {-}HtmB\text {-}p1}}[H,\mathsf {\Pi }_1,\mathsf {\Pi }_2,\mathsf {\Pi }_3] \) is given by

$$ \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {3k\text {-}HtmB\text {-}p1}}[H,\mathsf {\Pi }_1,\mathsf {\Pi }_2,\mathsf {\Pi }_3]}(\mathscr {A}) \le \frac{2q}{2^n} + \frac{6q^{1.5}}{2^{1.5n}} + \frac{64q^2}{2^{2n}} + \frac{128q^2}{2^{3n}} + \epsilon _2 + 2\epsilon _1. $$

Two-key variant of HtmB-p2: Given and a pair \( H = (H_1,H_2) \) of two \( (\mathcal {K},\mathcal {M},\{0,1\}^{n-1}) \)-keyed hash functions, we define the two-key variant of HtmB-p2, denoted 2k-HtmB-p2, as:

$$\begin{aligned} \textsf {2k\text {-}HtmB\text {-}p2} [H,\mathsf {\Pi }_1,\mathsf {\Pi }_2] := \textsf {HtmB\text {-}p2} [H,\widehat{\mathsf {\Pi }}_1^{0},\widehat{\mathsf {\Pi }}_1^{1},\widetilde{\mathsf {\Pi }}_2^{0},\widetilde{\mathsf {\Pi }}_2^{1}]. \end{aligned}$$

Theorem 7.3

For \( \epsilon _1,\epsilon _2,\sigma \ge 0 \), \( q \le \min \{2^{n-3},2^n/67n^2\} \), , and \( (q,\sigma ,\epsilon _2,\epsilon _1) \)-\( \text {DbACU}_{q} \) hash function H instantiated with key , the PRF advantage of any \( (q,\infty ) \)-distinguisher \( \mathscr {A}\) against \( {\textsf {2k\text {-}HtmB\text {-}p2}}[H,\mathsf {\Pi }_1,\mathsf {\Pi }_2] \) is given by

$$ \mathbf {Adv}^{\mathsf {prf}}_{{\textsf {2k\text {-}HtmB\text {-}p2}}[H,\mathsf {\Pi }_1,\mathsf {\Pi }_2]}(\mathscr {A}) \le \frac{128q^2}{2^{3n}} + \frac{136q^2}{2^{2n}} + \frac{8q}{2^n} + \epsilon _2 + 2\epsilon _1. $$

The proofs of Theorems 7.1, 7.2, and 7.3 follow strategies very similar to those used in the proofs of Theorems 4.1, 4.2, and 4.4, respectively. We therefore omit the formal proofs for brevity. For the sake of verification, we provide proof sketches in the full version of this paper.
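To get a feeling for the three bounds, they can be evaluated numerically. The sketch below simply transcribes the right-hand sides of Theorems 7.1, 7.2 and 7.3 together with their stated conditions on \(n\) and \(q\); the function names are hypothetical, the restriction \(q\le 2^n/67n^2\) is read as \(q\le 2^n/(67n^2)\), and results are floating-point approximations.

```python
def bound_1k_htmb_f(n, q, eps1=0.0, eps2=0.0):
    """Right-hand side of the bound of Theorem 7.1 (requires q <= 2^(n-3))."""
    if q > 2 ** (n - 3):
        raise ValueError("Theorem 7.1 assumes q <= 2^(n-3)")
    return 64 * q**2 / 2 ** (2 * n) + 128 * q**2 / 2 ** (3 * n) + eps2 + 2 * eps1


def bound_3k_htmb_p1(n, q, eps1=0.0, eps2=0.0):
    """Right-hand side of the bound of Theorem 7.2 (requires n >= 8, q <= 2^(n-5))."""
    if n < 8 or q > 2 ** (n - 5):
        raise ValueError("Theorem 7.2 assumes n >= 8 and q <= 2^(n-5)")
    return (2 * q / 2**n + 6 * q**1.5 / 2 ** (1.5 * n)
            + 64 * q**2 / 2 ** (2 * n) + 128 * q**2 / 2 ** (3 * n)
            + eps2 + 2 * eps1)


def bound_2k_htmb_p2(n, q, eps1=0.0, eps2=0.0):
    """Right-hand side of the bound of Theorem 7.3."""
    if q > min(2 ** (n - 3), 2**n / (67 * n**2)):
        raise ValueError("Theorem 7.3 assumes q <= min(2^(n-3), 2^n/(67 n^2))")
    return (128 * q**2 / 2 ** (3 * n) + 136 * q**2 / 2 ** (2 * n)
            + 8 * q / 2**n + eps2 + 2 * eps1)


# Example: n = 128 and q = 2^64 queries, ignoring the hash terms (eps1 = eps2 = 0).
n, q = 128, 2**64
for f in (bound_1k_htmb_f, bound_3k_htmb_p1, bound_2k_htmb_p2):
    print(f.__name__, f(n, q))
```

With the hash terms set to zero, the permutation-based variants are dominated by their terms linear in \(q/2^n\), whereas the bound for 1k-HtmB-f is dominated by \(64q^2/2^{2n}\).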

8 Conclusion

In this paper, we proposed a novel method of constructing VIL PRFs, dubbed the Hash-then-modified-Benes or HtmB transformation. Based on the type of internal primitive, we gave three instances of HtmB, viz. HtmB-f, HtmB-p1, and HtmB-p2. We showed that all three instances retain security for close to \( 2^n \) queries. We instantiated the three VIL PRFs using LightMAC+ and PMAC+ based hash functions, called LightHash and PHash, respectively. We explicitly derived relevant collision probability bounds for LightHash and PHash that, in combination with the bounds for HtmB instances, imply security for almost \( 2^n \) blocks. Lastly, we proposed some reduced-key variants of HtmB-f, HtmB-p1, and HtmB-p2.

8.1 Further Discussion

On Single-Key Variants for HtmB-p1 and HtmB-p2: There is scope for further reducing the key size of HtmB-p1 and HtmB-p2 by using 2 and 1 extra bit(s), respectively, for domain separation. However, there is an obstacle in proving the security of the resulting constructions. This obstacle stems from the fact that the permutation calls in the lower layer are no longer independent of the permutation calls in the upper layer. As a result, the existing bounds on the sum of permutations [8, 10] (in the case of HtmB-p1) and mirror theory [9, 33, 34] (in the case of HtmB-p2) are no longer applicable. It seems that we need a stronger result, such as a bound on the sum of permutations under some added input/output restrictions. A partial positive result in this direction has been shown in [15], where the authors show a similar result for up to \( 2^{2n/3} \) queries. We leave it as an open problem to extend the result to close to \( 2^n \) queries under appropriate conditions.

On Hash Function Requirement: The reduced-key variants of HtmB need hash functions with unusual output sizes like \( 2n-2 \) and \( 2n-4 \) bits. However, one can easily generate such hash outputs by chopping appropriate bits of an \( \epsilon \)-Almost XOR Universal (AXU) hash function, i.e. a hash function \( H_K \) such that for distinct inputs x, y and any difference \( \delta \), \( \Pr _K\left[ H_K(x)\oplus H_K(y)=\delta \right] \le \epsilon \). Suppose we have a pair of n-bit hash functions \( H = (H_1,H_2) \) that satisfies two properties:

  • \( H_b \) are \( \epsilon _1 \)-AXU hash functions for \( b \in [2] \), and

  • H is an \( \epsilon _2 \)-AXU hash function.

Then, if we chop \( d<n \) bits from each of \( H_1 \) and \( H_2 \), the resulting hash function can be shown to be \( (q,\sigma ,q^22^{2d}\epsilon _2,q2^d\epsilon _1) \)-\( \text {DbACU}_{q} \).
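The loss incurred by chopping can be quantified directly from the parameters stated above. The sketch below is illustrative only: it assumes that chopping keeps the \(n-d\) least significant bits (the text does not fix which bits are removed), and the names `chop` and `chopped_dbacu_params` are hypothetical.

```python
from fractions import Fraction


def chop(h, n, d):
    """Drop d bits of an n-bit hash value, keeping the n - d least significant bits (assumption)."""
    return h & ((1 << (n - d)) - 1)


def chopped_dbacu_params(q, d, eps1, eps2):
    """Return (q^2 * 2^(2d) * eps2, q * 2^d * eps1), the two DbACU parameters after chopping."""
    return q**2 * 2 ** (2 * d) * eps2, q * 2**d * eps1


# Example with n = 128, d = 2 chopped bits per half and q = 2^32 queries,
# assuming eps1 = 2^-n and eps2 = 2^-2n purely for illustration.
n, d, q = 128, 2, 2**32
new_eps2, new_eps1 = chopped_dbacu_params(q, d, Fraction(1, 2**n), Fraction(1, 2 ** (2 * n)))
print(new_eps2, new_eps1)
print(hex(chop(0xFFFF, 16, 2)))
```

Exact rational arithmetic is used so that very small probabilities are not rounded to zero.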

Unfortunately, LightHash and PHash of Sect. 5 do not satisfy the AXU condition. Note that we saved one block cipher call in LightHash and PHash, as compared to the hash layer in LightMAC+ and PMAC+, by absorbing the last data block directly. It would be interesting to see whether the original hash layer in LightMAC+ and PMAC+ can be used as appropriate replacements for LightHash and PHash, respectively, in the reduced-key variants.