In this section, we present the main techniques underlying our transformation from succinct arguments of knowledge with small multiplicative prover overhead to SPARKs.
2.2 Extending SPARKs to Arbitrary Computations
The focus of this work is extending the above example to handle arbitrary non-deterministic polynomial-time computation (possibly with a long output), which introduces many complications. For now, we focus on the case of RAM computation that uses only a single processor (we later show how to extend this to arbitrary parallel RAM computations). Specifically, suppose we are given a statement \( (M,x,T) \) with witness w, where M is a RAM machine, and we want to prove that \( M(x,w) \) outputs some value y within T steps. We emphasize that our goal is to capture general non-deterministic, polynomial-time computation where the output y is not known in advance, so we would like to simultaneously compute y given \( (M,x,T) \) and w, and prove its correctness. Since M is a RAM machine, it has access to some (potentially large) memory D consisting of n words. We let \( \lambda \) be the security parameter and the size of a word, and T be an arbitrary polynomial in \( \lambda \). Let us try to employ the above strategy in this more general setting.
As M does not necessarily implement an iterated function, the first problem we encounter is that there is no natural way to split the computation into many sub-computations with small input and output. For intermediate statements, the naïve solution would be to prove that running the RAM machine M for k steps starting at some initial memory \( D_{\mathsf {start}} \) results in final memory \( D_{\mathsf {final}} \) . However, this is a problem, because the size of the memory, n, may be large—perhaps even as large as the full running time T—so the intermediate statements we need to prove may be huge!
A natural attempt to mitigate this would be to instead provide a succinct digest of the memory at the beginning and end of each sub-computation and then have the prover additionally prove that it knows a memory string consistent with each digest. Concretely, each sub-computation corresponding to k steps of computation would contain digests \( c_{\mathsf {start}}, c_{\mathsf {final}} \) . The prover would show that there exist strings \( D_\mathsf {start} \) , \( D_\mathsf {final} \) such that (1) \( c_\mathsf {start} \) , \( c_\mathsf {final} \) are digests of \( D_\mathsf {start} \) , \( D_\mathsf {final} \) , respectively, and (2) starting with memory \( D_\mathsf {start} \) and running RAM machine M for k steps results in memory \( D_\mathsf {final} \) . This seems like a step in the right direction, since the statement size for each sub-computation would only depend on the output size of the digest and not the size of the memory. However, the prover’s witness—and hence running time to prove each sub-computation—still scales linearly with the size of the memory in this approach. Therefore, the main challenge we are faced with is removing the dependence on the memory size in the witness of the sub-computations.
Using local updates. To overcome the above issues, we observe that in each sub-computation the prover only needs to prove that the transition from the initial digest \( c_{\mathsf {start}} \) to the final digest \( c_{\mathsf {final}} \) is consistent with k steps of computation done by M. At a high level, we do so by proving that there exists a sequence of k local updates to \( c_{\mathsf {start}} \) that result in \( c_{\mathsf {final}} \) . Then, to verify a sub-computation corresponding to k steps, we can simply check the k local updates to the digest of the memory, rather than checking the memory in its entirety. To formalize this idea, we rely on compressing hash functions that allow for local updates that can be efficiently computed in parallel to the main computation. We call these concurrently updatable hash functions.
Given such hash functions, we will use a succinct argument of knowledge \( (\mathcal {P} _\mathsf {sARK},\mathcal {V} _\mathsf {sARK}) \) for an \( \mathbf {NP} \) language \( \mathcal {L}_{\mathsf {upd}} \) that corresponds to checking that a sequence of local updates is valid. Specifically, a statement \( (M,x,k,c_\mathsf {start}, c_\mathsf {final}) \in \mathcal {L}_{\mathsf {upd}} \) if and only if there exists a sequence of updates \( u_1, \ldots , u_k \) such that, starting with the short digest \( c_\mathsf {start} \), running M on input x for k steps specifies the updates \( u_1,\ldots , u_k \), and applying them results in the digest \( c_\mathsf {final} \). Then, as long as the updates are themselves succinct, the size of the witness scales only with the number of steps of the computation and not with the size of the memory.
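To make the relation concrete, the following sketch spells out the check underlying \( \mathcal {L}_{\mathsf {upd}} \) in Python. It is a minimal illustration under our own naming, not the paper's formal definition: the machine's transition function (`step`, which folds in M and x) and the hash function's opening and update procedures (`verify_open`, `apply_update`) are abstracted as callables, and the `Update` record is a hypothetical encoding of a single local update.

```python
from typing import Callable, NamedTuple

class Update(NamedTuple):      # hypothetical encoding of one local update u_i
    addr: int                  # memory location accessed at this step
    read_val: bytes            # value claimed to be stored at addr
    write_val: bytes           # value written back to addr
    proof: list                # authentication path opening addr under the digest

def in_L_upd(step: Callable, verify_open: Callable, apply_update: Callable,
             state, c_start, c_final, updates) -> bool:
    """Accept iff the updates are exactly those specified by k steps of M on x
    (both folded into `step`/`state`) and they carry c_start to c_final."""
    digest = c_start
    for u in updates:
        # The claimed read value must open correctly under the current digest.
        if not verify_open(digest, u.addr, u.read_val, u.proof):
            return False
        # One step of M fixes which address is touched and what is written there.
        state, addr, write_val = step(state, u.read_val)
        if (addr, write_val) != (u.addr, u.write_val):
            return False
        digest = apply_update(digest, u)     # advance the digest by this update
    return digest == c_final
```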
To make the above approach work, we need updatable hash functions that satisfy the following two properties:
(1) Updates can be computed efficiently in parallel to the main computation.
(2) Updates can be verified as modifying only the specified locations in memory.
We next explain how we obtain the required hash functions satisfying the above properties. We believe that this primitive and the techniques used to obtain it are of independent interest.
Concurrently Updatable Hash Functions. Roughly speaking, concurrently updatable hash functions are computationally binding hash functions that support updating parts of the underlying message without re-hashing the whole message. For efficiency, we additionally require that one can perform several sequential updates concurrently. For soundness, we require that no efficient adversary can find two different openings for the same location, even if it is allowed to perform polynomially many update operations. A formal definition appears in Section 5.
We focus on the case where each update is local (a single word per timestep), but we show how to extend this to updating many words in parallel in Section 5. Our construction relies on Merkle trees [43] and hence can be instantiated with any collision-resistant hash function. Recall that a Merkle tree uses a compressing hash function, which we assume for simplicity is given by \( h:\lbrace 0,1\rbrace ^{2\lambda }\rightarrow \lbrace 0,1\rbrace ^{\lambda } \), and is obtained via a binary tree structure where nodes are associated with values. The leaves are associated with arbitrary values, and each internal node is associated with a value that is the hash of the concatenation of its children’s values.
It is well known that Merkle trees, when instantiated with a collision-resistant hash function h, act as short (binding) commitments with local opening. The latter property enables proving claims about specific blocks in the input without opening the whole input, by revealing the authentication path from some input block to the root (i.e., the hashes corresponding to sibling nodes along the path from the leaf to the root). Not only do Merkle trees have the local opening property, but the same technique allows for local updates. Namely, one can update the value of a specific word in the input and compute the new root value without recomputing the whole tree (by updating the hashes along the path from the updated block to the root). All of these local procedures cost time that is proportional to the depth of the tree, \( \log _2 n \) , as opposed to the full memory n. We denote this update time as \( \beta \) (which may additionally depend polynomially on \( \lambda \) , for example, to compute the hash function at each level in the tree).
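The following sketch illustrates these standard local procedures, instantiating h with SHA-256 and storing the tree as a flat array; it is a minimal illustration of the textbook technique, not our full construction. Both `open` and `update` touch only the \( \log _2 n \) nodes on one root-to-leaf path.

```python
import hashlib

# SHA-256 stands in for the generic collision-resistant h : {0,1}^{2λ} -> {0,1}^λ.
def H(left, right):
    return hashlib.sha256(left + right).digest()

class MerkleTree:
    """Heap layout: node i has children 2i and 2i+1; leaves sit at n..2n-1."""

    def __init__(self, leaves):            # len(leaves) must be a power of two
        self.n = len(leaves)
        self.nodes = [b""] * self.n + list(leaves)
        for i in range(self.n - 1, 0, -1):
            self.nodes[i] = H(self.nodes[2 * i], self.nodes[2 * i + 1])

    def root(self):
        return self.nodes[1]

    def open(self, index):
        """Authentication path: sibling hashes from leaf `index` up to the root."""
        i, path = self.n + index, []
        while i > 1:
            path.append(self.nodes[i ^ 1])  # sibling of node i
            i //= 2
        return path

    def update(self, index, value):
        """Rewrite one leaf and rehash only its root path: O(log n) work."""
        i = self.n + index
        self.nodes[i] = value
        while i > 1:
            i //= 2
            self.nodes[i] = H(self.nodes[2 * i], self.nodes[2 * i + 1])

def verify_open(root, index, value, path):
    """Recompute the root from a claimed leaf value and its authentication path."""
    h = value
    for sibling in path:
        h = H(h, sibling) if index % 2 == 0 else H(sibling, h)
        index //= 2
    return h == root
```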
Let us see what happens when we use Merkle trees as our hash function. Recall that the Merkle tree holds a hash of the memory at every step of the computation, and we update its value after each such step. The latter operation, as mentioned above, takes \( \beta \) time. So even with local updates, using Merkle trees naïvely incurs a \( \beta \) delay for every update operation, which implies a \( \beta \) multiplicative delay for the whole computation (which we want to avoid)! To handle this, we use a pipelining technique to perform the local updates in parallel.
Pipelining updates. Consider two updates \( u_1 \) and \( u_2 \) that we want to apply to the current Merkle tree sequentially. We observe that, since Merkle tree updates work “level by level,” we can first update the first level of the tree (corresponding to the leaves) according to \( u_1 \). Then, we update the second level according to \( u_1 \) and in parallel update the first level using \( u_2 \); next, we update the third level according to \( u_1 \) and in parallel the second level using \( u_2 \), and so on. This idea generalizes to pipeline \( u_1,\ldots ,u_k \), so the final update \( u_k \) completes after \( (k-1)+\beta \) steps, and the memory is consistent with the Merkle tree given by performing the update operations \( u_1,\ldots ,u_k \) sequentially. Implementing this idea requires \( \beta \) additional parallel threads, since the computation of at most \( \beta \) updates overlaps at any given time. A key point that allows us to pipeline these concurrent updates is that, in a standard Merkle tree, the operations at each level are data-independent: each processor can perform all of the reads/writes to a given level at a single timestep, and the next processor can continue at the next timestep without incurring any delay.
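The sketch below simulates this pipelining schedule sequentially (an actual implementation would run the inner iterations on separate processors); it assumes the `MerkleTree` sketch above, with \( \beta \) taken as the number of levels on a leaf-to-root path. Update \( u_j \) performs its level-\( \ell \) work at timestep \( j+\ell \), so at most \( \beta \) updates are in flight at once.

```python
def apply_level(tree, upd, level):
    """The level-`level` portion of one local update (leaf write or rehash)."""
    index, value = upd
    if level == 0:
        tree.nodes[tree.n + index] = value
    else:
        i = (tree.n + index) >> level        # ancestor of the leaf at this level
        tree.nodes[i] = H(tree.nodes[2 * i], tree.nodes[2 * i + 1])

def pipelined_updates(tree, updates):
    """Sequentially simulate the pipeline; finishes in (k - 1) + depth timesteps."""
    depth = tree.n.bit_length()              # log2(n) + 1 levels per update
    k = len(updates)
    for t in range(k + depth - 1):           # one iteration = one parallel timestep
        # In-flight updates at timestep t; iterating j in increasing order models
        # that update u_j is always one level ahead of u_{j+1}.
        for j in range(max(0, t - depth + 1), min(k, t + 1)):
            apply_level(tree, updates[j], level=t - j)
```

For example, `pipelined_updates(tree, [(3, b"x" * 32), (5, b"y" * 32)])` leaves the tree in the same state as calling `tree.update(3, ...)` followed by `tree.update(5, ...)` sequentially.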
Verifying that updates are local. With regards to the soundness of this primitive, a subtle—yet important—point that we need in our application is that it must be possible to prove that a valid update only modifies the locations it specifies. For example, suppose a cheating prover updates the digest with respect to one location in memory while simultaneously rewriting other locations in memory in a way that does not correspond to the memory access done by the machine M. Then, the prover will later be able to open inconsistent values and prove that M computes whatever it wants. Moreover, the prover could gradually make these changes across many different updates. Fortunately, the structure of Merkle trees allows us to prove that a local update only changes a single location. At a high level, this is because the authentication path for a leaf in a Merkle tree effectively binds the root of the tree to the entire memory. Thus, we show that if a Merkle tree is updated at some location, then one can use the authentication path to prove that no other locations were modified. Furthermore, we show how to extend this to the general case of updating many locations in a single update.
Ensuring optimal prover runtime. Using the above ingredients, we discuss how to put everything together to ensure optimal prover runtime. Concretely, suppose we have a concurrently updatable hash function where each update takes time \( \beta \), and a succinct non-interactive argument of knowledge with quasilinear prover overhead for the language \( \mathcal {L}_{\mathsf {upd}} \). Recall that a statement \( (M,x,k,c_\mathsf {start}, c_\mathsf {final}) \in \mathcal {L}_{\mathsf {upd}} \) if there exists a sequence of k hash function updates such that (1) the updates are consistent with the computation of M and (2) applying these updates to \( c_{\mathsf {start}} \) results in \( c_{\mathsf {final}} \). Let \( \alpha ^\star \) be the multiplicative overhead of the succinct argument with respect to the number of updates (so a computation with \( k \le T \) updates takes time \( k \cdot \alpha ^\star \) to prove). Note that \( \alpha ^\star \in \mathrm{poly} (\beta ,\log T) \), as we require that the total time to prove an \( \mathcal {L}_{\mathsf {upd}} \) statement is quasilinear in the work, and a statement for at most T updates requires \( T \cdot \beta \) total work.
As discussed above, to prove that \( M(x,w) \) outputs a value y in T steps, we split the computation into m sub-computations that all complete by time T. The ith sub-computation consists of a “compute” phase, where we perform \( k_i \) of the T total computation steps, and a “proof” phase, where we use the succinct argument to prove correctness of those \( k_i \) steps. For the “compute” phase, recall that performing \( k_i \) steps of computation while also updating the digest takes \( k_i \cdot \beta \) total work. However, as described above, we can pipeline these updates so the parallel time to compute them is only \( (k_i - 1) + \beta \).
For the “proof” phase to complete in the desired amount of time, we need to set the values of \( k_i \) appropriately. Each proof for \( k_{i} \le T \) steps of computation takes at most \( k_i \cdot \alpha ^\star \) time. Therefore, the largest “chunk” of computation we can compute and prove by roughly time T is \( T/(\alpha ^\star + 1) \). For convenience, let \( \gamma \triangleq \alpha ^\star + 1 \). Then, in the first sub-computation, we can compute and prove \( k_{1} = T/\gamma \) steps of computation. In each subsequent sub-computation, we compute and prove a \( 1/\gamma \) fraction of the remaining computation. Putting everything together, we get that \( k_i = (T/\gamma) \cdot (1-1/\gamma)^{i-1} \) for \( i \in [m-1] \), and \( k_m < \gamma \) is the number of remaining steps such that \( \sum _{i=1}^m k_i = T \). This results in roughly \( \gamma \log T \in \mathrm{poly} (\beta , \log T) \) total sub-proofs, meaning that the proof size depends only polylogarithmically on T.
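As a sanity check of this schedule, the short sketch below computes the chunk sizes \( k_i \) for illustrative (hypothetical) values of T and \( \gamma \), confirming that they sum to T, that the final chunk has \( k_m < \gamma \), and that the number of chunks m is on the order of \( \gamma \log T \).

```python
import math

def schedule(T, gamma):
    """Chunk sizes k_1, ..., k_m: each chunk is a 1/gamma fraction of what remains."""
    ks, remaining = [], T
    while remaining >= gamma:
        k = remaining // gamma
        ks.append(k)
        remaining -= k
    if remaining:
        ks.append(remaining)                 # final chunk: k_m < gamma steps
    return ks

T, gamma = 1_000_000, 10                     # illustrative values only
ks = schedule(T, gamma)
assert sum(ks) == T and ks[-1] < gamma
print(f"m = {len(ks)} chunks, gamma * log T ~ {gamma * math.log(T):.0f}")
```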
In Figure 1, we show the structure of the compute and proof phases for all m sub-computations. We emphasize that the entire protocol completes within \( T+\alpha ^{\star } \cdot \gamma + \beta \) parallel time, since the first \( m-1 \) sub-proofs complete by time \( T + \beta \), all m sub-computations complete by time \( T+\beta \), and the proof of the final \( \gamma \) steps takes roughly \( \alpha ^{\star } \cdot \gamma \) time. Since \( \alpha ^{\star } \), \( \gamma \), and \( \beta \) are in \( \mathrm{poly} (\lambda ,\log T) \), this implies that we only have a small additive, rather than multiplicative, overhead.
We note that in the overview above, where we discuss SPARKs for iterated functions, correctness of the final sub-computation is proven by having the prover send the witness in the clear and having the verifier check it directly. In our full construction, we instead have the prover give a succinct proof for the last sub-computation. The main reason for this is that, for the case of general parallel RAM computations, we want the communication complexity and the complexity of the verifier to depend only polylogarithmically on the depth T and the number of processors \( \rho \) used in the original computation. However, the witness for the final sub-computation may have length linear in \( \rho \) (since at each step in the final sub-computation, the witness may specify the actions of each of the \( \rho \) processors). Having the prover instead provide a succinct proof solves this issue.
Next, we note that there is a \( \beta \) gap between the time that the “compute” phase ends and the “proof” phase begins for a particular sub-computation. This is because we have to wait \( \beta \) additional time for the updates to finish before we can start the proof. However, we can immediately start computing the next sub-computation without waiting for these updates to complete. Last, consider the number of processors used in the protocol: the “compute” phase, which runs at all times and additionally computes updates to the digest in parallel, uses \( \beta \) processors, and running each of the m sub-proofs in parallel costs at most a factor of m times the number of processors used by a single sub-proof.
Computing the initial digest. Before giving the full protocol, we address a final issue: the prover must compute the digest of the initial memory string. Specifically, the prover needs to hash a string \( D \in \{0,1 \}^n \), which the RAM machine M assumes contains its input \( (x,w) \). Directly hashing the string \( x || w \) would require roughly \( \left|x \right|+\left|w \right| \) additional time, which could be as large as T. To circumvent the need to compute the initial digest, we simply do not compute a digest of the initial memory! Instead, we start with a digest of an uninitialized memory that can be computed efficiently and allows each position to be initialized exactly once, whenever it is first accessed.
We extend our hash function definition to enable this as follows. We start with a dummy value \( \bot \) for the leaves of the Merkle tree. Because the leaves all have the same value, we can compute the root of the Merkle tree efficiently without touching all of the nodes in the tree. Specifically, if the leaves have the value \( \mathbf {dummy}(0) = \bot \), then we can define the value of the nodes at level j recursively as \( \mathbf {dummy}(j) = h(\mathbf {dummy}(j-1) || \mathbf {dummy}(j-1)) \). The initial digest is then just the root \( \mathbf {dummy}(\log n) \). Note that the prover does not need to initialize the whole tree in memory with dummy values; it simply needs to compute \( \mathbf {dummy}(\log n) \) as the initial digest.
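The following sketch shows this computation, reusing the SHA-256-based `H` from the Merkle tree sketch above and a fixed byte string as a stand-in encoding of \( \bot \): the initial digest takes only \( \log n \) hash evaluations rather than the roughly 2n needed to build the full tree.

```python
DUMMY0 = b"\x00" * 32          # stand-in 32-byte encoding of the dummy value ⊥

def dummy(j):
    """Value shared by every uninitialized node at level j of the tree."""
    v = DUMMY0
    for _ in range(j):
        v = H(v, v)            # dummy(j) = h(dummy(j-1) || dummy(j-1))
    return v

def initial_digest(n):
    """Root of the all-dummy tree over n leaves: log2(n) hash calls in total."""
    return dummy(n.bit_length() - 1)
```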
Whenever the prover accesses a location in D for the first time, it performs the corresponding local update to the Merkle tree. However, performing this update is non-trivial as many of the nodes in the Merkle tree may still be uninitialized. What saves us is that any uninitialized node must correspond to leaves that are also uninitialized, so they still have the value \( \bot \) . As such, we can compute the value of any uninitialized node at level j efficiently as \( \mathbf {dummy}(j) \) . To maintain efficiency, the prover can keep track of a bit for each node to check if it has been initialized or not.
Given a single authentication path for a newly initialized location in memory, the verifier can check that this path is a valid opening for \( \bot \) with the previous digest and for the new value with the updated digest. This guarantees that only the newly initialized value was modified, and the verifier can ensure that each location is initialized at most once by disallowing the prover from updating locations to \( \bot \). Furthermore, the verifier can check that any initialized value not part of the witness (corresponding to the input x) is consistent with what M expects.
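A minimal sketch of this verifier check, reusing `verify_open` and the \( \bot \) encoding from the sketches above: the same authentication path is checked against both digests, which, since the path binds the root to the entire memory, pins down that only the opened location changed.

```python
def check_initialization(old_digest, new_digest, index, new_value, path):
    """Accept iff `path` opens `index` to ⊥ under the old digest and to the new
    (non-⊥) value under the new digest, so no other location changed."""
    if new_value == DUMMY0:                  # locations may never be reset to ⊥
        return False
    return (verify_open(old_digest, index, DUMMY0, path) and
            verify_open(new_digest, index, new_value, path))
```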