Abstract of “Cryptography for Efficiency: New Directions in Authenticated Data Structures” by Charalampos Papamanthou, Ph.D., Brown University, May 2011.

Cloud computing has emerged as an important new computational and storage medium and is increasingly being adopted both by companies and individuals as a means of reducing operational and maintenance costs. However, remotely-stored sensitive data may be lost or modified, and third-party computations may not be performed correctly due to errors, opportunistic behavior, or malicious attacks. Thus, while the cloud is an attractive alternative to local trusted computational resources, users need integrity guarantees in order to fully adopt this new paradigm. Specifically, they need to be assured that uploaded data has not been altered and outsourced computations have been performed correctly. Tackling the above problems requires the design of protocols that are provably secure and at the same time remain highly efficient; otherwise the main purpose of adopting cloud computing, namely efficiency and scalability, is defeated. It is therefore essential that expertise in cryptography and efficient algorithmics be combined to achieve these goals. This thesis studies techniques that allow the efficient verification of data integrity and computation correctness in such adversarial environments. Towards this end, several new authenticated data structures for fundamental algorithmic and computational problems, e.g., hash table queries and set operations, are proposed. The main novelty of this work lies in employing advanced cryptography, such as lattices and bilinear maps, towards achieving high efficiency, departing from traditional hash-based primitives. As such, the proposed techniques lead to efficient solutions that introduce minimal asymptotic overhead and at the same time enable highly desirable features such as optimal verification mechanisms and parallel authenticated data structure algorithms. The small asymptotic overhead does translate into significant practical savings, yielding efficient protocols and system prototypes.

Cryptography for Efficiency: New Directions in Authenticated Data Structures

by Charalampos Papamanthou
B.Sc., Applied Informatics, University of Macedonia, 2003
M.Sc., Computer Science, University of Crete, 2005
M.Sc., Computer Science, Brown University, 2007

A dissertation submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in the Department of Computer Science at Brown University.

Providence, Rhode Island, May 2011

© Copyright 2011 by Charalampos Papamanthou

This dissertation by Charalampos Papamanthou is accepted in its present form by the Department of Computer Science as satisfying the dissertation requirement for the degree of Doctor of Philosophy.

Date ________  Roberto Tamassia, Director

Recommended to the Graduate Council

Date ________  Michael T. Goodrich, Reader (University of California, Irvine)

Date ________  Anna Lysyanskaya, Reader

Date ________  Franco P. Preparata, Reader

Approved by the Graduate Council

Date ________  Peter M. Weber, Dean of the Graduate School

Vita

Charalampos Papamanthou was born in Trikala, Greece, 29 years ago. Right after graduating from high school, he began his college studies in Thessaloniki, Greece, receiving his bachelor’s degree in Applied Informatics from the University of Macedonia in 2003. He then traveled south to pursue a master’s degree in Computer Science on the beautiful island of Crete.
There, under the Mediterranean sun, he also did research at the Foundation for Research and Technology Hellas. Upon completion of his studies at the University of Crete in 2005, he decided to cross the Atlantic and move to Providence, Rhode Island, in order to attend Brown University for graduate school. At Brown, he received his master’s and doctoral degrees in Computer Science in 2007 and 2011 respectively. He was also the recipient of the Kanellakis and van Dam fellowships. While in graduate school, he spent two summers on the West Coast, interning at Intel Research and Microsoft Research. His research interests are in computer security, applied cryptography, and the design and analysis of algorithms. Beginning in summer 2011, he will be joining the University of California at Berkeley as a postdoctoral researcher in the Computer Science Division.

Acknowledgments

Many individuals contributed to the outcome of this beautiful educational journey at Brown University. First and foremost, I deeply thank my thesis advisor, Roberto Tamassia, who guided me through the challenging path of graduate school. Roberto’s vast experience in research, combined with his kindness, smile and sincerity, taught me how to produce high-quality work with a positive attitude, always being precise, objective and very self-critical. His efficient quest for perfection, his work ethic, as well as his constructive feedback were vital in shaping not only my research philosophy, but also my daily presence and interactions in an academic environment.
Finally, Roberto’s advice on personal and academic matters has been truly invaluable and was always promptly and generously provided whenever needed. I could not have hoped for a better advisor. Second, I am grateful to Franco P. Preparata, with whom I closely collaborated during my first two years at Brown. Franco was the first faculty member I met when I arrived in Providence, back in 2005. Having known Franco for six years now, I am still amazed by his seemingly endless knowledge of Computer Science, his high integrity, and his loyalty to his colleagues. I thank him for the many technical and political discussions we had, his meticulously prepared lectures on parallel algorithms and computational biology, and for the provably correct advice he would always provide at the right time. I would also like to thank his wife, Rosa Maria, for inviting me multiple times for dinner at their place. Admittedly, these have been the most original and tasty Italian dinners ever!

I would also like to thank the other members of my committee, Michael T. Goodrich and Anna Lysyanskaya. Michael has been a great collaborator, always encouraging new ideas and a diverse research agenda. He provided excellent feedback on the final text of this thesis. Anna taught me the foundations of cryptography, through an engaging introductory class and through the crypto reading group. Her presence in the department and my interactions with her greatly influenced the research path of this dissertation. Also, many thanks to Nikos Triandopoulos, who, apart from being a close friend, has been a reliable colleague, always eager to carefully listen to all my ideas and concerns. Many results in this dissertation have been the outcome of a great deal of fruitful discussions and long technical meetings with him. Finally, I would like to thank Alptekin Küpçü, C. Chris Erway, Bernardo Palazzi, Alexander Heitzmann, Olya Ohrimenko and Danfeng Yao for the work we did together on topics related to this thesis, as well as Petros Maniatis and Seny Kamara for being my internship mentors at Intel Research and Microsoft Research respectively.

The Brown CS faculty, and in particular the professors I interacted with most, namely John Savage, Claire Mathieu, Philip Klein, Tom Doeppner, Pascal Van Hentenryck, Rodrigo Fonseca and Eli Upfal, have been extraordinary. Their persistent dedication to high-quality research and teaching nurtured an inspiring and challenging environment for every graduate student in the department. Also, everyday life in the CIT would not have been as easy had it not been for the Brown CS Astaff and Tstaff. Thank you Janet, Lauren, Jane, Genie, Dorinda, Max, Phirum and Jeff!

Back in Greece, my professors and friends at UoM and UoC provided just the right academic environment. My undergraduate mentors, Konstantinos Paparrizos and Nikolaos Samaras, introduced me to research. My advisor at the University of Crete, Ioannis (Yanni) G. Tollis, helped me transform from an excited undergraduate student into an ambitious and focused graduate student. Working with Yanni was a great experience, and I am grateful to him for inspiring me to do research on exciting problems. Finally, many thanks to my friends and colleagues from my university years in Greece, Dimitris Xinidis, Manos Papaggelis, Adam Arvelakis, Alexandros Stamatakis, Pavlos Pavlidis, Christos Sgaras and Kostas Tzouvaras, for keeping in touch and for sharing their exciting stories with me while I was far away.
Life in Providence would not have been as fun without the good times spent with the following people: Socrate, Dimitri, Yanni, Ari, Pari, Misha, Foteini, Yorgo, Maria, Anastasia, Aggeliki, Olga, Panagioti, Katerina, Basili, Saki, Aparna, Menia, Sophocle, Wenjin, Radu and Doria. Thank you all! The Papavasiliou family in Attleboro made sure America felt like home, and of course, a big thank you belongs to my childhood buddies Pete, Achilleas, Ilias and Manos, as well as to all the members of my extended family, for being a continuous source of encouragement. Finally, many thanks to Vasili, Petro and Michali, for an amazing summer in Seattle (and for taking care of me while on crutches).

The research performed in this thesis was supported by the Center for Geometric Computing, the Plastech Professorship of Computer Science, the van Dam fellowship and the Kanellakis fellowship at Brown University, the National Science Foundation, NetApp and IAM Technology. It has been an honor for me to be a Kanellakis fellow and I would like to wholeheartedly thank the Kanellakis family for their generous support.

Last but not least, I am grateful to my parents Yianni and Gioula and to my brother Christo, for their unconditional love and support during all these years, as well as for everything they sacrificed for my upbringing and education. Finally, to honor her memory, I dedicate this thesis to my late grandmother Artemisia, for the values she imparted to me about life with her simple, but deep in meaning, sayings.

Στη μνήμη της γιαγιάς μου, Αρτεμισίας Χριστοδούλου.
To the memory of my grandmother, Artemisia Christodoulou.

Contents

List of Tables
List of Figures

1 Introduction
  1.1 Thesis motivation
  1.2 Thesis outline

2 Preliminaries and related work
  2.1 Definitions
    2.1.1 Data structures and authenticated data structures
    2.1.2 Complexity model
    2.1.3 Optimality and public verifiability
  2.2 Protocols and applications
    2.2.1 Three-party authenticated data structures protocol
    2.2.2 Two-party authenticated data structures protocol
  2.3 Related work
    2.3.1 Generic collision-resistant hashing
    2.3.2 More advanced cryptography
    2.3.3 Relation to memory checking

3 Accumulators for authenticated hash tables
  3.1 Preliminaries
    3.1.1 Hash tables
    3.1.2 The RSA accumulator
    3.1.3 The bilinear-map accumulator
    3.1.4 The accumulation tree
  3.2 Scheme based on the RSA accumulator
    3.2.1 Main authenticated data structure
    3.2.2 Updates
    3.2.3 Queries and verification
    3.2.4 Correctness and security
    3.2.5 A more practical scheme
    3.2.6 Protocols
  3.3 Scheme based on the bilinear-map accumulator
    3.3.1 Queries and verification
    3.3.2 Protocols
  3.4 Complexity limitations

4 Authenticated structures based on lattices
  4.1 Lattice definitions
    4.1.1 What is a lattice?
    4.1.2 Reductions
    4.1.3 Lattice-based hash function
    4.1.4 Parallel models of computation
  4.2 Main construction
    4.2.1 Algebraic tools
    4.2.2 Algorithms of the scheme
    4.2.3 Partial digests
    4.2.4 Correctness and security
    4.2.5 A note on repeated linearity
  4.3 Authenticated bloom filters
  4.4 Parallel online memory checking
  4.5 Protocols

5 Authenticated sets operations with bilinear maps
  5.1 Preliminaries
    5.1.1 Sets collection data structure scheme
    5.1.2 Subset witnesses
  5.2 Construction and algorithms
  5.3 Queries and verification
    5.3.1 Intersection query
    5.3.2 Union query
    5.3.3 Subset query
    5.3.4 Set difference query
  5.4 Complexity
  5.5 Proof of correctness
  5.6 Proof of security
    5.6.1 Protocols
  5.7 Applications
    5.7.1 Keyword-search
    5.7.2 Timestamped keyword-search
  5.8 Analysis
    5.8.1 System setup
    5.8.2 Communication cost
    5.8.3 Verification cost

6 Optimality with multilinear forms
  6.1 Dictionary data structure
    6.1.1 Non-optimal authenticated dictionaries
  6.2 Multilinear forms
  6.3 An optimal authenticated dictionary
    6.3.1 Dictionary queries and verification
    6.3.2 Main results
    6.3.3 Application in the two-party protocol
  6.4 Summary

7 Conclusions
  7.1 Overview of thesis results and discussion
  7.2 Future work

List of Tables

3.1 In this table, we exhibit a detailed comparison of the asymptotic access and group complexities of various authenticated data structure schemes in the literature with the complexities of our schemes. The underlying data structure scheme is for a hash table storing n elements. All the authenticated data structure schemes compared are defined by algorithms {genkey, setup, update, refresh, query, verify} (see Definition 2.3). Parameter 0 < ε < 1 is a constant, “D. Log” stands for “Discrete Logarithm”, “Generic CR” stands for “Generic Collision Resistance” and “B. q-DH” stands for “Bilinear q-strong Diffie-Hellman”. In all constructions the authenticated data structure has group complexity (i.e., size) O(n) and genkey() has O(1) complexity. Π(q) denotes the proof for a query q and upd is the update information output by algorithm update(). Our schemes are denoted with RHT (RSA-based authenticated hash table) and BHT (bilinear-map-based authenticated hash table). The “one-star” notation (∗) denotes an expected complexity, the “two-star” notation (∗∗) denotes an expected amortized complexity, whereas the “plus” notation (+) denotes an amortized complexity. All schemes in the table are publicly verifiable.

4.1 Asymptotic access and group complexities of various authenticated data structure schemes (see Definition 2.3) for a dynamic table of n entries. Parameter 0 < ε < 1 is a constant and GAPSVP is the gap shortest vector problem in lattices (Definition 4.1). In all schemes, the authenticated structure has group complexity O(n) and genkey() has O(1) complexity. Note that [90] is the published conference version of Chapter 3. The acronyms of the other assumptions can be found in Table 3.1. All schemes presented in the table are publicly verifiable.

5.1 Asymptotic access and group complexities of various authenticated data structure schemes defined by algorithms {genkey, setup, update, refresh, query, verify}, for a sets collection data structure of m sets: The sum of the sizes of all the sets is M and 0 < ε < 1 is a constant. FHE stands for fully-homomorphic encryption, the security of which is based on lattice assumptions, such as the bounded distance decoding and the SplitKey distinguishing problems—see [43].
We note that the scheme based on FHE is not publicly verifiable; it does, however, provide privacy on top of integrity of computations. We show complexities for an intersection query on t = O(1) sets, outputting an intersection of δ elements. All sizes of the intersected and updated sets are Θ(n).

5.2 Comparison of the 2-intersection communication overhead (proof size) of the scheme presented by Morselli et al. [79] with our scheme. Here n1 and n2 are the sizes of the sets that are intersected and δ is the size of the intersection.

6.1 Asymptotic access and group complexities of various authenticated data structure schemes for a dynamic dictionary storing n elements, compared with the optimal authenticated dictionary MFD based on multilinear forms and derived in this chapter. Parameter 0 < ε < 1 is a constant and “M. q-DH” stands for “Multilinear q-strong Diffie-Hellman”. The various acronyms used for variables and assumptions have all been defined in Table 3.1. Note that our construction requires two assumptions, namely M. q-DH and Generic CR.

7.1 Asymptotic access and group complexities of the authenticated data structure schemes presented in this thesis, applied to the fundamental problem of verifying read/write operations on an array of n entries, and compared with the first result on dynamic authenticated data structures by Naor and Nissim [81]. We note that, since all complexities for the plain table data structure are constant, no authenticated data structure scheme presented is optimal. Moreover, based on the recent lower bound for memory checking by Dwork et al. [35], it seems unlikely that such a scheme could be derived.

List of Figures

2.1 The three-party authenticated data structures protocol. During the update phase, the source sends an update u ∈ U to the server along with the respective update information upd output by update(). During the query phase, the client sends a query q ∈ Q to the server and the server runs algorithm query() to output the proof Π(q) for the respective answer.

2.2 The two-party authenticated data structures protocol. During the query phase, the client sends a query q ∈ Q to the server and the server runs algorithm query() to output the proof Π(q) for the answer. During the update phase, the client sends to the server an update u ∈ U, which relates to a certain set of queries Qu ⊆ Q. Then the server computes the set of proofs Π(Qu). This set of proofs will be used by function z(.) of Assumption 2.1, which will output δu(Dh) and δu(auth(Dh)), which are subsequently input to algorithm update().

3.1 The accumulation tree of a set of 64 elements for ε = 1/3: every internal node has 4 = 64^ε children, there are 3 = 1/ε levels in total, and there are 64^(1−i/3) nodes at level i = 0, 1, 2, 3.

4.1 Tree T built on top of a table with 8 values x1, x2, . . . , x8. After producing an n-admissible radix-2 g(.) representation of the children digests, we multiply with either U or D, then we add the two resulting digests and we compute the hash function on them by multiplying with M.
At the leaves of the tree we show the terms that correspond to each index, as computed by Theorem 4.3 (i.e., the partial digests of the root r with reference to every value in the table). The g(.) representations of the internal nodes are indicated with dashed lines (see Definition 4.9). Note that the g(.) representations of the internal nodes are the sums of specific f(.) representations of the leaves; for example, g(d(r12)) = f(Lf(Lf(x5))) + f(Lf(Rf(x6))) + f(Rf(Lf(x7))) + f(Rf(Rf(x8))), where MU = L and MD = R.

Chapter 1

Introduction

During the last few years, cloud computing has emerged as an important new computational and storage medium [55]. In fact, remote data storage (e.g., Amazon S3) and outsourcing of computation (e.g., Google Docs) have become a major everyday phenomenon. An increasingly large number of companies and individuals have adopted cloud computing as a means of reducing operational and maintenance costs. However, the cloud is not a panacea. Quoting from an article published in 2009 at techcrunch.com [2]: “...T-Mobile and Danger, the Microsoft-owned subsidiary that makes the Sidekick, has just announced that they’ve likely lost all user data that was being stored on Microsoft’s servers due to a server failure...”. Therefore, and beyond the control of its owner, remotely stored sensitive data may be lost, modified or accessed by unauthorized entities. Additionally, third-party (i.e., cloud) computations may not be performed correctly, due to errors, opportunistic behavior or malicious attacks. All these cases imply that, while the cloud is an attractive alternative to local trusted computational resources, there is a series of security threats that needs to be addressed in order for this paradigm to be fully adopted by users. For example, integrity and privacy guarantees are critically needed: Specifically, users need to be assured that remote data and computations have not been altered and that no cloud data has leaked.

Tackling the above problems requires the design of protocols and the development of prototypes that, on the one hand, are provably secure and at the same time remain highly efficient; otherwise the main purpose of adopting cloud computing, i.e., efficiency and scalability, is defeated. In other words, the provable security added to a cloud service should not lead to major performance penalties, and the induced overhead should be negligible compared to the actual computational resources needed by the application when executing in an insecure environment. It is essential that expertise in cryptography and efficient algorithmics be combined to achieve these goals.

This thesis addresses the first aspect of cloud security, namely the verification of cloud data integrity and the correctness of cloud computations. It is an extended and formal study of authenticated data structures [105], which comprise systematic and efficient methods for cryptographically checking the integrity of dynamic structured data—stored in adversarial environments—and of queries executed on it.
Namely, given that a data structure is stored at some untrusted entity and some computation can be performed over it by this untrusted entity (e.g., output yes if x is contained in a dictionary D, or output the shortest path from node v to node u in a graph G), an authenticated data structure provides the cryptographic machinery and the algorithms for deciding—without access to the data structure itself but only to some constant-size reliable (and possibly secret) space—whether the returned answer is correct or not. First, an authenticated data structure should be secure: A computationally-bounded adversary should not be able to produce a valid proof for a false answer, under a well-accepted computational assumption. Second, it should be efficient: Its algorithms should have low complexity, ideally not adding too much overhead. Successfully combining both these goals is a challenging task, depending substantially on the underlying cryptography. Under this premise, the main novelty of this thesis lies in employing advanced cryptography for constructing highly efficient authenticated data structures, departing in this way from widely employed hash-based constructions (e.g., Merkle trees [77]), which traditionally use collision-resistant hash functions (e.g., SHA-2 [85]) as a black box and thereby inherit certain complexity lower bounds [106]. Specifically, the coupling of certain cryptographic primitives, such as accumulators [13], lattices [3] and bilinear maps [60], with suitable data structuring and algorithmic techniques, such as hash tables [29] and search trees [47], is explored and exploited, leading to efficient solutions that introduce minimal (and sometimes zero) asymptotic overhead. The small asymptotic overhead does translate into significant practical savings, yielding protocols that compare favorably in practice with existing work. The security of our constructions is based on computational assumptions widely established and accepted by the cryptography community.

1.1 Thesis motivation

The problem of efficiently verifying the integrity of structured data stored at untrusted resources has been an active research area for almost two decades, beginning with the seminal paper of Merkle [77], where the well-known—and since then widely used in practice—hash trees were introduced. Being an alternative to plain digital signatures, hash trees comprise an efficient way to provide integrity proofs for structured data (e.g., an array or a dictionary) stored at computationally-bounded adversarial sites—certain costs, compared to digital signature techniques, decrease from linear to logarithmic. The problem was later formalized by Blum et al. [15], who introduced memory checking, i.e., mechanisms for reliably reading and writing memory cells when the memory is not to be trusted. Later on, and after Naor and Nissim dynamized Merkle’s solution [81], it became clear that verification mechanisms for more complicated structures (e.g., supporting dynamic operations) and more general query types (e.g., verifying the output of an algorithm) were needed. Both these verification tasks could be achieved via memory checking techniques, since every data update or computation may be viewed as writing and reading bits from memory, respectively. However, as
usually happens when directly employing fundamental primitives (e.g., zero-knowledge [45], oblivious RAM [86]) in cryptography for solving more complicated problems, inefficiency is a major issue. This motivated the study of authenticated data structures [105], a model where untrusted parties answer queries on a dynamic data structure, providing a proof of validity of each answer to the user in an efficient way. So far in the literature, the vast majority of algorithms and techniques for authenticated data structures (e.g., [4, 6, 48, 53, 72, 74, 75, 77, 81, 88, 104, 112]) have traditionally relied on cryptographic hashing. In particular, collision resistance, a fundamental property required for data integrity, is achieved through the use of black-box generic functions, such as SHA-2 (these functions are believed to be collision-resistant in practice, but there exists no formal proof of this). However, this black-box property imposes certain complexity limitations through existing lower bounds [106]. For instance, for a dynamic dictionary of size n, Ω(log n) proof complexities are inevitable, since the internal structure of these primitives is not exploitable in any meaningful way. Aiming at more efficient solutions, this thesis studies the effect of cryptography on the efficiency of various authenticated data structures, by exploring certain algebraic properties of advanced cryptographic primitives that traditional black-box generic hash functions lack. Combined with suitable data structures and algorithms, these properties comprise a deciding factor in deriving asymptotically better (even optimal) authenticated data structures, resulting in more scalable protocols and applications.

1.2 Thesis outline

The thesis outline is as follows. We begin with Chapter 2, where we present basic definitions that we use extensively in the thesis, such as the definition of a data structure scheme (Definition 2.2) and of an authenticated data structure scheme (Definition 2.3), along with its correctness and security definitions (Definitions 2.4 and 2.5, respectively). Moreover, in Chapter 2 we show how, given any authenticated data structure scheme as a black box, we can derive a three-party authenticated data structures protocol (Theorem 2.1) or a two-party authenticated data structures protocol (Theorem 2.2), both traditionally used to describe authenticated data structure solutions in the literature (e.g., see [75] for a three-party protocol and [92] for a two-party protocol). This black-box approach not only eases the presentation of our results in subsequent chapters of the thesis, but also helps us avoid repeating protocol-related notions in each chapter. Each of the remaining chapters of the thesis (Chapters 3 through 6) comprises a study of the application of a certain cryptographic primitive toward solving a specific authenticated data structure problem efficiently.

In Chapter 3, we design authenticated data structures for set membership queries on hash tables, using cryptographic primitives called accumulators [23, 83] and applying them in a novel hierarchical way over the stored data. We provide the first construction for authenticating a hash table with constant query cost and sublinear update cost, strictly improving upon previous methods and answering an open problem posed in [81].
The algebraic property we take advantage of in this construction is the commutativity of the RSA exponentiation function, which enables fast updates of cryptographic digests whenever partial information included in the digest changes, offering incrementality at no cost and at the same time allowing for succinct, constant-size proofs—notice that not all functions achieving incrementality at no cost [11] offer efficient proof complexity. The main results of this chapter are given in Theorems 3.2 and 3.4. A preliminary version of this chapter appears in [90].

In Chapter 4, we first design a new authenticated data structure for a dynamic table with n entries. We present the first dynamic authenticated table that is update-optimal and is based on lattices, a mathematical object that has found many applications in cryptography during the last decade. In particular, the update complexity of the authenticated table we design is O(1), improving the “a priori” O(log n) update bounds of previous constructions, such as the Merkle tree. To achieve this result, we establish and exploit a property that we call repeated linearity of lattice-based hash functions. Second, we observe that the repeated linearity of the used lattice-based cryptographic primitive lends itself to a natural notion of parallelism: As such, we describe parallel versions of our authenticated data structure algorithms, yielding the first parallel online memory checker [15] with O(1) query complexity using O(log n) checkers in the CREW model and without using a secret key setting, i.e., only small reliable—but not secret—memory is needed. Theorem 4.4 describes the basic (parallel) authenticated data structure and Theorem 4.6 gives the application of the presented lattice-based authenticated table to parallel online memory checking. A preliminary version of this chapter appears in [93].

In Chapter 5, we study the problem of verifying outsourced set operations over a dynamic collection of sets. Based on the convenient use of the bilinear-map primitive, which has proved to be a very useful tool in cryptography since its first appearance in the literature within a cryptographic context [60], we are able to construct the first operation-sensitive scheme for verifying set operations, such as union and intersection (see Theorem 5.1). Operation-sensitivity is a strong property that enables us to achieve verification costs (proof and verification complexity) proportional to the size of the answer, and not to the time taken to produce the answer—a property that could not be achieved otherwise (e.g., with traditional hash-based techniques) and is obviously highly desirable. In this chapter, we also address applications of our techniques to the verification of keyword searches on outsourced document collections (e.g., inverted-index queries) and of queries in outsourced databases (e.g., equi-join queries). Since set intersection is heavily used in these applications, we obtain new authenticated data structures that compare favorably to existing approaches. This chapter closes an open problem, that of operation-sensitive verification of set operations, posed in [33]. A preliminary version of this chapter appears in [94].

In Chapter 6, we observe that no optimal authenticated data structure (i.e., an authenticated data structure that adds no extra asymptotic overhead to the respective plain data structure) is known to date.¹
However, by assuming the existence of multilinear form generators [91], a cryptographic primitive proposed by Boneh and Silverberg in 2003 [19], the construction of which, however, remains an open problem, we introduce the first optimal authenticated dictionary data structure. The presented authenticated dictionary, described in Theorem 6.1, enjoys proofs of constant size, i.e., asymptotically equal to the size of the answer. We close this chapter with Theorem 6.2, showing a reduction connecting the existence of optimal authenticated dictionaries with the existence of multilinear form generators. A preliminary version of this chapter appears in [91].

We conclude in Chapter 7 by commenting on some open problems and future work.

¹ Note that the lattice-based authenticated structure of Chapter 4 is only update-optimal, whereas the authenticated data structure of Chapter 5 achieves optimal verification costs only.

Chapter 2

Preliminaries and related work

This chapter presents preliminary definitions and results that we use in the rest of the thesis, as well as an extended study of related work on authenticated data structures.

2.1 Definitions

In the remainder of the thesis, we denote with k ∈ N the security parameter. We begin with the definition of a negligible function, used extensively to express our security arguments:

Definition 2.1 (Negligible function) Let f : N → R. We say that f(k) is neg(k) iff for any nonzero polynomial p(k) there exists N such that for all k > N it is f(k) < 1/p(k).

Typical examples of functions that are neg(k) are the functions 1/2^k and poly(k)/2^k.

2.1.1 Data structures and authenticated data structures

To formally describe our authenticated data structure solutions in Chapters 3, 4, 5 and 6, we give definitions for a data structure scheme and the respective authenticated data structure scheme. Similar definitions have appeared in the work of Tamassia and Triandopoulos [107]. To avoid unnecessary complications, we do not define an abstract data type and instead define a data structure scheme directly. We use the notation

{O1, O2, . . . , Oo} ← alg(I1, I2, . . . , Ii)

to denote that algorithm alg has i inputs I1, I2, . . . , Ii and o outputs O1, O2, . . . , Oo. Whenever an input I or an output O appears as (I)∗ or (O)∗ (e.g., algorithms update() and verify() in Definition 2.3), this means that I or O is not required as an input or output of the algorithm but might appear depending on the implemented scheme.

Definition 2.2 (Data structure scheme) Let D be any data structure supporting a set of updates U and a set of queries Q. Denote with Dh the state of the data structure D at time h, where h ≥ 0 is an integer and D0 is the initial state of the data structure D. A data structure scheme D(U, Q) is a collection of the following three polynomial-time algorithms {update, query, check}:

1. Dh+1 ← update(u, Dh): On input an update u ∈ U and the data structure Dh, this algorithm outputs the updated data structure Dh+1;

2. α(q) ← query(q, Dh): On input a query q ∈ Q and the data structure Dh, this algorithm outputs the answer α(q) to query q;

3. {accept, reject} ← check(q, α, Dh): On input a query q ∈ Q, an answer α and the data structure Dh, this algorithm outputs accept if α is a correct answer for query q on data structure Dh; else it outputs reject.

For example, consider a data structure scheme for the dictionary data structure (see Chapter 6), implemented with a red-black tree [29]. The query() algorithm performs a binary search, while the update() algorithm performs the rotations needed for re-balancing the structure.
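The following is a minimal sketch (in Python, with hypothetical names) of such a data structure scheme for a dictionary: updates insert or delete an element, queries test membership, and check() simply re-executes the query on a trusted copy. For brevity, Python's built-in set stands in for the red-black tree of the example; only the asymptotics, not the interface, would change.

```python
# A minimal sketch of Definition 2.2 for a dictionary data structure.
# Illustrative only: the interface {update, query, check} is the point.

def update(u, Dh):
    """u = ('insert', x) or ('delete', x); returns the new state D_{h+1}."""
    op, x = u
    Dh1 = set(Dh)
    if op == 'insert':
        Dh1.add(x)
    else:
        Dh1.discard(x)
    return Dh1

def query(q, Dh):
    """q is an element; the answer alpha(q) is a membership bit."""
    return q in Dh

def check(q, alpha, Dh):
    """Accept iff alpha is the correct answer for query q on Dh."""
    return 'accept' if alpha == query(q, Dh) else 'reject'

# Example: D0 = {2, 5}; insert 7, then query for 7.
D0 = {2, 5}
D1 = update(('insert', 7), D0)
assert check(7, query(7, D1), D1) == 'accept'
```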
We note here that in Definition 2.2, the script letter D in the notation D(U, Q) indicates that D(U, Q) is a data structure scheme for the data structure D (non-script letter), which supports the set of updates U and the set of queries Q.

Definition 2.3 (Authenticated data structure scheme) Let D(U, Q) be a data structure scheme defined by the collection of algorithms {update, query, check}. An authenticated data structure scheme A for the data structure scheme D(U, Q) is a collection of the following six polynomial-time algorithms {genkey, setup, update, refresh, query, verify}:

1. {sk, pk} ← genkey(1^k): This algorithm outputs the secret key sk and the public key pk, given the security parameter k;

2. {auth(D0), d0} ← setup(D0, sk, pk): This algorithm computes the authenticated data structure auth(D0) and the respective digest d0 of auth(D0), given a data structure D0, the secret key sk and the public key pk;¹

3. {Dh+1, (auth(Dh+1))∗, dh+1, upd} ← update(u, Dh, (auth(Dh))∗, dh, sk, pk): This algorithm takes as input an update u ∈ U, a data structure Dh, possibly an authenticated data structure auth(Dh), the digest dh of auth(Dh) and both the secret and the public keys sk and pk. It outputs the data structure Dh+1 ← update(u, Dh), possibly the authenticated data structure auth(Dh+1), the digest dh+1 of auth(Dh+1) and some relative information upd;²

4. {Dh+1, auth(Dh+1), dh+1} ← refresh(u, Dh, auth(Dh), dh, upd, pk): This algorithm takes as input an update u ∈ U, a data structure Dh, an authenticated data structure auth(Dh), the digest dh of auth(Dh), the information upd computed by algorithm update() and only the public key pk. It outputs the data structure Dh+1 ← update(u, Dh), the authenticated data structure auth(Dh+1) and the digest dh+1 of auth(Dh+1);³

5. {Π(q), α(q)} ← query(q, Dh, auth(Dh), pk): On input a query q ∈ Q, a data structure Dh, an authenticated data structure auth(Dh) and the public key pk, this algorithm returns the answer α(q) ← query(q, Dh) to the query q, along with a respective proof Π(q);

6. {accept, reject} ← verify(q, α, Π, dh, (sk)∗, pk): This algorithm takes as input a query q ∈ Q, an answer α, a proof Π, the digest dh of auth(Dh), possibly the secret key sk and the public key pk, and outputs either accept or reject.

¹ The digest d0 of the authenticated data structure auth(D0) is a collision-resistant representation of D0, e.g., the root hash of a Merkle tree [77]. It is usually of constant size.
² Note that this algorithm is only required to output the new digest dh+1 and the new data structure Dh+1. Outputting the new authenticated data structure auth(Dh+1) is not a requirement of the algorithm—this will be important in improving the complexity of this algorithm in some schemes. Also, the secret key is required for execution.
³ Note here that the secret key is not used for execution. However, for correct inputs, the output digest dh+1 is the same as in algorithm update().
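To make the six algorithms concrete, here is a minimal sketch (in Python, with illustrative names, not the thesis's constructions) instantiating Definition 2.3 with a Merkle hash tree [77] over a fixed-size table whose length is a power of two: auth(D) is the full hash tree and the digest d is the root hash. Since no trapdoor is involved, genkey() outputs an empty secret key and update() simply runs refresh(); the schemes of Chapters 3 through 6 differ exactly in how these two algorithms are split and in their complexities.

```python
# A minimal Merkle-tree instance of Definition 2.3 (illustrative sketch).

import hashlib

def H(*parts):
    """Generic collision-resistant hash (SHA-256), used as a black box."""
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def genkey(k):
    """Merkle trees need no trapdoor: sk is empty, pk names the hash."""
    return None, 'sha256'

def setup(D0, sk, pk):
    """auth(D0) is the hash tree stored level by level; d0 is the root."""
    level = [H(x) for x in D0]
    auth = [level]
    while len(level) > 1:
        level = [H(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        auth.append(level)
    return auth, auth[-1][0]

def refresh(u, Dh, authDh, dh, upd, pk):
    """u = (i, x): write x at index i, rehashing the O(log n) path cells."""
    i, x = u
    Dh1 = list(Dh)
    Dh1[i] = x
    authDh[0][i] = H(x)
    for lvl in range(1, len(authDh)):
        i //= 2
        authDh[lvl][i] = H(authDh[lvl - 1][2 * i], authDh[lvl - 1][2 * i + 1])
    return Dh1, authDh, authDh[-1][0]

def update(u, Dh, authDh, dh, sk, pk):
    """No secret key is exploitable here, so update() just runs refresh()
    and upd is empty; in later chapters the two algorithms truly differ."""
    Dh1, authDh1, dh1 = refresh(u, Dh, authDh, dh, None, pk)
    return Dh1, authDh1, dh1, None

def query(q, Dh, authDh, pk):
    """q is an index; the proof is the sibling path from leaf q to the root."""
    i, proof = q, []
    for lvl in range(len(authDh) - 1):
        proof.append(authDh[lvl][i ^ 1])
        i //= 2
    return proof, Dh[q]

def verify(q, alpha, proof, dh, sk, pk):
    """Recompute the root from alpha and the sibling path; compare with dh."""
    h, i = H(alpha), q
    for sib in proof:
        h = H(h, sib) if i % 2 == 0 else H(sib, h)
        i //= 2
    return 'accept' if h == dh else 'reject'
```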
There are two properties that an authenticated data structure scheme should satisfy, namely correctness and security (the intuition follows from definitions of signature schemes, e.g., see Camenisch and Lysyanskaya [24]): Roughly speaking, the correctness property requires that, for every query q ∈ Q, if a proof Π(q) is computed by algorithm query() (i.e., faithfully), then verify(), on input Π(q) and a correct answer α(q), should always accept, as long as the digest d is updated through algorithm refresh() (i.e., it is the correct one). The security property requires that a computationally-bounded adversary, i.e., an adversary that has access to polynomially-bounded resources (time and space) in the security parameter k, should not be able (except with negligible probability) to produce verifying proofs Π for incorrect answers α corresponding to queries q ∈ Q on an authenticated data structure auth(D) whose digest is updated through adversarially chosen oracle calls to algorithm update()—this is why we require that update() have access to the secret key.

Definition 2.4 (Correctness of authenticated data structure scheme) Let D(U, Q) be a data structure scheme defined by the collection of algorithms {update, query, check} and let A be an authenticated data structure scheme for D(U, Q) defined by the collection of algorithms {genkey, setup, update, refresh, query, verify}. We say that the authenticated data structure scheme A is correct if, for all k ∈ N, for all {sk, pk} output by algorithm genkey(), for all Dh, auth(Dh), dh output by one invocation of algorithm setup() followed by polynomially-many invocations of algorithm refresh(), where h ≥ 0, for all queries q ∈ Q and for all Π(q), α(q) output by algorithm query(q, Dh, auth(Dh), pk), with all but negligible probability, whenever algorithm check(q, α(q), Dh) accepts, so does algorithm verify(q, α(q), Π(q), dh, (sk)∗, pk).

Definition 2.5 (Security of authenticated data structure scheme) Let D(U, Q) be a data structure scheme defined by the collection of algorithms {update, query, check}, let A be an authenticated data structure scheme for D(U, Q) defined by the collection of algorithms {genkey, setup, update, refresh, query, verify}, let k be the security parameter and let {sk, pk} ← genkey(1^k). Denote with Adv a polynomially-bounded adversary that is only given the public key pk. The adversary has unlimited access to all algorithms of A, except for algorithms setup(), update() and possibly algorithm verify(), to which he has only oracle access. The adversary picks an initial state of the data structure D0 and computes auth(D0), d0 through oracle access to algorithm setup(). Then, for i = 0, . . . , h = poly(k), Adv issues an update ui ∈ U on the data structure Di and outputs Di+1, possibly the authenticated data structure auth(Di+1), and di+1 through oracle access to algorithm update(). Finally, the adversary enters the attack stage, where he picks an index 0 ≤ t ≤ h + 1, a query q ∈ Q, an answer α and a proof Π. We say that the authenticated data structure scheme A is secure if, for all k ∈ N, for all {sk, pk} output by algorithm genkey(), and for all polynomially-bounded adversaries Adv, it is

Pr[ {q, Π, α, t} ← Adv(1^k, pk);  accept ← verify(q, α, Π, dt, (sk)∗, pk);  reject ← check(q, α, Dt) ] ≤ ν(k),

where ν(k) is neg(k).
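Continuing the Merkle-tree sketch above (all names come from that illustrative code, not from the thesis), the following snippet exercises the two properties: a faithfully computed proof verifies against the digest kept up to date via refresh() (correctness), while a tampered answer makes verify() reject—the event whose probability Definition 2.5 bounds, resting here on the collision resistance of SHA-256.

```python
# Correctness: honest proofs verify against the refreshed digest.
# Security (informally): a false answer with the same proof is rejected.

sk, pk = genkey(128)
D = [b'a', b'b', b'c', b'd']
auth, d = setup(D, sk, pk)
D, auth, d = refresh((2, b'z'), D, auth, d, None, pk)   # write 'z' at index 2

proof, alpha = query(2, D, auth, pk)
assert verify(2, alpha, proof, d, sk, pk) == 'accept'   # correctness
assert verify(2, b'c', proof, d, sk, pk) == 'reject'    # stale/false answer
```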
2.1.2 Complexity model

To explicitly measure the complexity of various algorithms with respect to the number of primitive cryptographic operations, without considering the dependency on the security parameter, we adopt the complexity model used in memory checking [15, 35], which has only been used implicitly in the literature on authenticated data structures:

Definition 2.6 (Access complexity) The access complexity of an algorithm is defined as the number of memory accesses this algorithm performs on the (authenticated) data structure stored in an indexed memory of n cells, in order for the algorithm to complete its execution.

Here, “access complexity” is used instead of the “query complexity” of memory checking [15, 35] to avoid ambiguity when referring to algorithm query() of the authenticated data structure scheme. We also require that each memory cell can store up to O(poly(log n)) bits, a word size used in Blum’s original memory checking work [15] as well as in subsequent work [35]. For example, a Merkle tree [77] has O(log n) update access complexity, since the update algorithm needs to read and write O(log n) memory cells of the authenticated data structure, each cell storing exactly one hash value.

Definition 2.7 (Group complexity) The group complexity of a data collection (e.g., proof group complexity or authenticated data structure group complexity) is defined as the number of elementary data objects (e.g., hash values or elements in Zp) contained in that collection.

Whenever it is clear from the context, we omit the terms “access” and “group”. Concerning the above definitions, we note first that, since the size of the problem n, in a cryptographic setting, has to be polynomially-bounded by the security parameter k, i.e., n = o(2^k), the O(.) notation appearing in the following chapters expresses asymptotic results for values of n → 2^k, and not for values of n → ∞, as is mathematically implied by the O(.) notation [29]. Second, we observe that access complexity captures the notion of “time” and group complexity captures the notion of “space”. However, access and group complexities are in principle smaller than time and space complexities. This is because time and space complexities count numbers of bits and are always functions of the security parameter k. Being in a cryptographic setting, the security parameter is always Ω(log n), and therefore time and space complexities are always Ω(log n)—whereas access and group complexities can be O(1).

2.1.3 Optimality and public verifiability

We now give the definition of an optimal authenticated data structure scheme. Intuitively, an optimal authenticated data structure scheme A for a data structure scheme D(U, Q) should not add any asymptotic overhead to the complexity of the algorithms of the data structure scheme D(U, Q). In the following definition, algorithms of A and algorithms of D(U, Q) that share a name (e.g., update()) are distinguished by context: the plain data structure algorithms always appear on the right-hand sides of the bounds.

Definition 2.8 (Optimal authenticated data structure scheme) Suppose D(U, Q) is a data structure scheme defined by the collection of algorithms {update, query, check} and A is a correct and secure authenticated data structure scheme for D(U, Q), defined by the collection of algorithms {genkey, setup, update, refresh, query, verify}. The authenticated data structure scheme A is optimal if all of the following hold:

1. (Space optimality) For all integers h ≥ 0, the group complexity of the authenticated data structure auth(Dh) is no more than the group complexity of the data structure Dh, i.e., |auth(Dh)| = O(|Dh|);
2. (Update optimality) For all updates u ∈ U, the sum of the access complexity of the scheme's update() plus the group complexity of upd (output by update()) plus the access complexity of refresh() is no more than the access complexity of the plain algorithm update() of D(U, Q), i.e., |update()| + |upd| + |refresh()| = O(|update()|);

3. (Query optimality) For all queries q ∈ Q, the access complexity of the scheme's query() is no more than the access complexity of the plain algorithm query() of D(U, Q), i.e., |query()| = O(|query()|);

4. (Proof and verification optimality) Let α(q) and Π(q) be the answer and the proof output by query() on input a query q ∈ Q. Then, for all queries q ∈ Q:

• The group complexity of the proof Π(q) is no more than the group complexity of the query q plus the group complexity of the answer α(q), i.e., |Π(q)| = O(|q| + |α(q)|);

• The access complexity of verify() is no more than the group complexity of the query q plus the group complexity of the answer α(q), i.e., |verify()| = O(|q| + |α(q)|).

We note here that constructing an optimal authenticated data structure scheme appears to be a difficult task. In Chapter 5 we construct an authenticated data structure scheme that is almost optimal (update costs are increased by a logarithmic factor), whereas in Chapter 6 we construct an optimal authenticated data structure scheme for a dictionary data structure, which is, however, based on a cryptographic primitive that is not known to exist yet.

Definition 2.9 (Publicly-verifiable authenticated data structure scheme) Suppose D(U, Q) is a data structure scheme and A is an authenticated data structure scheme for D(U, Q) defined by the collection of algorithms {genkey, setup, update, refresh, query, verify}. Let also k be the security parameter and {sk, pk} ← genkey(1^k). The authenticated data structure scheme A is publicly-verifiable if algorithm verify() does not require the secret key sk as an input.

2.2 Protocols and applications

Let A be a (publicly-verifiable) authenticated data structure scheme for a data structure scheme D(U, Q), and let SIG be a correct and secure public-key signature scheme (e.g., see Camenisch and Lysyanskaya [24]); we write SIG rather than a single letter to avoid confusion with the set of queries Q. We now describe how an authenticated data structure scheme A may be used in various protocols, such as a three-party authenticated data structures protocol (e.g., see Tamassia [105] and Martel et al. [75]) or a two-party authenticated data structures protocol (e.g., see Papamanthou and Tamassia [92]). For the protocol descriptions we use the following notation:

• [R]t : program. Program program is executed by party R at time t;

• [R → G]t : data. Data data is sent from party R to party G at time t.

2.2.1 Three-party authenticated data structures protocol

A typical setting where an authenticated data structure scheme can be employed involves three participating entities, usually referred to as the three-party model [105] (see Figure 2.1): A trusted party, called the source, owns a data structure that is replicated, along with some cryptographic information, to one or more untrusted parties, called servers. Clients issue data structure queries to the servers and wish to publicly verify the answers received from the servers, based only on the trust they have in the source. This trust is conveyed through a time-stamped signature on the digest of the data structure (a collision-resistant succinct representation of the data structure, e.g., the root hash of a Merkle tree—see Definition 2.3), which is made available by the source.
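As a minimal illustration of this trust mechanism (illustrative code; HMAC stands in here for the public-key signature scheme SIG of Protocol 2.1 below, with which verification would need only the source's public key rather than a shared secret), the source periodically authenticates d||t, and a client accepts a digest only if the tag verifies and the timestamp is fresh:

```python
# Sketch of the timestamped-digest mechanism: the source authenticates
# d||t; the client checks authenticity and freshness (T - t < dt).
# The HMAC key is a stand-in for SIG's signing/verification key pair.

import hmac, hashlib, time

DT = 60.0                                   # signing period, in seconds

def sign_digest(key, d, t):
    msg = d + str(t).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

def client_accepts(key, d, t, sgn, T):
    fresh = (T - t) < DT                    # cf. step 4(d)i of Protocol 2.1
    authentic = hmac.compare_digest(sgn, sign_digest(key, d, t))  # cf. 4(d)ii
    return fresh and authentic

key = b'source-signing-key'
d, t = b'\x12' * 32, time.time()            # current digest and timestamp
sgn = sign_digest(key, d, t)
assert client_accepts(key, d, t, sgn, time.time())
```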
During an update, the source needs just to compute the new digest, whereas the server needs to update the authenticated data structure as a whole.

Figure 2.1: The three-party authenticated data structures protocol. During the update phase, the source sends an update u ∈ U to the server along with the respective update information upd output by update(). During the query phase, the client sends a query q ∈ Q to the server and the server runs algorithm query() to output the proof Π(q) for the respective answer.

We describe this protocol formally below:

Protocol 2.1 A three-party authenticated data structures protocol involves three participating entities: a trusted source, T, that has access to both the public and the secret keys; an untrusted server, S, that has access only to the public keys; and a client, C, that has access only to the public keys. Let k be the security parameter, let A = {genkey, setup, update, refresh, query, verify} be a correct and secure publicly-verifiable authenticated data structure scheme for a data structure scheme D(U, Q), and let SIG = {GENKEY, SIGN, VERIFY} be a correct and secure public-key signature scheme. The protocol has four phases:

1. Setup phase: Source T owns the data structure D (described by data structure scheme D(U, Q)) at time t0. The setup phase consists of the following steps:

(a) [T]t0 : {SK, PK} ← GENKEY(1^k); (signature scheme key generation)
(b) [T]t0 : {sk, pk} ← genkey(1^k); (authenticated data structure scheme key generation)
(c) [T]t0 : {auth(D), d} ← setup(D, sk, pk); (computing the authenticated data structure)
(d) [T]t0 : Store D, auth(D), d; (source stores necessary information)
(e) [T → S]t0 : D, auth(D), d; (outsourcing)
(f) [S]t0 : Store D, auth(D), d. (server stores necessary information)

2. Update phase: Let u ∈ U be an update issued by source T on data structure D at time tτ, and let D′ be the updated data structure. The update phase consists of the following steps:

(a) [T]tτ : {D′, (auth(D′))∗, d′, upd} ← update(u, D, (auth(D))∗, d, sk, pk); set D = D′, (auth(D))∗ = (auth(D′))∗, d = d′; (new digest generation)
(b) [T → S]tτ : u, upd; (sending relative update information)
ACCEPT ← VERIFY(sgn, d||t, PK); ( d is the most recent digest); iii. accept ← verify(q, α(q), Π(q), d, pk). else output REJECT. We can now state the main theorem of this section: Theorem 2.1 Let A = {genkey, setup, update, refresh, query, verify} be a correct and secure publicly-verifiable authenticated data structure scheme for a data structure scheme D(U, Q) 19 and k be the security parameter. Let g(n), s(n), u(n), r(n), q(n) and v(n) be the access complexities of algorithms genkey, setup, update, refresh, query, verify respectively. Let also i(n), p(n) and α(n) be the group complexity of the information upd (output by algorithm update()), of the proof Π(q) (output by algorithm query()) and of the answer α(q) (output by algorithm query()) respectively and f (n) be the group complexity of the authenticated data structure auth(D) (output by algorithm setup()). Then there exists a three-party authenticated data structures protocol involving a trusted source T , an untrusted server S and a client C for verifying queries q ∈ Q and at the same time supporting updates u ∈ U such that: 1. The setup at the source T has O(s(n)) access complexity; 2. The update at the source T has O(u(n)) access complexity; 3. The space needed at the source T has O(f (n)) group complexity; 4. The communication between the source T and the server S has O(i(n)) group complexity; 5. The update at the server S has O(r(n)) access complexity; 6. The query at the server S has O(q(n)) access complexity; 7. The space needed at the server S has O(f (n)) group complexity; 8. The communication between the server S and the client C has O(p(n) + α(n)) group complexity; 9. The verification at the client C has O(v(n)) access complexity; 10. For a query q ∈ Q sent by the client C to the server S at any time (even after updates), let α be an answer and let π be a proof returned by the server S. With probability Ω(1 − neg(k)), the client C accepts the answer α if and only if α is correct. 20 Proof: (Complexity) The protocol in question is Protocol 2.1. In the setup phase the access complexity of Item 1a and Item 1b is O(1) (they are not accessing the data structure at all). The access complexity of Item 1c is O(s(n)) since it involves one call to algorithm setup(). Therefore the total setup access complexity at the source is O(s(n)). The update at the source involves one call to algorithm update() (see Item 2a), therefore it has O(u(n)) access complexity. Both the source and the server store the authenticated data structure auth(D), therefore their space has O(f (n)) group complexity. The communication between the source and the server has O(i(n)) group complexity, since information u and upd are sent (see Item 2b) and also periodically (i.e., every ∆t time units) a signature on the timestamped digest is sent (see Item 3c), which has O(1) group complexity. The update at the server has O(r(n)) access complexity since it involves one call to algorithm refresh() (see Item 2c). Computing a proof at the server (i.e., query at the server) has O(q(n)) access complexity since it involves one call to algorithm query() (see Item 4b). The communication between the server and the client has O(p(n) + α(n)) group complexity since it involves sending off the proof and the answer, a digest, a signature and a timestamp (see Item 4c). 
Finally, the verification at the client has O(v(n)) access complexity since it involves checking Relations 4(d)i (O(1) complexity), 4(d)ii (O(1) complexity) and 4(d)iii (O(v(n)) complexity, since it involves one call to algorithm verify()).

(Security) Let now T be the time of a query q ∈ Q. The client sends query q to the server (Item 4a). The server replies with an answer α, a proof π, a digest d, a timestamp t such that T − t < ∆t, and a signature sgn (Item 4c), computed by the trusted source (Item 3a). If server S follows the protocol, it will output a correct answer α and a correct digest d (output by refresh()). Since a correct signature scheme and a correct authenticated data structure scheme are used, it follows that the client will accept with all but negligible probability (see Definition 2.4). Suppose now that the server does not follow the protocol and that the client accepts α, while α is an incorrect answer. In this case, the following three events have to be true:

• E_1: ACCEPT ← VERIFY(sgn, d||t, PK). This event can be partitioned into the following events:

1. E_{1,0}: Digest d is not the correct digest, i.e., not the one signed by the source at time t;

2. E_{1,1}: Digest d is the correct digest, i.e., the one signed by the source at time t;

• E_2: accept ← verify(q, α, π, d, pk);

• F: α is an incorrect answer to query q.

Therefore the probability in question is the probability Pr[E_1 ∩ E_2 ∩ F]. Splitting E_1 into E_{1,0} and E_{1,1} and upper-bounding each term, this is bounded by Pr[E_2 | E_{1,1} ∩ F] + Pr[E_{1,0}] = P_1 + P_2. However, since a secure authenticated data structure scheme A is used, P_1 is neg(k). Finally, since a secure (i.e., unforgeable) signature scheme is used, P_2 is also neg(k). This concludes the security proof for the three-party protocol. □

2.2.2 Two-party authenticated data structures protocol

We now continue with the description of the two-party authenticated data structures protocol. This protocol involves a trusted client and an untrusted server (see Figure 2.2). The client can store only a constant amount of data and usually cannot perform expensive computations (e.g., it may be a mobile device such as an iPhone). This model is close to the model of outsourced verifiable computation [5, 28, 41], which has recently appeared in the literature. The main differences with the three-party protocol are the following:

1. The client performs the updates, sends the queries to the server and verifies the answers;

2. The client stores some state of constant size only and cannot store the authenticated data structure;

3. The authenticated data structure scheme used need not be publicly-verifiable.

Figure 2.2: The two-party authenticated data structures protocol. During the query phase, the client sends a query q ∈ Q to the server and the server runs algorithm query() to output the proof Π(q) for the answer. During the update phase, the client sends to the server an update u ∈ U, which relates to a certain set of queries Q_u ⊆ Q. Then the server computes the set of proofs Π(Q_u). This set of proofs will be used by function z(.) of Assumption 2.1, which will output δ_u(D_h) and δ_u(auth(D_h)), which are subsequently input to algorithm update().

Before we continue with the description of the protocol, we give a necessary definition, inspired by the definition of t-heavy locations in the work on memory checking lower bounds of Dwork et al. [35].
This definition will allow us to formally characterize the parts of the authenticated data structure that are accessed during an update. This characterization will be important for formalizing the two-party protocol:

Definition 2.10 (Heavy locations of an update) Let A be an authenticated data structure scheme for a data structure scheme D(U, Q) defined by the collection of algorithms {genkey, setup, update, refresh, query, verify}, k be the security parameter and {sk, pk} ← genkey(1^k). Let u ∈ U be an update that updates the data structure from D to D′ and the authenticated data structure from auth(D) to auth(D′). We denote with δ_u(D) ⊆ D and δ_u(auth(D)) ⊆ auth(D) the sets of memory locations of D and auth(D) respectively that are accessed by algorithm update() during update u. Analogously, we denote with δ_u(D′) ⊆ D′ and δ_u(auth(D′)) ⊆ auth(D′) the sets of memory locations of D′ and auth(D′) respectively that are altered by algorithm update() during update u. Namely, we have the following equivalence:

{D′, auth(D′), d′, upd} ← update(u, D, auth(D), d, sk, pk) ⇔ {δ_u(D′), δ_u(auth(D′)), d′, upd} ← update(u, δ_u(D), δ_u(auth(D)), d, sk, pk) .

The above definition implies that, in order for update() to execute, it needs to have access only to the parts of the (authenticated) data structure required for execution, and not to the whole (authenticated) data structure. This is essential for the two-party model, since the client cannot store all the data locally.

An assumption for formalizing the two-party protocol

Let A = {genkey, setup, update, refresh, query, verify} be an authenticated data structure scheme for a data structure scheme D(U, Q). Let Q′ ⊆ Q be a set of queries and Π(Q′) and α(Q′) be the sets of proofs and answers respectively output by query(q′, D_h, auth(D_h), pk) for all queries q′ ∈ Q′. We make the following assumption in order to achieve a generalized application of an authenticated data structure scheme in the two-party protocol: For every update u ∈ U, there exists a set of data structure queries Q_u ⊆ Q such that the set of memory locations of D_h and the set of memory locations of auth(D_h) accessed by algorithm update() during update u (i.e., δ_u(D_h) and δ_u(auth(D_h))) are a function of Q_u, α(Q_u), Π(Q_u). We state this formally now:

Assumption 2.1 (Update query) Let A = {genkey, setup, update, refresh, query, verify} be an authenticated data structure scheme for a data structure scheme D(U, Q). For every update u ∈ U on data structure D_h, there exists a set of queries Q_u ⊆ Q such that if {Π(Q_u), α(Q_u)} ← query(Q_u, D_h, auth(D_h), pk), then it is

{δ_u(auth(D_h)), δ_u(D_h)} = z(Q_u, α(Q_u), Π(Q_u)) ,

for some well-defined function z(.), computable with complexity O(|Q_u| v(n)), where v(n) is the access complexity of verify().

The following example gives some intuition about Assumption 2.1.

Example 2.1 Let us consider the case of a Merkle tree [77] that is used to verify the contents of an n-index array A. Let also u be the update "set A[i] = x" and let A[i] = y before the update u takes place. The respective set of queries Q_u consists of one query q_u, namely the query "return the contents of cell i". Let Π(q_u) be a verifying proof (a logarithmic-sized chain of hash values) and let α(q_u) be the correct answer, i.e., α(q_u) = y. Note that δ_u(D_h) = A[i] = y and δ_u(auth(D_h)) is the path of hash values that is computed during the verification. Namely, function z(.) in this case is the Merkle tree verification algorithm. Moreover, z(.) has O(log n) complexity, equal to the complexity of the verification algorithm.
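To make the example concrete, here is a minimal Python sketch of this instantiation of z(.): it re-runs the Merkle verification on the proof Π(q_u) and, if the recomputed root matches the digest, returns the leaf (δ_u(D_h)) together with the chain of hashes recomputed along the way (δ_u(auth(D_h))). The function name merkle_z and the proof encoding (a leaf-to-root list of sibling hashes) are illustrative choices, not notation from the thesis.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_z(i, value, proof, digest):
    """Verify that A[i] = value against `digest`; on success return
    (delta_u(D), delta_u(auth(D))), otherwise None."""
    node, touched = _h(value), []
    idx = i
    for sibling in proof:              # one sibling hash per tree level
        touched.append(node)
        # the parity of idx decides whether node is a left or right child
        node = _h(node + sibling) if idx % 2 == 0 else _h(sibling + node)
        idx //= 2
    touched.append(node)               # the recomputed root
    if node != digest:
        return None                    # proof rejected
    return value, touched
```

Both the verification and the extraction of the touched locations cost O(log n), matching the complexity claim above.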
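The update phase of Protocol 2.2 can be summarized from the client's side as below. This is a schematic sketch in which the server and the scheme's algorithms are passed in as callables; server, verify, z and update are placeholders and error handling is elided.

```python
def client_update(server, u, Q_u, d, sk, pk, verify, z, update):
    """Sketch of Items 2a-2e of Protocol 2.2, run by the client."""
    proofs, answers = server.send_update(u)        # Items 2a-2c
    if not verify(Q_u, answers, proofs, d, sk, pk):
        raise ValueError("REJECT")                 # Item 2(d)i
    delta_auth, delta_D = z(Q_u, answers, proofs)  # Item 2(d)ii
    # Item 2(d)iii: update() runs on the heavy locations only, so the
    # client never materializes the full (authenticated) data structure.
    _, _, d_new, upd = update(u, delta_D, delta_auth, d, sk, pk)
    if upd:                                        # Items 2(d)iv and 2e
        server.send_upd(upd)                       # the one extra round
    # the server now runs refresh() (Item 2f) to update its own copy
    return d_new                                   # new constant-size state
```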
Theorem 2.2 Let A = {genkey, setup, update, refresh, query, verify} be a correct and secure authenticated data structure scheme (not necessarily publicly-verifiable) for a data structure scheme D(U, Q) and k be the security parameter. Suppose Assumption 2.1 holds for A, such that for each update u ∈ U there exists a respective set of queries Q_u, as defined in Assumption 2.1. Let g(n), s(n), u(n), r(n), q(n) and v(n) be the access complexities of algorithms genkey, setup, update, refresh, query, verify respectively. Let also i(n), p(n) and α(n) be the group complexities of the information upd (output by algorithm update()), of the proof Π(q) (output by algorithm query()) and of the answer α(q) (output by algorithm query()) respectively, and let f(n) be the group complexity of the authenticated data structure auth(D) (output by algorithm setup()). Then there exists a two-party authenticated data structures protocol involving a trusted client C and an untrusted server S for verifying queries q ∈ Q and at the same time supporting updates u ∈ U such that:

1. The protocol is non-interactive if the information upd output by update() is empty; otherwise it requires one round of interaction;

2. The setup at the client C has O(s(n)) access complexity;

3. The update at the client C has O(|Q_u| v(n) + u(n)) access complexity;

4. The verification at the client C has O(v(n)) access complexity;

5. The space needed at the client C has O(1) group complexity;

6. The communication between the client C and the server S has O(|Q_u|(p(n) + α(n)) + i(n)) group complexity during updates and O(p(n) + α(n)) group complexity during queries;

7. The update at the server S has O(|Q_u| q(n) + r(n)) access complexity;

8. The query at the server S has O(q(n)) access complexity;

9. The space needed at the server S has O(f(n)) group complexity;

10. For a query q ∈ Q sent by the client C to the server S at any time (even after updates), let α be an answer and let π be a proof returned by the server S. With probability Ω(1 − neg(k)), the client C accepts the answer α if and only if α is correct.

Proof: Note that Item 2e, which introduces one round of interaction, is executed only when upd ≠ ∅. Therefore Item 1 of Theorem 2.2 holds.

(Complexity) The protocol in question is Protocol 2.2. In the setup phase the access complexity of Item 1a is O(1) (it is not accessing the data structure at all). The access complexity of Item 1b is O(s(n)) since it involves one call to algorithm setup(). Therefore the total setup access complexity at the client is O(s(n)). The update at the client involves one verification of α(Q_u) (see Item 2(d)i, which has O(|Q_u| v(n)) complexity), the application of function z(.) to output δ_u(D_h) and δ_u(auth(D_h)) (see Item 2(d)ii), which, by Assumption 2.1, has O(|Q_u| v(n)) complexity, and one call to algorithm update() (see Item 2(d)iii). Therefore the total complexity is O(|Q_u| v(n) + u(n)). The server stores the authenticated data structure auth(D), therefore its space has O(f(n)) group complexity. The client stores only the digest d, therefore it needs space of O(1) group complexity. The communication between the client and the server has

• O(|Q_u|(p(n) + α(n)) + i(n)) group complexity during an update u, since Π(Q_u) (Item 2c), α(Q_u) (Item 2c) and the information upd (Item 2e) are exchanged between the two parties;

• O(p(n) + α(n)) group complexity during queries, since it involves sending a proof and an answer for one query (see Item 3c).

The update at the server has O(|Q_u| q(n) + r(n)) access complexity since it involves computing α(Q_u) and Π(Q_u) (Item 2b) and one call to algorithm refresh() (see Item 2f). Computing a proof at the server (i.e., the query at the server) has O(q(n)) access complexity since it involves one call to algorithm query() (see Item 3b). Finally, the verification at the client has O(v(n)) access complexity since it involves one call to algorithm verify() (Item 3d), which has O(v(n)) complexity.

(Security) Let now T be the time of a query q ∈ Q.
The client sends query q to the server (Item 3a). The server replies with an answer α and a proof π (Item 3b). If server S follows the protocol, it will output a correct answer α. Since a correct authenticated data structure scheme is used, it follows that the client will accept with all but negligible probability (see Definition 2.4). Suppose now that the server does not follow the protocol and that the client accepts α, while α is an incorrect answer. Let E be that event. Note that E can be written as

E = E ∩ [(digest d is correct) ∪ (digest d is not correct)] .

Therefore

Pr[E] ≤ Pr[E ∩ (digest d is correct)] + Pr[E ∩ (digest d is not correct)]
≤ Pr[E ∩ (digest d is correct)] + Pr[digest d is not correct]
≤ Pr[E | (digest d is correct)] + Pr[digest d is not correct]
= P_1 + P_2 .

However, since a secure authenticated data structure scheme A is used, P_1 is neg(k). By Lemma 2.1, P_2 is also neg(k). This completes the proof. □

Lemma 2.1 Let k be the security parameter. The digest d used by every verification at Item 3d of Protocol 2.2 is correct with probability Ω(1 − neg(k)), even in the presence of updates.

Proof: Let u_i, for some i ≥ 0, be the first update where the protocol rejects at Item 2(d)i (if there is such an update). We prove the lemma by induction on the number of updates before update u_i. Before the execution of the first update u_0 issued by the client, the lemma holds since the digest d_0 is output by setup() in Item 1b, run by the trusted client. Therefore d_0 is correct with probability 1. Suppose the lemma holds for the time period before the execution of the update u_{i−1} at Item 2(d)iii, namely the digest d_{i−1}, used by every verification before update u_{i−1}, is correct with probability Ω(1 − neg(k)). Let α(Q_{u_{i−1}}) and Π(Q_{u_{i−1}}) be the answers and the proofs related to the update u_{i−1}, derived from Assumption 2.1. Since the protocol does not reject during update u_{i−1}, the verification of α(Q_{u_{i−1}}) and Π(Q_{u_{i−1}}) at Item 2(d)i is successful. Therefore, since d_{i−1} is correct with probability Ω(1 − neg(k)) (by the inductive hypothesis), the values α(Q_{u_{i−1}}) and Π(Q_{u_{i−1}}) are also correct with probability Ω(1 − neg(k)), since a secure authenticated data structure scheme is used (see Definition 2.5). This implies that δ_{u_{i−1}}(auth(D_{i−1})) and δ_{u_{i−1}}(D_{i−1}), output by z(Q_{u_{i−1}}, α(Q_{u_{i−1}}), Π(Q_{u_{i−1}})) at Item 2(d)ii, are correct as well with overwhelming probability. Therefore update() outputs the correct digest d_i with probability Ω(1 − neg(k)), since it executes on correct data δ_{u_{i−1}}(auth(D_{i−1})) and δ_{u_{i−1}}(D_{i−1}) at Item 2(d)iii. Thus the digest d_i, used by every verification before update u_i, is correct with probability Ω(1 − neg(k)). This completes the proof. □

2.3 Related work

In this section we present a general literature review related to authenticated data structures [105] and, more broadly, to methods developed for checking the integrity of data and computations stored and executed by adversarial parties. In the description of the related work, we denote with n the size of the data structure (e.g., the number of elements stored in the dictionary). Note that the subsequent chapters of the thesis contain references to literature that is more closely related to the algorithmic or cryptographic problem studied in each specific chapter.
2.3.1 Generic collision-resistant hashing

Since the appearance of Merkle's seminal paper on hash trees [77], many works in the authenticated data structures literature have used generic collision-resistant hashing to realize efficient integrity checking mechanisms. Generic collision-resistant hashing refers to the use of certain cryptographic hash functions, such as MD5 and SHA-2, as a black box, i.e., without exploiting the internal (algebraic) structure of the algorithms implementing them. These constructions have been known to be very efficient in practice. However, due to the heuristic nature of their security arguments, various attacks on them (e.g., the attacks on SHA-1 [111] and MD5 [103]) have appeared over the years. Nevertheless, several constructions based on generic collision-resistant hashing have been developed for the verification of various queries on dynamic data structures. After Naor and Nissim dynamized Merkle's solution [81], providing the first dynamic authenticated dictionary, Goodrich and Tamassia [48] presented another efficient realization of a dynamic authenticated dictionary with skip lists, which was enhanced with persistence by Anagnostopoulos et al. [4] and was subsequently tested by Goodrich et al. [50] and implemented in a two-party protocol by Papamanthou and Tamassia [92], finding many applications in authenticated file systems (e.g., see Goodrich et al. [46]) and authenticated outsourced storage (e.g., see Heitzmann et al. [56]). Martel et al. [75] presented several new authenticated data structures for more complicated computations (e.g., database queries), supporting efficient I/O algorithms as well. Distributed authenticated hash tables were introduced by Tamassia and Triandopoulos [104] and the verification of more complicated queries, such as connectivity queries on graphs and fractional cascading, is presented by Goodrich et al. [53]. All the above authenticated data structures achieve O(log n) complexity costs, which are shown to be optimal [106] (only for methods using generic collision-resistant hashing as a black box). The database community has also employed generic collision-resistant hashing extensively in order to verify various structures and queries related to systems and applications such as I/O-efficient search trees [66], queries on streams [67], database join queries [112], shortest path computations [72] and set operations [33]. Other types of queries, such as 2-dimensional range search, have also been investigated [6].

2.3.2 More advanced cryptography

Although the majority of authenticated data structures developed in the literature are based on generic collision-resistant hashing, there have been some solutions for verifying queries in various settings using other cryptographic primitives, such as one-way accumulators. One-way accumulators, introduced by Benaloh and de Mare [13], are based on the RSA exponentiation function and comprise an efficient way of securely compressing multiple inputs into one succinct representation, so that efficient proofs of membership can be computed. Implemented with an RSA accumulator, they satisfy quasi-commutativity, a useful property that common generic collision-resistant functions lack, which allows for efficient updates and convenient preprocessing. Refinements of the RSA accumulator are also given by Baric and Pfitzmann [10], where, beyond one-wayness, collision resistance is achieved, and also by Gennaro et al. [42] and by Sander et al. [101].
Dynamic accumulators (along with protocols for zero-knowledge proofs) were introduced by Camenisch and Lysyanskaya [23]. A first application of accumulators in the authenticated data structures model was made by Goodrich et al. [51]; in this work, and in favor of constant-complexity proofs, general O(n^ε) bounds are derived for various complexity measures such as query and update complexity (as opposed to the logarithmic bounds of methods using generic collision-resistant hashing). An authenticated data structure [52] that combines hierarchical hashing with the scheme of [51] appeared later, and a similar hybrid authentication scheme was developed by Nuckolls [84]. Accumulators using other cryptographic primitives (groups admitting bilinear pairings), the security of which is based on other assumptions (hardness of the strong Diffie-Hellman problem), are presented by Nguyen [83] and Camenisch et al. [22]. However, updates in the work of Nguyen [83] are inefficient when the trapdoor information is not known: individual precomputed witnesses can be updated with constant complexity, thus incurring a linear total cost for updating all the witnesses after an update in the set. Also, the accumulator by Camenisch et al. [22] requires space proportional to the number of elements ever accumulated in the set (book-keeping information of considerable size is needed), or otherwise important constraints on the range of the accumulated values are required. Efficient dynamic accumulators for non-membership proofs are presented by Li et al. [68]. Accumulators for batch updates are presented by Wang et al. [110] and accumulator-like expressions to authenticate static sets under the provable data possession model are presented by Ateniese et al. [7, 37]. The work by Sander et al. [100] studies efficient algorithms for accumulators with unknown trapdoor information. Finally, in the work on lower bounds by Dwork et al. [35], and simultaneously with work performed by Papamanthou et al. [90], logarithmic lower bounds as well as constructions achieving query-update cost trade-offs have been studied in the memory-checking model. Tree hierarchies are authenticated (with access control enabled) by Atallah et al. [6], using bilinear pairings. A study of an extensive suite of various authenticated data structures problems can be found in the PhD theses of Triandopoulos [108] and Crosby [30]. Applications of authenticated data structures in distributed systems integrity are presented in the PhD thesis of Maniatis [73].

2.3.3 Relation to memory checking

Authenticated data structures are closely related to the memory checking model, which was originally defined by Blum et al. [15] and has subsequently been studied in several works [35, 82]. Both authenticated data structures and memory checking have as a goal to verify the operation of some functionality that is offered by an untrusted and possibly malicious server, i.e., to design cryptographic protocols that can efficiently verify the correctness of the corresponding provided functionality. In memory checking, the functionality offered by a memory array is being verified, namely read and write operations in a one-dimensional table with indices 1, . . . , n.
A read operation returns the value that is stored at a given index j ∈ [1, n] and a write operation involves changing the content of a given index j ∈ [1, n] to a new given value (similar to the authenticated data structure introduced in Chapter 4); a memory-checking protocol verifies that a value that is read from any given index j is the last value that was written to that index j. The celebrated result by Blum et al. [15] states that this fundamental read-write functionality on n memory cells can be verified by reading O(log n) special values that are stored at some additional unreliable memory cells of total size O(n). In authenticated data structures, the functionality offered by a data structure (e.g., heap, dynamic trees, hash table, dictionary, inverted index, fractional cascading, etc.) is being verified, namely query and update operations defined over a structured and dynamic data collection. For example, for the case of dynamic trees [53], a query can be "is node v a child of u?" and an update can be "move subtree T to node v". It is important to note that since any ordinary data structure (of size n) is implemented in the RAM model and since every operation is simulated by reading and writing bits in memory, it is consequently true that every data structure can be authenticated using the memory checking model, by individually verifying (with O(log n) complexity) every elementary read or write operation needed during a query or update in the data structure. This reduction immediately gives rise to a very reasonable question: Is it useful to work with the more abstract model of authenticated data structures? The answer is yes, and the main reason is related to efficiency. The reduction of authenticated data structures to memory checking is in practice highly inefficient. Firstly, the verification of any read or write operation introduces a logarithmic (in the size of the data structure) multiplicative factor in the complexity of the verification protocol. Secondly, and more importantly, verifying the functionality of a data structure through memory checking corresponds to verifying the entire execution of the query or the update algorithm that is defined for the data structure in study, which is in general unnecessary, because what needs to be verified is the result of such a query or update algorithm and not its entire execution. The best way to demonstrate how inefficient such a reduction to memory checking can be is through a concrete and very illustrative example. Consider the verification of range search queries. To verify a range search query by solely relying on the known memory-checking techniques, we need to verify the entire search procedure in an underlying data structure for range searching, e.g., the range search tree, a procedure which requires O(log n + k) reads of memory locations, where n is the size of the data set and k is the number of the elements belonging to the queried range. Given the logarithmic overhead introduced by the memory checking (by using for example Blum's memory checker [15]), the total verification cost is O(log² n + k log n). Instead, by using an implementation of an authenticated dictionary (e.g., a Merkle tree), the complexity to authenticate range search queries is only O(log n + k).
Even better, as has been shown by Tamassia and Triandopoulos [107], by using optimal data certification techniques in combination with optimal authenticated dictionaries, it is possible to authenticate range search queries optimally, in O(k) complexity, using proofs of size only O(log k). Therefore, authenticated data structures can significantly reduce the complexity of authenticating complicated queries by combining cryptography with algorithmics, which is the basis of constructing efficient authenticated data structures solutions. Additionally, apart from efficiency, in the authenticated data structures model communication complexity does matter: we are interested in minimizing the size of the proof that the untrusted server computes for the verification of a query. In memory checking, however, bandwidth does not come into play, since query complexity is the main complexity measure that is studied. Overall, authenticated data structures provide a more powerful, more refined and more expressive model for studying the verification of computations that take place during data management and data querying. Finally, we note that memory checking solutions have been traditionally constructed with cryptographic primitives that bear very weak assumptions (e.g., the existence of one-way functions [15]). However, authenticated data structures do employ stronger assumptions (e.g., the strong RSA assumption [51]), which are nevertheless widely accepted and widely used by the cryptography community.

Chapter 3

Accumulators for authenticated hash tables

In this chapter we describe our first authenticated data structure scheme, for a hash table data structure. We use cryptographic accumulators [23] as our basic cryptographic primitive to verify standard hash table queries. Specifically, our main results (Theorem 3.2 and Theorem 3.4) show how to use two different accumulator schemes [23, 83] in a hierarchical way (see Figure 3.1) over the set and the underlying hash table, in order to achieve the verification of both membership and non-membership queries. In the presented fully-dynamic schemes, the communication and verification complexities are constant, the query complexity is constant and the update complexity is sublinear, realizing the first authenticated hash table with this performance. Our schemes (denoted with RHT and BHT in Table 3.1) strictly improve, in terms of complexity, upon previous schemes based on accumulators and other cryptographic primitives (for a detailed comparison, see Table 3.1). Their security is based on two widely accepted assumptions, the strong RSA assumption [10] and the bilinear q-strong Diffie-Hellman assumption [16]. Finally, to meet the needs of different data-access patterns, we extend our schemes to achieve a reverse performance, i.e., sublinear query cost but constant update cost.

             [15,48,75,81]  [11]    [23,101]    [51]        RHT                      [83]     BHT
setup()      n              n       n           n           n                        n        n
update()     log n          1       1           n^ε         1 **                     1        1 **
refresh()    log n          1       n log n     n^ε         (n^ε log n) ** / 1 *     n^ε +    n^ε ** / 1
query()      log n          n       1           n^ε         1 * / (n^ε) *            1        1 / n^ε log n
verify()     log n          n       1           1           1                        1        1
proof Π(q)   log n          n       1           1           1                        1        1
info. upd    1              1       1           n^ε         1                        1        1
assumption   Generic CR     D. Log  Strong RSA  Strong RSA  Strong RSA               B. q-DH  B. q-DH

Table 3.1: In this table, we exhibit a detailed comparison of asymptotic access and group complexities of various authenticated data structure schemes in the literature with the complexities of our schemes. The underlying data structure scheme is for a hash table storing n elements.
All the authenticated data structure schemes compared are defined by algorithms {genkey, setup, update, refresh, query, verify} (see Definition 2.3). Parameter 0 < ε < 1 is a constant, "D. Log" stands for "Discrete Logarithm", "Generic CR" stands for "Generic Collision Resistance" and "B. q-DH" stands for "Bilinear q-strong Diffie-Hellman". In all constructions the authenticated data structure has group complexity (i.e., size) O(n) and genkey() has O(1) complexity. Π(q) denotes the proof for a query q and upd is the update information output by algorithm update(). Our schemes are denoted with RHT (RSA-based authenticated hash table) and BHT (bilinear-map-based authenticated hash table). The "one-star" notation ∗ denotes an expected complexity, the "two-star" notation ∗∗ denotes an expected amortized complexity, whereas the "plus" notation + denotes an amortized complexity. All schemes in the table are publicly-verifiable.

3.1 Preliminaries

In this section we describe some algorithmic and cryptographic primitives and other useful concepts that are used in our approach.

3.1.1 Hash tables

The main functionality of a hash table data structure T(X) is to support optimal-complexity look-ups of elements that belong to a general dynamic set X (i.e., not necessarily ordered). Elements can be inserted into or deleted from X. The elements in X are drawn from a universe U.

The data structure scheme. The data structure scheme {query(), update(), check()}, as defined in Definition 2.2, for a hash table T(X) is as follows:

1. {true, false} ← query(x, T(X)): Given an element x ∈ U, return true if x ∈ X or false otherwise;

2. T(X′) ← update(x, T(X)): Given an element x ∈ U such that x ∉ X, insert element x into X and output T(X′); given an element x ∈ U such that x ∈ X, delete element x from X and output T(X′);

3. {accept, reject} ← check(x, b ∈ {true, false}, T(X)): If x ∈ X (or x ∉ X) and b = false (or b = true), return reject. Else return accept.

Note that answering a hash table query can be implemented to have O(1) expected complexity (see Theorem 3.1) and that both insertions and deletions in a hash table can be implemented to have O(1) expected amortized complexity (see Theorem 3.1).

Implementation. Different ways of implementing hash tables have been extensively studied (e.g., [34, 58, 62, 69, 80]). Here we use a simple approach for the implementation of the plain hash table: Suppose we wish to store n elements from a universe U in a data structure so that we can have expected constant look-up complexity. For totally ordered universes and for searching based on comparisons, it is well known that an Ω(log n) lower bound holds. Essential for the construction of a hash table, and for beating the Ω(log n) bound, is a two-universal hash function:

Definition 3.1 (Two-universal hash function [26]) A two-universal hash function H : U → {1, . . . , m}, randomly selected from a family of two-universal hash functions H, is a function such that for any two elements e_1, e_2 ∈ U it is

Pr[H(e_1) = H(e_2)] ≤ 1/m . (3.1)

By using a two-universal hash function, hash tables can be constructed as follows (a concrete instantiation is sketched after this list):

• Set up a one-dimensional table T[1 . . . m] where m = O(n);

• Pick a two-universal hash function H : U → {1, . . . , m} as defined in Definition 3.1;

• Store element e in slot T[H(e)] of the table.

The probabilistic property that holds for the hash function H implies that, for any slot of the table, the expected number of elements mapped to it is O(1). Also, if H can be computed in O(1) time, looking up an element has expected constant complexity.
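One standard two-universal family (due to Carter and Wegman [26]) hashes an integer e as ((a·e + b) mod p) mod m for a prime p larger than the universe. The sketch below, including the choice p = 2^61 − 1, is an illustrative instantiation and not the specific function used in the thesis.

```python
import random

def sample_two_universal(m, p=2**61 - 1):
    """Return a random H: {0,...,p-1} -> {1,...,m} with
    Pr[H(e1) = H(e2)] <= 1/m for distinct e1, e2."""
    a = random.randrange(1, p)          # a != 0
    b = random.randrange(0, p)
    return lambda e: (a * e + b) % p % m + 1

H = sample_two_universal(m=8)
buckets = {j: [] for j in range(1, 9)}  # the table T[1..m]
for e in (7, 2, 9, 3, 42, 1000):
    buckets[H(e)].append(e)             # store e in slot T[H(e)]
```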
But the above property of hash tables comes at some cost. The expected constant-complexity look-up holds when the number of elements stored in the hash table does not change, i.e., when the hash table is static. In particular, because of insertions, the number of elements stored in a slot may grow and we cannot assume anymore that it is expected to be constant. A different problem arises in the presence of deletions, as the number n of elements may become much smaller than the size m of the hash table. Thus, we may no longer assume that the hash table uses O(n) space. In order to deal with updates, we periodically update the size of the hash table by a constant factor (e.g., doubling or halving its size). This is an expensive operation since we have to rehash all the elements. Therefore, there might be one update (over a course of O(n) updates) that has O(n) rather than O(1) complexity. Thus, hash tables for dynamic sets typically have O(1) expected query complexity and O(1) expected amortized update complexity. Methods that vary the size of the hash table for the sake of maintaining O(1) expected query complexity fall into the general category of dynamic hashing. The above discussion is summarized in the following theorem:

Theorem 3.1 (Dynamic hashing [29]) For a set of size n, dynamic hashing can be implemented to use O(n) space and have O(1) expected query complexity for (non-)membership queries and O(1) expected amortized complexity for element insertions or deletions.

3.1.2 The RSA accumulator

We now give an overview of the RSA accumulator, which will be used for the construction of our first solution, i.e., the construction of the authenticated data structure scheme RHT.

Prime representatives. For security and correctness reasons that will soon become clear, in our construction we extensively use the notion of prime representatives of elements. Initially introduced by Baric and Pfitzmann [10], prime representatives provide a solution whenever it is necessary to map general elements to prime numbers. In particular, one can map a k-bit element e_i to a 3k-bit prime x_i using a two-universal hash function (see Definition 3.1). In our context, we are using a two-universal hash function h : A → B, which is different from the one (i.e., H(.)) we use to map elements to buckets, and where set A is the set of 3k-bit boolean vectors and B is the set of k-bit boolean vectors. Specifically, we use the two-universal hash function h(x) = Fx, where F is a k × 3k boolean matrix. Since the linear system h(x) = Fx has more than one solution, one k-bit element is mapped to more than one 3k-bit element. We are interested in finding only one such solution which is prime; this can be computed efficiently according to the following result:

Lemma 3.1 (Prime representatives [42, 51]) Let H be a two-universal family of functions mapping {0, 1}^{3k} to {0, 1}^k and let h ∈ H. For any element e_i ∈ {0, 1}^k, we can compute with high probability a prime x_i ∈ {0, 1}^{3k} such that h(x_i) = e_i by sampling O(k²) times from the set of inverses h^{−1}(e_i).

By Lemma 3.1, computing prime representatives has expected constant complexity, i.e., independent of n. Also, solving the k × 3k linear system in order to compute the set of inverses has polynomial complexity in k by using standard methods (e.g., Gaussian elimination).
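A sketch of this sampling process is given below, under one simplifying assumption: F is taken in the block form [I_k | R] for a random k × 2k boolean matrix R, so that a uniformly random preimage of e is obtained by choosing a random 2k-bit tail t and setting the head to e XOR Rt. Candidates are then tested with Miller-Rabin until a prime is hit; names such as prime_representative are illustrative.

```python
import random

def is_probable_prime(n, rounds=40):          # Miller-Rabin primality test
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def prime_representative(e, k, R):
    """Sample from h^{-1}(e) until a prime is found; for F = [I_k | R],
    every (e XOR R*t, t) is a preimage of e under h(x) = F x."""
    while True:
        t = random.getrandbits(2 * k)         # random 2k-bit tail
        head = e
        for col in range(2 * k):              # head = e XOR R*t over GF(2)
            if (t >> col) & 1:
                head ^= R[col]
        x = (head << (2 * k)) | t             # 3k-bit candidate with h(x) = e
        if is_probable_prime(x):
            return x

k = 32
R = [random.getrandbits(k) for _ in range(2 * k)]   # the columns of R
r_e = prime_representative(e=0xDEADBEEF, k=k, R=R)
```

By the density of primes, a prime among the 3k-bit candidates is expected after O(k) trials, consistent with the O(k²) sampling bound of Lemma 3.1.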
Finally, we note that, in our context, prime representatives are computed and stored only once. Indeed, using the above method multiple times for computing the prime representative of the same element will not yield the same prime as output, for Lemma 3.1 describes a randomized process. From now on, given a k-bit element x, we denote with r(x) the 3k-bit prime representative that is computed as described by Lemma 3.1.

Description of the RSA accumulator. We now give an overview of the RSA accumulator [10, 13, 23, 68], which provides an efficient technique to produce a short (computational) proof that a certain element is (or is not) a member of a set. The RSA accumulator works as follows. Suppose we have the set of k-bit elements X = {x_1, x_2, . . . , x_n}. Let N be a k′-bit RSA modulus (k′ > 3k), namely N = pq, where p, q are strong primes [23]. We can represent X compactly and securely with an accumulation value acc(X), which is a k′-bit integer, as follows:

acc(X) = g^{r(x_1) r(x_2) ... r(x_n)} mod N ,

where g ∈ QR_N and r(x_i) is a 3k-bit prime representative, computed using a two-universal hash function h(.). Note that the RSA modulus N, the exponentiation base g and the two-universal hash function comprise the public key pk, i.e., information that is available to the adversary (the factorization of N is kept secret). Subject to the accumulation value acc(X), every element x in set X has a membership witness (W_x, r(x), x), where

W_x = g^{∏_{x_j ∈ X : x_j ≠ x} r(x_j)} mod N . (3.2)

Membership of x in X is verified by means of the following tests:

1. Checking that r(x) is a prime number;

2. Checking that h(r(x)) = x;

3. Computing W_x^{r(x)} mod N and checking that this equals acc(X).

Moreover, subject to the accumulation value acc(X), every element x ∉ X has a non-membership witness as well [68], namely the integer values (A_x, B_x, r(x), x) such that

(∏_{i=1}^{n} r(x_i)) A_x + r(x) B_x = 1 . (3.3)

Note that A_x and B_x can be computed by running the extended Euclidean algorithm [102] on r(x_1) r(x_2) . . . r(x_n) and r(x). Given the accumulation value acc(X) and the non-membership witness (A_x, B_x, r(x), x), non-membership of x in X can be verified by means of the following tests:

1. Checking that r(x) is a prime number;

2. Checking that h(r(x)) = x;

3. Computing acc(X)^{A_x} g^{r(x) B_x} mod N and checking that this equals g.
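The following toy Python instantiation exercises both verification equations, with a deliberately small (and thus completely insecure) modulus, and assuming elements are already given by their prime representatives. It relies on Python 3.8+ for modular inverses via pow() with negative exponents.

```python
from math import prod

N = 1009 * 2003          # toy RSA modulus with known factorization; NOT secure
g = 4                    # a quadratic residue mod N (4 = 2^2), coprime to N

def acc(primes):                       # acc(X) = g^{prod r(x_i)} mod N
    return pow(g, prod(primes), N)

def mem_witness(primes, r):            # W_x = g^{prod of the other primes}
    return pow(g, prod(p for p in primes if p != r), N)

def verify_membership(a, W, r):        # check W_x^{r(x)} == acc(X) mod N
    return pow(W, r, N) == a

def ext_gcd(a, b):                     # extended Euclidean algorithm
    if b == 0:
        return a, 1, 0
    d, x, y = ext_gcd(b, a % b)
    return d, y, x - (a // b) * y

def nonmem_witness(primes, r):         # A, B with (prod r(x_i))A + r(x)B = 1
    d, A, B = ext_gcd(prod(primes), r)
    assert d == 1                      # holds since r is a fresh prime
    return A, B

def verify_nonmembership(a, A, B, r):  # check acc(X)^A g^{rB} == g (mod N)
    return pow(a, A, N) * pow(g, r * B, N) % N == g

X = [11, 13, 17]                       # prime representatives of the set
a = acc(X)
assert verify_membership(a, mem_witness(X, 13), 13)
A, B = nonmem_witness(X, 7)            # 7 is not accumulated
assert verify_nonmembership(a, A, B, 7)
```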
We finally note that the representation acc(X) has the crucial property that any computationally bounded adversary Adv who does not know φ(N) cannot find another set of elements X′ ≠ X such that acc(X′) = acc(X), unless Adv breaks the factoring assumption [10]. However, in order to achieve some more advanced security goals we need, we are going to use a stronger assumption:

Assumption 3.1 (Strong RSA assumption) Let k be the security parameter. Given a k-bit RSA modulus N and a random element x ∈ Z*_N, there is no polynomial-time algorithm that outputs y > 1 and a such that a^y = x mod N, except with negligible probability neg(k).

The security of our RSA-based solution is based on the following result. To assist the reader, we also recall the proof of the security results for membership proofs [10] and for non-membership proofs [68].

Lemma 3.2 (Security of the RSA accumulator [10, 68]) Let k be the security parameter, h be a two-universal hash function mapping 3k-bit integers to k-bit integers, N be a (3k + 1)-bit RSA modulus and g ∈ QR_N. Given N, g, a set of k-bit elements X and h, suppose there is a polynomial-time algorithm for one of the tasks below (or both):

• It outputs x ∉ X, W and a prime r such that h(r) = x and W^r = acc(X) mod N;

• It outputs x ∈ X, A, B and a prime r such that h(r) = x and acc(X)^A g^{rB} = g mod N.

Then there is a polynomial-time algorithm for breaking the strong RSA assumption.

Proof: Let X = {x_1, x_2, . . . , x_n} and let x ∉ X. For the membership proof, suppose there is an algorithm that finds W, r and x such that r is a prime number, h(r) = x and W^r = g^{r(x_1) r(x_2) ... r(x_n)} mod N. Since x ∉ X, by construction of the prime representatives it is r ∉ {r(x_1), r(x_2), . . . , r(x_n)} (recall that h(r) = x). Let now e = r and R = r(x_1) r(x_2) . . . r(x_n). The algorithm can now compute the e-th root of g as follows: It computes a, b ∈ Z such that aR + br = 1 by using the extended Euclidean algorithm, since r is a prime and r ∉ {r(x_1), r(x_2), . . . , r(x_n)}. Let now y = W^a g^b mod N. It is y^e = W^{ar} g^{br} = g^{aR + br} = g mod N. Therefore the algorithm can be used for breaking the strong RSA assumption. For the non-membership proof case, since x ∈ X the algorithm can output the e-th root of g as y = W_x^A g^B, where W_x is the membership witness defined in Relation 3.2. Then y^e = W_x^{eA} g^{eB} = acc(X)^A g^{rB} = g mod N. This completes the proof. □

3.1.3 The bilinear-map accumulator

We next give an overview of the bilinear-map accumulator [83], which will be used for the construction of our second solution, i.e., the construction of the authenticated data structure scheme BHT.

Bilinear pairings. Before presenting the bilinear-map accumulator we describe some basic terminology and definitions about bilinear pairings. Let G_1, G_2 be two cyclic multiplicative groups of prime order p, generated by g_1 and g_2 and for which there exists an isomorphism ψ : G_2 → G_1 such that ψ(g_2) = g_1. Let also 𝒢 be a cyclic multiplicative group with the same order p and e : G_1 × G_2 → 𝒢 be a bilinear pairing with the following properties:

1. Bilinearity: e(P^a, Q^b) = e(P, Q)^{ab} for all P ∈ G_1, Q ∈ G_2 and a, b ∈ Z_p;

2. Non-degeneracy: e(g_1, g_2) ≠ 1;

3. Computability: There is an efficient algorithm to compute e(P, Q) for all P ∈ G_1 and Q ∈ G_2.

In our setting we have G_1 = G_2 = G and g_1 = g_2 = g. A bilinear pairing instance generator is a probabilistic polynomial-time algorithm that takes as input the security parameter 1^k and outputs a uniformly random tuple t = (p, G, 𝒢, e, g) of bilinear pairings parameters. Here we have to make an important observation: Groups G and 𝒢 are generic. That is, their elements are not simple integers and doing operations between elements can be complicated. E.g., group elements of G and 𝒢 (for which there exist efficient constructions of a bilinear map e(., .)) are usually points on an elliptic curve. Also, the operations in the exponent of elements of G and 𝒢 are performed modulo p, since this is the order of both groups G and 𝒢. A simplified exposition of these groups and their arithmetic is given in the book of Katz and Lindell [61].

Description of the bilinear-map accumulator. Similarly to the RSA accumulator, the bilinear-map accumulator [32, 83] comprises an efficient way to provide short proofs of (non-)membership for elements that (do not) belong to a set. The bilinear-map accumulator works as follows. Let s ∈ Z*_p be a randomly chosen value that constitutes the trapdoor in the scheme (in the same way that φ(N) was the trapdoor in the RSA accumulator).
The accumulator accumulates elements in Z*_p − {−s} (where p is a k-bit prime) and the accumulated value is an element in G. Given a set of n elements X = {x_1, x_2, . . . , x_n}, the accumulation value acc(X) is defined as

acc(X) = g^{(x_1 + s)(x_2 + s) ... (x_n + s)} ,

where g is a generator of group G of prime order p. We note here that acc(X) can be constructed by using only X and g, g^s, g^{s²}, . . . , g^{s^q}, where q ≥ |X|, by means of polynomial interpolation (see Lemma 3.15; a short sketch of this computation appears after the proof of Lemma 3.3 below). The proof of membership for an element x that belongs to set X will be the witness (W_x, x), where

W_x = g^{∏_{x_j ∈ X : x_j ≠ x} (x_j + s)} . (3.4)

Accordingly, a verifier can test set membership for x by computing e(W_x, g^s g^x) and checking that this equals e(acc(X), g). Moreover, subject to the accumulation value acc(X), every element x ∉ X has a non-membership witness [32], namely the elements (A_x = g^{α(s)}, B_x = g^{β(s)}, x) such that

[∏_{i=1}^{n} (x_i + s)] α(s) + (x + s) β(s) = 1 . (3.5)

Note that α(s) and β(s) are polynomials that can be computed by running the extended Euclidean algorithm on the polynomials (x_1 + s)(x_2 + s) . . . (x_n + s) and (x + s). Given the accumulation value acc(X) and the witnesses A_x and B_x, non-membership of x in X can be verified by computing e(acc(X), A_x) e(g^s g^x, B_x) and checking that this equals e(g, g).

Proving the security of the bilinear-map accumulator requires the bilinear q-strong Diffie-Hellman assumption, a slightly stronger assumption than the q-strong Diffie-Hellman assumption [16] (proving just collision resistance of the accumulator requires only the plain q-strong Diffie-Hellman assumption [83]), that can be stated as follows:

Assumption 3.2 (Bilinear q-strong Diffie-Hellman assumption) Let k be the security parameter and let (p, G, 𝒢, e, g) be a uniformly randomly generated tuple of bilinear pairings parameters. Given the elements g, g^s, . . . , g^{s^q} ∈ G for some s chosen at random from Z*_p, where q = poly(k), there is no polynomial-time algorithm that can output the pair (a, e(g, g)^{1/(s+a)}) ∈ Z_p × 𝒢 except with negligible probability neg(k).

Lemma 3.3 (Security of the bilinear-map accumulator [32, 83]) Let k be the security parameter and let t = (p, G, 𝒢, e, g) be a uniformly randomly generated tuple of bilinear pairings parameters. Given the elements g, g^s, . . . , g^{s^q} ∈ G for some s chosen at random from Z*_p and a set of k-bit elements X (q ≥ |X|), suppose there is a polynomial-time algorithm for one of the tasks below (or both):

• It outputs x ∉ X and W such that e(W, g^s g^x) = e(acc(X), g);

• It outputs x ∈ X, A and B such that e(acc(X), A) e(g^s g^x, B) = e(g, g).

Then there is a polynomial-time algorithm for breaking the bilinear q-strong Diffie-Hellman assumption.

Proof: Let X = {x_1, x_2, . . . , x_n} and let x ∉ X. Suppose there is an algorithm that finds W such that e(W, g^s g^x) = e(acc(X), g). This implies

e(W, g)^{s+x} = e(g, g)^{(s+x_1)(s+x_2)...(s+x_n)} .

Note now that the quantity Π_n = (s + x_1)(s + x_2) . . . (s + x_n) can be viewed as a polynomial in s of degree n. Since x ∉ X, we have that (s + x) does not divide Π_n and therefore values c and P can be computed such that Π_n = c + P(s + x). Therefore the algorithm can output (x, e(g, g)^{1/(s+x)}) as

( x, [e(W, g) e(g, g)^{−P}]^{c^{−1}} ) .

For a non-membership proof, note that we can output e(g, g)^{1/(s+x)} as

e(W_x, A) e(g, B) ,

since x ∈ X and e(acc(X), A) e(g^s g^x, B) = e(g, g), where W_x is a membership witness given in Relation 3.4. Therefore the bilinear q-strong Diffie-Hellman assumption can be broken in both cases. □
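As a sanity check of the interpolation claim above, the sketch below expands ∏(x_i + s) into coefficients modulo p and recombines the published powers g^{s^i}. The tiny order-11 subgroup of Z*_23 stands in for a pairing-friendly group G, and the trapdoor s appears only when generating the public powers and in the final assertion; all parameter choices are illustrative.

```python
p, q_mod = 11, 23      # group order p; G is the order-11 subgroup of Z_23^*
g, s = 2, 7            # generator of that subgroup; trapdoor s in Z_p^*

def poly_coeffs(xs):   # coefficients of prod (X + x_i) mod p, low degree first
    coeffs = [1]
    for x in xs:
        coeffs = [(a * x + b) % p
                  for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

X = [3, 5, 6]
# the powers g, g^s, ..., g^{s^q} published once by the trusted party
powers = [pow(g, pow(s, i, p), q_mod) for i in range(len(X) + 1)]

cs = poly_coeffs(X)
acc = 1
for c, gi in zip(cs, powers):
    acc = acc * pow(gi, c, q_mod) % q_mod      # acc = prod (g^{s^i})^{c_i}

# sanity check against the direct definition (this line uses the trapdoor)
exp = 1
for x in X:
    exp = exp * ((x + s) % p) % p
assert acc == pow(g, exp, q_mod)
```

Witnesses W_x are computed the same way, by expanding the product over X − {x}, which is exactly why knowledge of the trapdoor s is not needed to run the scheme.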
Size of non-membership witnesses. We note here that although non-membership witnesses are constructed in the same fashion in both instantiations of the accumulator schemes, their sizes differ considerably. In the RSA accumulator case, the integers A_x and B_x (see Relation 3.3) can have size proportional to the number of the elements in X. In the bilinear-map accumulator case, A_x and B_x (see Relation 3.5) are always two group elements in G (this takes advantage of the bilinear map e(., .), which is not known to exist for RSA groups), therefore their size never depends on the elements collection X. This observation is very important and will contribute to significant complexity improvements in Chapter 5. We now continue with some necessary algorithmic and definitional framework. We present an algorithmic construction called the accumulation tree that will be used in both our constructions.

3.1.4 The accumulation tree

Let X = {x_1, x_2, . . . , x_n} be a set of elements. Given a constant ε such that 0 < ε < 1, the accumulation tree of X, denoted with T(ε), is a rooted tree with n leaves defined as follows:

1. The leaves of T(ε) store the elements x_1, x_2, . . . , x_n;

2. T(ε) consists of exactly l = ⌈1/ε⌉ levels;

3. All the leaves are at the same level;

4. Every node of T(ε) has O(n^ε) children;

5. Level i in the tree contains O(n^{1−iε}) nodes, where the leaves are at level 0 and the root is at level l.

Figure 3.1: The accumulation tree of a set of 64 elements for ε = 1/3: every internal node has 4 = 64^{1/3} children, there are 3 = 1/ε levels in total, and there are 64^{1−i/3} nodes at level i = 0, 1, 2, 3.

We note that the levels of the accumulation tree are numbered from the leaves to the root of the tree, i.e., the leaves have level 0, their parents level 1 and finally the root has level l. The structure of the accumulation tree, which for a set of 64 elements is shown in Figure 3.1, resembles that of normal "flat" search trees, in particular the structure of a B-tree [29]. However, there are some differences: First, every internal node of the accumulation tree, instead of having a constant upper bound on its degree, has a bound that is a function of the number of its leaves, n; also, its depth is always maintained to be constant, namely O(1/ε). Note that it is simple to construct the accumulation tree when n^ε is an integer (see Figure 3.1). Else, we define the accumulation tree to be the unique tree of degree ⌈n^ε⌉ (by assuming a certain ordering of the leaves). This maintains the degree of internal nodes to be O(n^ε); a small sketch of these shape parameters is given at the end of this subsection. Using the accumulation tree and search keys stored at the internal nodes, one can search for an element in O(n^ε) time and perform updates in O(n^ε) amortized time. Indeed, as the depth of the tree is not allowed to vary, one should periodically (e.g., when the number of elements of the tree doubles) rebuild the tree, spending O(n) time. Actually, by using individual binary trees to index the search keys within each internal node, queries could be answered in O(log n) time and updates could be processed in O(log n) amortized time. Yet, the reason we build this flat tree is not to use it as a search structure, but rather to design an authentication structure for defining the digest of X that matches the optimal querying performance of hash tables.
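The following small sketch (function name illustrative) computes the shape parameters of T(ε) for given n and ε, reproducing the numbers of Figure 3.1.

```python
import math

def tree_shape(n, eps):
    """Levels, node degree and nodes-per-level of the accumulation tree."""
    l = math.ceil(1 / eps)                 # constant number of levels
    degree = math.ceil(n ** eps)           # O(n^eps) children per node
    nodes = [math.ceil(n / degree ** i) for i in range(l + 1)]
    return l, degree, nodes

print(tree_shape(64, 1/3))                 # -> (3, 4, [64, 16, 4, 1])
```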
The idea is as follows: we wish to hierarchically employ an accumulator over the subsets (of accumulation values) defined by each internal node in the accumulation tree, so that (non-)membership proofs of size proportional to the depth of the tree (hence of constant size) are defined with respect to the root digest (the accumulation value of the entire set).

3.2 Scheme based on the RSA accumulator

In this section we present a secure authenticated data structure scheme for an authenticated hash table, RHT = {genkey, setup, update, refresh, query, verify}, and prove that it satisfies the complexities of Table 3.1. For each algorithm, we are going to describe two constructions, i.e., the "plain" construction and the one with "precomputed witnesses".

Algorithm {sk, pk} ← genkey(1^k): The algorithm picks a constant 0 < ε < 1 and ⌈1/ε⌉ + 1 RSA moduli N_i = p_i q_i (i = 0, . . . , l), where p_i, q_i are strong primes [23] and l = ⌈1/ε⌉. The lengths of the RSA moduli are defined by the recursive relation

|N_{i+1}| = 3|N_i| + 1 ,

where |N_0| = 3k + 1 and i = 0, . . . , l − 1. The algorithm also picks l + 1 public bases g_i ∈ QR_{N_i} to be used for exponentiation. Finally, given l + 1 families of two-universal hash functions H_0, H_1, . . . , H_l, the algorithm randomly picks one function h_i ∈ H_i, for i = 0, . . . , l (h_i will be used for computing prime representatives). The function h_i is such that it maps (|N_i| − 1)-bit primes to ((|N_i| − 1)/3)-bit integers. (The choice of the domains and ranges of the functions h_i and of the lengths of the moduli N_i is due to the requirement that prime representatives should be smaller numbers than the respective moduli; see [101]. As we will see in Section 3.2.5, using ideas from [10] it is possible to avoid the increasing size of the RSA moduli and instead use only one size for all N_i's. By doing so, however, we are forced to prove security in the random oracle model (using cryptographic hash functions), which is fine for practical applications.) The algorithm sets sk = {φ(N_i) = (p_i − 1)(q_i − 1) : i = 0, . . . , l} and pk = {N_i, g_i, h_i : i = 0, . . . , l}. Note that since l is constant, all RSA moduli have size that depends only on the security parameter k. Also, since 1/ε is constant, the algorithm has O(1) access complexity.

3.2.1 Main authenticated data structure

Let X = {x_1, x_2, . . . , x_n} be a collection of n elements. X is stored in a dynamic hash table D_0 by using a two-universal hash function H(.) that maps each element to a certain bucket (see Theorem 3.1). Specifically, the hash table D_0 has m = O(n) buckets L_1, L_2, . . . , L_m, where each bucket contains O(1) elements in expectation (by the property of the two-universal hash function; see Relation 3.1). Let now 0 < ε < 1 be the fixed constant chosen by algorithm genkey(). We build the accumulation tree T(ε) on top of the buckets, i.e., every leaf of the tree corresponds to a specific bucket and not to an element within the bucket. Since the number of buckets is m = O(n), the internal nodes of the accumulation tree have O(n^ε) children. Our authenticated data structure is defined with respect to the accumulation tree as follows. We hierarchically employ the RSA accumulator over the buckets of the hash table, so as to augment the accumulation tree with a collection of corresponding accumulation values.
That is, assuming the setup parameters are in place, for any node v in the accumulation tree we define its accumulation value χ(v) recursively along the tree structure, as a function of the accumulation values of its children (in a similar way as in a Merkle tree). We describe algorithm setup() in detail below:

Algorithm {auth(D_0), d_0} ← setup(D_0, sk, pk): The algorithm builds the accumulation tree T(ε) on top of the m buckets L_1, L_2, ..., L_m. For every leaf node v in tree T(ε) that lies at level 0 and corresponds to a bucket L_j, the algorithm sets

    χ(v) = g_0^{∏_{x∈L_j} r_0(x)} mod N_0 ∈ Z*_{N_0}.   (3.6)

For every non-leaf node v in T(ε) that lies at level 1 ≤ i ≤ l, the algorithm sets

    χ(v) = g_i^{∏_{u∈N(v)} r_i(χ(u))} mod N_i ∈ Z*_{N_i},   (3.7)

where r_i(a) is a prime representative of a computed using function h_i, N(v) is the set of children of node v (when node v refers to a bucket, i.e., it is a leaf, we define v's children to be the elements contained in the bucket) and g_i ∈ QR_{N_i}. The authenticated data structure auth(D_0) output by the algorithm consists of the following components:

1. The accumulation tree T(ε);
2. The prime representatives r_i(χ(v)) that correspond to the values χ(v), such that h_i(r_i(χ(v))) = χ(v)—as used in Relations 3.6 and 3.7, for all nodes v ∈ T(ε) (at some level i).

Let r be the root of the tree T(ε). The algorithm also outputs d_0 = χ(r), i.e., the digest of the authenticated data structure is the χ(·) value of the root of the accumulation tree.

Precomputed witnesses. In order to achieve constant-complexity queries, algorithm setup() can also compute precomputed witnesses. Namely, for every node v of the accumulation tree that lies at level 0 ≤ i ≤ l, let N(v) be the set of its children (for a leaf node, we consider as "children" the elements in the respective bucket). For every j ∈ N(v) the algorithm computes

    W_{j(v)} = χ(v)^{r_i(χ(j))^{−1}} = g_i^{∏_{u∈N(v)−{j}} r_i(χ(u))} mod N_i,   (3.8)

and stores W_{j(v)} at v. When the construction with precomputed witnesses is used, auth(D_0) also includes W_{j(v)}, for all v ∈ T(ε) and all j ∈ N(v), along with T(ε) and the r_i(χ(v)).

Lemma 3.4 Algorithm setup() of the authenticated data structure scheme RHT has O(n) access complexity both with and without precomputed witnesses. Moreover, the authenticated data structure auth(D_0) output by setup() has O(n) group complexity.

Proof: For a node v that has degree d, computing χ(v) from Relation 3.7 has O(d) access complexity. At level i ≥ 1, there are O(m^{1−iε}) such nodes, of degree O(m^ε), where m is the number of buckets (at level 0 there are m nodes of constant degree). Since m = O(n) and T(ε) has O(1) levels, the access complexity without precomputed witnesses is O(n). For a node v at level i that has degree d, computing W_{j(v)} for all j ∈ N(v) from Relation 3.8 has O(d) access complexity: Compute χ(v) first, and then set W_{j(v)} = χ(v)^{r_i(χ(j))^{−1}} mod N_i. Note that the computation of the inverse in the exponent is feasible because setup() has access to the secret key, which contains the factorizations and hence the values φ(N_i) (we will see that this computational task requires more work when the factorization is not available). Therefore the access complexity of setup() with precomputed witnesses is also O(n), since computing one such witness requires O(1) work and there are O(n) such witnesses. Finally, every node of T(ε) stores one group element (and two group elements in the precomputed witnesses case). Since the tree T(ε) has O(n) nodes, the group complexity of auth(D_0) is O(n). □
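As an illustration of Relations 3.6-3.8, here is a minimal sketch of the accumulation and witness computation with the trapdoor available; prime_rep is a hypothetical stand-in for the two-universal prime representatives r_i(·) of [101], and the toy modulus is far too small for any real use:

def next_prime(n):                      # tiny trial-division helper (sketch only)
    n += 1
    while any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        n += 1
    return n

def prime_rep(x):                       # hypothetical stand-in for r_i(.)
    return next_prime(2 * x + 1)

def accumulate(elements, g, N, phi):    # chi(v) of Relations 3.6/3.7
    exp = 1
    for x in elements:
        exp = (exp * prime_rep(x)) % phi   # trapdoor: reduce mod phi(N)
    return pow(g, exp, N)

def witness(elements, j, g, N, phi):    # W_{j(v)} of Relation 3.8
    exp = 1
    for x in elements:
        if x != j:
            exp = (exp * prime_rep(x)) % phi
    return pow(g, exp, N)

p, q = 1019, 1031                        # toy stand-ins for strong primes
N, phi = p * q, (p - 1) * (q - 1)
A = accumulate([2, 3, 7, 9], 3, N, phi)  # g = 3 stands in for a base in QR_N
W = witness([2, 3, 7, 9], 7, 3, N, phi)
assert pow(W, prime_rep(7), N) == A      # W^{r(7)} recovers the bucket value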
3.2.2 Updates

We now describe how updates can be efficiently supported in the authenticated hash table scheme by using a rebuilding technique, appropriately adjusted from the book of Cormen et al. [29]: Since a hash table with m = O(n) buckets is used, we should expect that at some point the update algorithms will need to rebuild the table (i.e., rehash all the elements and reinsert them in a bigger or smaller hash table) and the related authenticated data structures. This is done according to the following definition:

Definition 3.2 Let m be the current number of buckets of the authenticated hash table and n be the number of elements contained in the authenticated hash table after an update has been performed. Define α = n/m to be the load factor of the authenticated hash table after the update. If α = 1 (full table), the capacity of the hash table is doubled. If α = 1/4 (near-empty table), the capacity of the hash table is halved.

The rebuilding method of Definition 3.2, adjusted to our authenticated hash table construction, is essential for obtaining the amortized results of Lemmata 3.5 and 3.7, which constitute the main complexity results of this work (for similar methods see the book of Cormen et al. [29]). A minimal sketch of this policy is given below.
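The sketch (illustrative only; Definition 3.2 is the authoritative statement):

def buckets_after_update(m, n):
    """m: current bucket count; n: number of stored elements after the update."""
    alpha = n / m                   # load factor of Definition 3.2
    if alpha == 1:                  # full table: double the capacity
        return 2 * m
    if alpha == 1 / 4:              # near-empty table: halve the capacity
        return m // 2
    return m                        # otherwise the table is left as is

assert buckets_after_update(8, 8) == 16   # insertion filling the table
assert buckets_after_update(16, 4) == 8   # deletion emptying the table
assert buckets_after_update(8, 5) == 8    # no rebuild in between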
We now describe algorithms update() and refresh() in detail.

Algorithm {D_{h+1}, auth(D_{h+1}), d_{h+1}, upd} ← update(u, D_h, auth(D_h), d_h, sk, pk): Let m be the current number of buckets of D_h and n be the number of elements stored in D_h, after the update has been performed. We distinguish two cases:

Case 1. m/4 < n < m: In this case there is no need to rebuild the table and the update is performed as follows. Suppose the update is "insert element e". The algorithm computes the bucket j = H(e) (see Relation 3.1) and inserts e in bucket j. Let v_0 be the node of T(ε) referring to bucket j and r_0(e) be a new prime representative for element e computed using function h_0, i.e., h_0(r_0(e)) = e. Let v_0, v_1, ..., v_l be the path in T(ε) from node v_0 to the root of the tree. The algorithm initially sets χ'(v_0) = χ(v_0)^{r_0(e)} mod N_0, i.e., it updates the accumulation value that corresponds to the updated bucket. Note that if the update is "delete element e", the algorithm sets

    χ'(v_0) = χ(v_0)^{r_0(e)^{−1}} mod N_0.   (3.9)

Subsequently, for j = 1, ..., l the algorithm sets

    χ'(v_j) = χ(v_j)^{r_j(χ'(v_{j−1})) · r_j(χ(v_{j−1}))^{−1}} mod N_j,   (3.10)

where r_j(χ(v_{j−1})) is the prime representative of accumulation value χ(v_{j−1}) and r_j(χ'(v_{j−1})) is a new prime representative for the updated accumulation value χ'(v_{j−1}), such that h_j(r_j(χ'(v_{j−1}))) = χ'(v_{j−1}). All these values are stored by the algorithm after they have been computed. The algorithm also outputs the new prime representatives r_j(χ'(v_{j−1})) (j = 1, ..., l) as the information upd along the path from the updated bucket to the root of the tree. Information upd also includes r_0(e) and χ'(v_l). It also sets d_{h+1} = χ'(v_l), i.e., the updated digest is the updated χ(·) value of the root of T(ε). Finally, the new authenticated data structure auth(D_{h+1}) is computed as follows. Let auth(D_h) be the previous authenticated data structure that is input to the algorithm. Overwrite the values r_j(χ(v_{j−1})) (j = 1, ..., l) with the new values r_j(χ'(v_{j−1})) (j = 1, ..., l) and output the updated structure. The behavior of the algorithm in the precomputed witnesses case is the same, with the difference that upd = ∅.

Case 2. n = m/4 or n = m: In this case the hash table is rebuilt according to Definition 3.2: If n = m/4, the algorithm builds a data structure D_{h+1} with m/2 buckets. Otherwise, i.e., when n = m, the algorithm builds a data structure D_{h+1} with 2m buckets. Subsequently, it outputs auth(D_{h+1}) and d_{h+1} by calling algorithm setup(D_{h+1}, sk, pk) and sets upd = ∅.

Lemma 3.5 By using the rebuilding policy of Definition 3.2, algorithm update() of the authenticated data structure scheme RHT has O(1) expected amortized access complexity. Moreover, the update information upd output by update() has O(1) group complexity.

Proof: The O(1) expected complexity bound comes from the fact that the number of operations (as well as the number of group elements contained in upd) that update() performs is always a function of l = 1/ε = O(1)—also the actual hash table update is performed, which has expected O(1) complexity. Note that the complexity of the operations in Relations 3.9 and 3.10 is constant, since sk contains the factorizations, hence the values φ(N_i), and therefore inverses can be computed with the extended Euclidean algorithm. When the hash table has to be rebuilt, algorithm setup() is called, which has O(n) access complexity (see Lemma 3.4). Therefore the result is expected amortized, since we can use the rebuilding strategy of Definition 3.2 and follow the same amortized analysis as in [29] (i.e., the cost of rebuilding in [29] does not increase due to the O(n) complexity of setup()). □

Algorithm {D_{h+1}, auth(D_{h+1}), d_{h+1}} ← refresh(u, D_h, auth(D_h), d_h, upd, pk): Let m be the current number of buckets of D_h and n be the number of elements stored in D_h, after the update has been performed. We distinguish two cases:

Case 1. m/4 < n < m: Suppose the update is "insert element e". The algorithm computes the bucket j = H(e) (see Relation 3.1) and inserts e in bucket j. Let v_0 be the node of T(ε) referring to bucket j. Let v_0, v_1, ..., v_l be the path in T(ε) from node v_0 to the root of the tree. The algorithm, for j = 0, ..., l, sets r_j(χ(v_j)) = r_j(χ'(v_j)), i.e., it updates the prime representatives that correspond to the updated path by using the information upd. [Footnote: Note that information upd is not required for refresh() to perform this task; algorithm refresh() uses upd for efficiency. Namely, algorithm refresh() could compute the updated values r_j(χ(v_j)) by performing explicit exponentiations, which would have O(n^ε) complexity.] Finally it outputs the updated hash table as D_{h+1}, the updated prime representatives r_j(χ(v_j)) (along with the ones that belong to the nodes that are not updated) as auth(D_{h+1}), and χ'(v_l) (contained in upd) as d_{h+1}.

Precomputed witnesses. When precomputed witnesses are used, the algorithm must update W_{j(v)} for v = v_0, v_1, ..., v_l and for all j ∈ N(v) (see Relation 3.8). To achieve this efficiently, the following result from Sander et al. [101] for maintaining updated precomputed witnesses is used (a sketch of the underlying idea follows the lemma):

Lemma 3.6 (Computing witnesses [101]) Suppose we are given the elements collection X = {x_1, x_2, ..., x_n}, an RSA modulus N and g ∈ QR_N. Without the knowledge of φ(N), the witnesses W_i = g^{∏_{j≠i} x_j} mod N, for i = 1, ..., n, can be computed with O(n log n) complexity.
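The bound of Lemma 3.6 can be met with a divide-and-conquer recursion that pushes the product of one half of the elements into the exponentiation base handed to the other half. The sketch below illustrates this idea; it may differ in details from the algorithm of [101]:

from math import prod

def all_witnesses(g, xs, N):
    """Return [g^(prod of xs except xs[i]) mod N for each i], without phi(N)."""
    if len(xs) == 1:
        return [g]                            # empty product: witness is g
    mid = len(xs) // 2
    left, right = xs[:mid], xs[mid:]
    g_left = g
    for x in right:                           # base for the left half carries
        g_left = pow(g_left, x, N)            # the product of the right half
    g_right = g
    for x in left:                            # and symmetrically for the right
        g_right = pow(g_right, x, N)
    return all_witnesses(g_left, left, N) + all_witnesses(g_right, right, N)

# sanity check against direct exponentiation on a toy modulus
N, g, xs = 1050589, 3, [7, 11, 17, 23]
assert all(pow(g, prod(xs) // x, N) == w
           for x, w in zip(xs, all_witnesses(g, xs, N)))

Each of the O(log n) recursion levels performs O(n) exponentiations, giving the O(n log n) total of Lemma 3.6.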
In order to compute the updated witnesses, the algorithm applies the result of Lemma 3.6 at every node v_i, 0 ≤ i ≤ l, and for all j ∈ N(v_i), as follows. For each v_i, it uses the result of Lemma 3.6 with inputs the updated elements {r_i(χ(j)) : j ∈ N(v_i)}, the RSA modulus N_i and the exponentiation base g_i. In this computation the updated prime representative r_i(χ(v_{i−1})), computed with O(n^ε) exponentiations, is used (note that O(n^ε) exponentiations are required since sk is not available). This computation outputs the witnesses W_{j(v_i)} for j ∈ N(v_i) (note that the witness W_{v_{i−1}(v_i)}, for i > 0, remains the same). Also, since the algorithm for updating witnesses [101] is run on O(1/ε) nodes v with |N(v)| = O(n^ε), we have, by Lemma 3.6, that the witness update complexity is O(n^ε log n) (for the complete result see Lemma 3.7).

Case 2. n = m/4 or n = m: In this case the hash table is rebuilt according to Definition 3.2: If n = m/4, the algorithm builds a data structure D_{h+1} with m/2 buckets. Otherwise, i.e., when n = m, the algorithm builds a data structure D_{h+1} with 2m buckets. Subsequently, it outputs auth(D_{h+1}) and d_{h+1} by using Relations 3.6 and 3.7. In the case of precomputed witnesses, it computes the new witnesses to be included in auth(D_{h+1}) by using Lemma 3.6 (note that refresh() cannot call setup() directly, since it does not have access to the secret key sk, and that is why it has to use Relations 3.6 and 3.7 and Lemma 3.6).

Lemma 3.7 By using the rebuilding policy of Definition 3.2, algorithm refresh() of the authenticated data structure scheme RHT has O(1) expected amortized access complexity without precomputed witnesses. With precomputed witnesses, algorithm refresh() has O(n^ε log n) expected amortized access complexity.

Proof: For the case when no precomputed witnesses are used, the argument is the same as in Lemma 3.5. For the case of precomputed witnesses, suppose there are currently n elements in the hash table and that the capacity of the table (i.e., the number of buckets) is m. Note that, by the rebuilding policy of Definition 3.2, it is m/4 < n < m. As we know, each one of the m buckets stores O(1) elements in expectation. When an update takes place and no rebuilding of the table is triggered, all the witnesses along the path of the update in the accumulation tree have to be updated. By using the algorithm described in Lemma 3.6, the witnesses within the bucket can be updated in expected complexity O(1), since the size of the bucket is an expected value. The witnesses of the internal nodes can be updated in O(m^ε log m) complexity, and therefore the overall complexity is O(m^ε log m) in expectation. When a rebuilding of the table is triggered, the total complexity is O(m log m), since there is a constant number of levels in the accumulation tree, processing each node has complexity O(m^ε log m) (since the degree of any internal node is O(m^ε)) and the maximum number of nodes that lie in any level is O(m^{1−ε}). Therefore, the actual complexity of an update is expected O(m^ε log m) when no rebuilding is triggered, and O(m log m) otherwise. We are interested in the expected value of the amortized complexity (expected amortized complexity) of an update. Let n_i be the number of elements contained in the hash table after update i and m_i be the number of buckets after update i. We do the analysis by defining the following potential function, where α_i = n_i/m_i:

    F_i = c(2n_i − m_i) log m_i,    if α_i ≥ 1/2;
    F_i = c(m_i/2 − n_i) log m_i,   if α_i < 1/2.

The amortized complexity of update i equals γ̂_i = γ_i + F_i − F_{i−1}. Therefore E[γ̂_i] = E[γ_i] + F_i − F_{i−1}, since F_i is a deterministic function.
To perform the analysis more precisely, we define some constants. Let c_1 be the constant such that, if the update complexity C is O(m_i^ε log m_i), then

    C ≤ c_1 m_i^ε log m_i.   (3.11)

Also, let r_1 be the constant such that, if the rebuilding complexity R is O(n_i log n_i), then

    R ≤ r_1 n_i log n_i.   (3.12)

We also note that in all cases it holds that

    m_i/4 ≤ n_i ≤ m_i.   (3.13)

We perform the analysis by distinguishing the following cases:

1. α_{i−1} ≥ 1/2 (insertion). For this case, we examine whether the hash table is rebuilt or not. In case the hash table is not rebuilt, we have m_{i−1} = m_i and n_i = n_{i−1} + 1. Therefore the amortized complexity will be

    E[γ̂_i] = E[γ_i] + F_i − F_{i−1}
           ≤ c_1 m_i^ε log m_i + c(2n_i − m_i − 2n_{i−1} + m_{i−1}) log m_i
           = c_1 m_i^ε log m_i + 2c log m_i.

In case the hash table is rebuilt (which takes O(n log n) complexity in total), we have m_i = 2m_{i−1}, n_i = n_{i−1} + 1 and n_{i−1} = m_{i−1} (which gives n_i = m_i/2 + 1), and the amortized complexity will be

    E[γ̂_i] = E[γ_i] + F_i − F_{i−1}
           ≤ r_1 n_i log n_i + c(2n_i − m_i) log m_i − c(2n_{i−1} − m_{i−1}) log m_{i−1}
           = r_1 n_i log n_i + c(2n_i − m_i) log m_i − c(m_i/2) log(m_i/2)
           ≤ r_1 (m_i/2) log(m_i/2) + 2c log m_i − c(m_i/2) log(m_i/2)
           ≤ 2c log m_i,

for a constant c of the potential function such that c > r_1.

2. α_{i−1} < 1/2 (insertion). Note that there is no way the hash table is rebuilt in this case. Therefore m_{i−1} = m_i and n_i = n_{i−1} + 1. If now α_i < 1/2, the amortized complexity will be

    E[γ̂_i] = E[γ_i] + F_i − F_{i−1}
           ≤ c_1 m_i^ε log m_i + c(m_i/2 − n_i) log m_i − c(m_{i−1}/2 − n_{i−1}) log m_{i−1}
           = c_1 m_i^ε log m_i + c(m_i/2 − n_i − m_i/2 + n_{i−1}) log m_i
           = c_1 m_i^ε log m_i − c log m_i.

If now α_i ≥ 1/2, the amortized complexity will be

    E[γ̂_i] = E[γ_i] + F_i − F_{i−1}
           ≤ c_1 m_i^ε log m_i + c(2n_i − m_i) log m_i − c(m_{i−1}/2 − n_{i−1}) log m_{i−1}
           = c_1 m_i^ε log m_i + c(2(n_{i−1} + 1) − m_{i−1} − m_{i−1}/2 + n_{i−1}) log m_i
           = c_1 m_i^ε log m_i + c(3n_{i−1} − 3m_{i−1}/2 + 2) log m_i
           = c_1 m_i^ε log m_i + c(3α_{i−1}m_{i−1} − 3m_{i−1}/2 + 2) log m_i
           < c_1 m_i^ε log m_i + c(3m_{i−1}/2 − 3m_{i−1}/2 + 2) log m_i
           = c_1 m_i^ε log m_i + 2c log m_i.

3. α_{i−1} < 1/2 (deletion). Here we have n_i = n_{i−1} − 1. In case the hash table does not have to be rebuilt (i.e., 1/4 < α_i < 1/2 and m_i = m_{i−1}), the amortized complexity of the deletion is going to be

    E[γ̂_i] = E[γ_i] + F_i − F_{i−1}
           ≤ c_1 m_i^ε log m_i + c(m_i/2 − n_i) log m_i − c(m_{i−1}/2 − n_{i−1}) log m_{i−1}
           = c_1 m_i^ε log m_i + c(m_i/2 − n_i − m_i/2 + n_{i−1}) log m_i
           = c_1 m_i^ε log m_i + c log m_i.

In case now the hash table has to be rebuilt (which has O(n_i log n_i) complexity), we have m_i = m_{i−1}/2 and m_i = 4n_i, and therefore the amortized complexity is

    E[γ̂_i] = E[γ_i] + F_i − F_{i−1}
           ≤ r_1 n_i log n_i + c(m_i/2 − n_i) log m_i − c(m_{i−1}/2 − n_{i−1}) log m_{i−1}
           ≤ r_1 n_i log n_i + c(m_i/2 − n_i) log m_i − c(m_i − (n_i + 1)) log 2m_i
           ≤ r_1 n_i log n_i − c(m_i/2 − 1) log m_i − c(3n_i − 1)
           ≤ r_1 n_i log n_i − c(m_i/2) log m_i + c log m_i
           ≤ r_1 m_i log m_i − (c/2)m_i log m_i + c log m_i
           ≤ c log m_i,

where c must also be chosen to satisfy c > 2r_1.

4. α_{i−1} ≥ 1/2 (deletion). In this case we have m_{i−1} = m_i. If α_i ≥ 1/2, the amortized complexity will be

    E[γ̂_i] = E[γ_i] + F_i − F_{i−1}
           ≤ c_1 m_i^ε log m_i + c(2n_i − m_i − 2n_{i−1} + m_{i−1}) log m_i
           ≤ c_1 m_i^ε log m_i − 2c log m_i.

Finally, for the case α_i < 1/2, we have

    E[γ̂_i] = E[γ_i] + F_i − F_{i−1}
           ≤ c_1 m_i^ε log m_i + c(m_{i−1}/2 − n_i − 2n_{i−1} + m_{i−1}) log m_i
           = c_1 m_i^ε log m_i + c(3m_{i−1}/2 − (n_{i−1} − 1) − 2n_{i−1}) log m_i
           = c_1 m_i^ε log m_i + c(3m_{i−1}/2 − 3n_{i−1} + 1) log m_i
           = c_1 m_i^ε log m_i + c(3n_{i−1}/(2α_{i−1}) − 3n_{i−1} + 1) log m_i
           ≤ c_1 m_i^ε log m_i + c log m_i.
Therefore we conclude that, for all constants c > 2r_1 of the potential function, the expected value of the amortized complexity of any operation is bounded by E[γ̂_i] ≤ c_1 m_i^ε log m_i + 2c log m_i. By using now Relation 3.13, there is a constant r such that E[γ̂_i] ≤ r n_i^ε log n_i, which implies that the expected value of the amortized complexity of any update (insertion/deletion) in an authenticated hash table containing n elements is O(n^ε log n), for 0 < ε < 1. □

3.2.3 Queries and verification

We now show how a proof for an element e ∈ X (or an element e ∉ X) can be constructed by using the authenticated data structure presented in the previous section. Let H(e) = j, i.e., the bucket that corresponds to element e is j. Let v_0, v_1, ..., v_l be the path from the node that corresponds to bucket j to the root of T(ε). We add a fictitious node v_{−1} that stores element e within bucket j, such that v_{−1}, v_0, v_1, ..., v_l is the path in T(ε) from the node that corresponds to element e to the root of T(ε). We consider two cases, i.e., membership and non-membership proofs:

• Element e is contained in the hash table. The proof is the ordered sequence π_0, π_1, ..., π_l, where π_i is a tuple of a prime representative and a witness that authenticates every node of the path v_{−1}, v_0, ..., v_l from the element in question e to the root of the tree v_l. Thus, item π_i of proof Π(e) (i = 0, ..., l) is defined as

    π_i = (r_i(χ(v_{i−1})), W_{v_{i−1}(v_i)}),   (3.14)

where W_{v_{i−1}(v_i)} is defined in Relation 3.8 and χ(v_{−1}) = e. For simplicity, we set

    α_i = r_i(χ(v_{i−1})) and β_i = W_{v_{i−1}(v_i)}.   (3.15)

For example, in Figure 3.1, the proof for an element that belongs to the bucket of node a (e.g., element 2) consists of the following tuples:

    π_0 = (r_0(2), g_0^{r_0(3)r_0(7)r_0(9)} mod N_0),
    π_1 = (r_1(χ(a)), g_1^{r_1(χ(b))r_1(χ(c))r_1(χ(d))} mod N_1),
    π_2 = (r_2(χ(f)), g_2^{r_2(χ(e))r_2(χ(g))r_2(χ(p))} mod N_2).

• Element e is not contained in the hash table. Let y_1, y_2, ..., y_u be the elements contained in bucket j (all different from e). First, output a membership proof (as above) for an element y_i in bucket j (note that H(y_i) = H(e)). Then, by running the extended Euclidean algorithm, output a non-membership witness

    π_ν = (A_e, B_e, r_0(e), e),   (3.16)

where A_e, B_e and r_0(e) are defined in Relation 3.3. Note that A_e, B_e are integer values proving non-membership of e in the set {y_1, y_2, ..., y_u} (a sketch of this computation follows the algorithm description below).

We now describe the algorithm formally:

Algorithm {Π(q), α(q)} ← query(q, D_h, auth(D_h), pk): Let e = q be the queried element. If e is contained in D_h, set Π(q) = (π_0, π_1, ..., π_l), as in Relation 3.14, and output α(q) = true. If e is not contained in D_h, output a membership proof for some other element y_i in bucket j, such that H(e) = H(y_i). Then output a non-membership proof π_ν for e in bucket j, as defined in Relation 3.16. Set Π(q) = (Π(y_i), π_ν) and output α(q) = false.
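The pair (A_e, B_e) of Relation 3.16 consists of Bezout coefficients returned by the extended Euclidean algorithm; a minimal sketch, with toy primes standing in for the representatives r_0(·):

from math import prod

def ext_gcd(a, b):                       # returns (g, x, y) with a*x + b*y = g
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def non_membership_pair(bucket_reps, e_rep):
    g, A, B = ext_gcd(prod(bucket_reps), e_rep)
    assert g == 1                        # holds: distinct primes are coprime
    return A, B                          # the (A_e, B_e) of Relation 3.3

reps = [7, 11, 23]                       # stand-ins for r_0(y_1), r_0(y_2), r_0(y_3)
A, B = non_membership_pair(reps, 17)     # r_0(e) = 17 for an absent element e
assert prod(reps) * A + 17 * B == 1      # the identity verify() relies on in 2d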
Lemma 3.8 Algorithm query() of the authenticated data structure scheme RHT has O(n^ε) expected access complexity without precomputed witnesses. With precomputed witnesses, algorithm query() has O(1) expected access complexity. Moreover, it outputs a proof Π(q) of O(1) expected group complexity.

Proof: (a) Membership proof: Without precomputed witnesses, the construction of π_0 always has O(1) expected access complexity, since each bucket contains O(1) elements in expectation. Also, the construction of each element π_i (i = 1, ..., l) has O(n^ε) access complexity, due to the degree bound of the nodes in T(ε). Therefore the total complexity is expected O(n^ε). With precomputed witnesses, each π_i can be "read" directly from memory with O(1) access complexity (not expected). Finally, the group complexity of Π(q) for a membership proof is O(1) (not expected), since one witness for each level of T(ε) must be provided. (b) Non-membership proof: A non-membership proof consists of a membership proof (thus the above arguments apply here as well) plus the proof π_ν (see Relation 3.16). Therefore, in both cases (without and with precomputed witnesses), π_ν turns the complexities into expected ones, since the complexity of A_e and B_e depends on the number of elements in the bucket in question (see the observation before Section 3.1.4), which is expected O(1). This completes the proof. □

We now formally describe the verification algorithm. The verification algorithm takes as input a proof and an answer and either accepts or rejects the answer.

Algorithm {accept, reject} ← verify(q, α, Π, d_h, pk): Let the query q refer to element e, i.e., q = e. We distinguish two cases:

1. Membership proof: In this case α = true. The proof Π contains a membership proof for e, denoted Π(e) = π_0, π_1, ..., π_l, where π_i = (α_i, β_i) for i = 0, ..., l, and where the α_i are all primes. The algorithm outputs reject if one of the following is true:

(a) h_0(α_0) ≠ e (the prime representative of element e is not correct);
(b) h_i(α_i) ≠ β_{i−1}^{α_{i−1}} mod N_{i−1} for some 1 ≤ i ≤ l (false witness);
(c) d_h ≠ β_l^{α_l} mod N_l (final digest mismatch).

2. Non-membership proof: In this case α = false. We recall that in this case the proof Π contains (a) a membership proof Π(y) = π_0, π_1, ..., π_l for an element y ≠ e such that H(y) = H(e), where π_i = (α_i, β_i) for i = 0, ..., l (the α_i are all primes); and (b) a non-membership proof for e, denoted π_ν = (A, B, r, e), where r is a prime. The algorithm outputs reject if one of the following is true:

(a) H(e) ≠ H(y) (e and y do not belong to the same bucket);
(b) The membership proof for y does not verify, i.e., reject ← verify(y, true, Π(y), d_h, pk);
(c) h_0(r) ≠ e (the prime representative r contained in π_ν for element e is not correct);
(d) α_1^A g_0^{rB} ≠ g_0 mod N_0 (the verification test for the non-membership proof of e in the corresponding bucket does not succeed; see Lemma 3.2).

If all the above tests are successful, the algorithm outputs accept. A sketch of the membership checks is given after the following lemma.

Lemma 3.9 Algorithm verify() of the authenticated data structure scheme RHT has O(1) expected access complexity.

Proof: Processing the membership proof has O(1) access complexity, since it requires processing l = O(1) pairs of witnesses and prime representatives. Moreover, processing the non-membership proof has O(1) expected access complexity, due to the size of the non-membership witness, which depends on the number of elements in the bucket in question. Therefore, the expected access complexity of verify() is O(1). □
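For concreteness, the membership checks 1a-1c amount to a chain of modular exponentiations; a minimal sketch, where the two-argument h below is a hypothetical stand-in for the level functions h_i chosen by genkey():

def verify_membership(e, proof, digest, moduli, h):
    """proof: [(alpha_0, beta_0), ..., (alpha_l, beta_l)]; moduli: [N_0..N_l];
    h(i, a): hypothetical stand-in for the two-universal functions h_i."""
    alphas = [a for a, _ in proof]
    betas = [b for _, b in proof]
    if h(0, alphas[0]) != e:                                  # check 1a
        return False
    for i in range(1, len(proof)):                            # check 1b
        if h(i, alphas[i]) != pow(betas[i - 1], alphas[i - 1], moduli[i - 1]):
            return False
    return digest == pow(betas[-1], alphas[-1], moduli[-1])   # check 1c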
3.2.4 Correctness and security

The following lemmata describe the correctness and the security of our new construction, according to Definitions 2.4 and 2.5. The security of our scheme is based on the strong RSA assumption.

Lemma 3.10 The authenticated data structure scheme RHT = {genkey, setup, update, refresh, query, verify} is correct according to Definition 2.4.

Proof: Let D_0 be any hash table containing n elements and having m = O(n) buckets. Fix the security parameter k and output pk = {N_i, g_i, h_i : i = 0, ..., l} and sk = {φ(N_i) : i = 0, ..., l} by calling algorithm genkey(). Then output an authenticated data structure auth(D_0) and the respective digest d_0 by calling algorithm setup(). Pick a polynomial number of updates—namely, pick a polynomial number of elements for insertion or deletion—and update auth(D_0) and d_0 by calling algorithm refresh(). Let D_h be the final hash table, auth(D_h) be the produced authenticated data structure and d_h be the final digest. Let e be an element that belongs (or should belong) to bucket j (i.e., H(e) = j). Output a proof Π(e) and an answer by calling query(). We distinguish two cases:

1. Element e is contained in the hash table. Then Π(e) is a membership proof as defined in Relation 3.14. Note that π_0 contains the prime representative of e with the respective witness; therefore verify() does not reject at Item 1a. By the definitions of the accumulation values output by setup() and maintained under updates by refresh() (see Relations 3.6 and 3.7) and by the definition of proof element π_i in Relation 3.14 for i = 1, ..., l, verify() does not reject at Items 1b and 1c;

2. Element e is not contained in the hash table. Let y_1, y_2, ..., y_u be the elements in bucket j, where H(e) = j. In this case, the non-membership proof consists of (a) a membership proof π_i = (α_i, β_i) for an element y contained in bucket j (which verifies due to Item 1) such that H(e) = H(y) = j, therefore verify() does not reject at either Item 2a or Item 2b; and (b) a non-membership proof (A_e, B_e, r_0(e), e) for element e that should belong to bucket j. Therefore verify() does not reject at Item 2c, since h_0(r_0(e)) = e. Also it does not reject at Item 2d, since

    α_1^{A_e} g_0^{r_0(e)B_e} = g_0^{(∏_{j=1}^{u} r_0(y_j))A_e + r_0(e)B_e} = g_0 mod N_0,

since, by construction, α_1 = g_0^{∏_{j=1}^{u} r_0(y_j)} mod N_0, and, by Relation 3.3, A_e and B_e are computed to satisfy

    (∏_{j=1}^{u} r_0(y_j)) A_e + r_0(e) B_e = 1.

This completes the proof. □

Lemma 3.11 The authenticated data structure scheme RHT = {genkey, setup, update, refresh, query, verify} is secure according to Definition 2.5 and under the strong RSA assumption.

Proof: Let k be the security parameter. Output pk = {N_i, g_i, h_i : i = 0, ..., l} and sk = {φ(N_i) : i = 0, ..., l} by calling algorithm genkey(). Let Adv be a polynomially-bounded adversary. Adv picks an initial collection of n elements X, stored in a hash table D_0. Adv outputs an authenticated data structure auth(D_0) by calling algorithm setup() through oracle access. Then Adv picks a polynomial number of updates—namely, he picks a polynomial number of elements for insertion or deletion. Let D_h be the final hash table, let the updated final element collection be X, and let d_h be the final digest as produced by the adversary through oracle access to algorithm update(). We will compute the probability that check() rejects while verify() accepts, as required by Definition 2.5. We distinguish two cases:

1. Membership proof: The adversary outputs a membership proof Π(e) = (π_0, π_1, ..., π_l) (l = ⌈1/ε⌉), where π_i = (α_i, β_i) (see algorithm query()), for an element e ∉ X (thus, a proof for an incorrect answer). Let v_0, v_1, ..., v_l be a path of nodes in T(ε) from the bucket referring to e to the root of the tree. We now define the following events, related to the adversary's choice of proof above. Our goal is to express the probability that verify(e, true, Π(e), d_h, pk) accepts and e ∉ X as a function of the following events.
Note that d_h is the correct digest of the authenticated data structure:

(a) E_{0,0}: The values e and α_0 picked by Adv are such that e ∉ X, α_0 is prime and h_0(α_0) = e;

(b) E_j: For j = 1, ..., l, the values α_j, α_{j−1} and β_{j−1} picked by Adv are such that both α_j and α_{j−1} are primes and

    h_j(α_j) = β_{j−1}^{α_{j−1}} mod N_{j−1} for all 1 ≤ j ≤ l.

This event can be partitioned into two mutually exclusive events, i.e., E_j = E_{j,0} ∪ E_{j,1}, such that

• E_{j,0}: Value h_j(α_j) is not the correctly formed digest (i.e., an accumulation of the digests of its children) of some node v_{j−1} ∈ N(v_j), as defined in Relation 3.7;
• E_{j,1}: Value h_j(α_j) is the correctly formed digest of a node v_{j−1} ∈ N(v_j), as defined in Relation 3.7.

(c) E_{l+1,1}: The values α_l and β_l picked by Adv are such that β_l^{α_l} = d_h mod N_l.

The probability that verify() accepts while e ∉ X is the probability

    Pr[E_{0,0} ∩ E_1 ∩ E_2 ∩ ... ∩ E_{l+1,1}]
      = Pr[E_{0,0} ∩ (E_{1,0} ∪ E_{1,1}) ∩ (E_{2,0} ∪ E_{2,1}) ∩ ... ∩ E_{l+1,1}]
      ≤ Pr[E_{1,1}|E_{0,0}] + Pr[E_{2,1}|E_{1,0}] + Pr[E_{3,1}|E_{2,0}] + ... + Pr[E_{l+1,1}|E_{l,0}]
      = Pr[E_{1,1}|E_{0,0}] + Σ_{j=2}^{l+1} Pr[E_{j,1}|E_{j−1,0}].   (3.17)

First we examine the event E_{1,1}|E_{0,0}. This event implies that the adversary has found a value e ∉ X, a prime α_0 such that h_0(α_0) = e, and a value β_0 such that

    β_0^{α_0} = g_0^{∏_{t=1,...,l_0} r_0(x_t)} mod N_0,

where x_1, x_2, ..., x_{l_0} is a subset of the set X. Since e ∉ X, it is e ∉ {x_1, x_2, ..., x_{l_0}}. Also, since every prime representative is mapped to a unique element through function h_0, we conclude that it must be α_0 ∉ {r_0(x_1), r_0(x_2), ..., r_0(x_{l_0})}. By Lemma 3.2 and Assumption 3.1, this probability is neg(k). Therefore Pr[E_{1,1}|E_{0,0}] ≤ neg(k). For the remaining events E_{j,1}|E_{j−1,0} (2 ≤ j ≤ l + 1), we have:

• By the one-to-one property of the function h_{j−1}(·), E_{j−1,0} implies that value α_{j−1} is not the prime representative of the correctly formed digest of some node v_{j−2} ∈ N(v_{j−1}), as defined in Relation 3.7, namely that

    α_{j−1} ∉ {r_{j−1}(χ(v_t)) : v_t ∈ N(v_{j−1})};

• However, the event E_{j,1} implies that (1) the digest h_j(α_j) (for j = l + 1 this is just d_h) is the correctly formed digest of node v_{j−1}; and (2)

    β_{j−1}^{α_{j−1}} = g_{j−1}^{∏_{v_t∈N(v_{j−1})} r_{j−1}(χ(v_t))} mod N_{j−1},

where the r_{j−1}(χ(v_t)) are the prime representatives of the correctly formed digests of the set of children of v_{j−1}. Since α_{j−1} ∉ {r_{j−1}(χ(v_t)) : v_t ∈ N(v_{j−1})}, by Lemma 3.2 and Assumption 3.1, this probability is neg(k).

Therefore, for all j = 1, ..., l + 1, Pr[E_{j,1}|E_{j−1,0}] is neg(k). Since l = O(1), the total probability is also neg(k). This concludes the proof for the membership case.

2. Non-membership proof: For this case, we define the events:

(a) B_0: Adv finds e ∈ X and y such that H(e) = H(y) = j;

(b) B_1: Adv finds a membership proof for y, namely the proof π = π_0, π_1, ..., π_l, where π_i = (α_i, β_i) (the α_i being prime numbers) and accept ← verify(y, true, π, d_h, pk);

(c) B_2: Adv finds α_1 and a non-membership proof (A, B, r, e) for e such that r is a prime number, h_0(r) = e and α_1^A g_0^{rB} = g_0 mod N_0. This event is partitioned into two events:

    i. B_{20}: α_1 ≠ acc(L_j) = g_0^{∏_{x∈L_j} r_0(x)} mod N_0;
    ii. B_{21}: α_1 = acc(L_j) = g_0^{∏_{x∈L_j} r_0(x)} mod N_0.

We need to compute the probability Pr[B_0 ∩ B_1 ∩ B_2] = Pr[B_0 ∩ B_1 ∩ (B_{20} ∪ B_{21})] ≤ Pr[B_{20}|B_1|B_0] + Pr[B_{21}|B_0]. By Lemma 3.2 and Assumption 3.1, it is Pr[B_{21}|B_0] ≤ ν(k), where ν(k) is the appropriate negligible function. Note also that we can express B_{20}|B_1|B_0 as a function of the events E of the membership proof case.
Specifically, the event B_{20}|B_1|B_0 implies the event E_{1,0} ∩ E_2 ∩ E_3 ∩ ... ∩ E_{l+1,1}, whose probability, following the same reasoning as in Relation 3.17, is bounded by neg(k). This concludes the proof for both the membership and the non-membership cases. □

We can now present the main result of this section.

Theorem 3.2 Let k be the security parameter and 0 < ε < 1. Then there exists a publicly-verifiable authenticated data structure scheme RHT = {genkey, setup, update, refresh, query, verify} for a data structure scheme defined for a dynamic hash table D storing n elements such that:

1. It is correct according to Definition 2.4 and secure according to Definition 2.5 under the strong RSA assumption;
2. The access complexity of setup() is O(n), outputting an authenticated data structure auth(D) of O(n) group complexity;
3. The expected amortized access complexity of update() is O(1), outputting update information upd of O(1) group complexity;
4. The expected amortized access complexity of refresh() is O(n^ε log n) (or O(1));
5. The expected access complexity of query() is O(1) (or O(n^ε)), outputting a proof Π(q) for a query q of O(1) expected group complexity;
6. The expected access complexity of verify() is O(1).

Proof: This result follows directly from Lemmata 3.4, 3.5, 3.7, 3.8, 3.9, 3.10 and 3.11. The complexities in brackets (O(1) for refresh() and O(n^ε) for query()) refer to the case when no precomputed witnesses are used. Note that the presented scheme is publicly verifiable, since verify() does not take the secret key as an input. □

3.2.5 A more practical scheme

The construction we have presented (the RHT authenticated data structure scheme) uses different RSA moduli for each level of the tree, and each new RSA modulus has a bit-length that is three times longer than the bit-length of the previous-level RSA modulus. Therefore, computations corresponding to higher levels in the accumulation tree are more expensive, since they involve modular arithmetic operations over longer elements. This increase in the lengths of the RSA moduli is due to the need to compute, for the elements stored at every level in the tree, prime representatives of size that is three times as large as the size of the elements (see Lemma 3.1). Although from a theoretical point of view this is not a concern, as the number of levels of the tree is constant (i.e., 1/ε), from a practical point of view it can be prohibitive for efficiently implementing our schemes. To overcome this complexity overhead, we want to use the same RSA modulus at every level of the tree, and to achieve this, we present a heuristic inspired by a similar method originally used in the work of Baric and Pfitzmann [10]. Instead of using two-universal hash functions to map (general) integers to primes of increased size, the idea is to employ random oracles [12] for consistently computing primes of relatively small size. In particular, given a k-bit integer x, instead of mapping it to a 3k-bit prime, we can map it to the value 2^t·2^b·g(x) + d, where g(x) is the b-bit output of a random oracle (which in practice is the output of a cryptographic hash function), at the end of which we append b zeros so as to make this number large enough; t is the number of bits by which we shift 2^b·g(x) to the left; and d = 1, 3, ..., 2^t − 1 is a number we add so that 2^t·2^b·g(x) + d is a prime. Note that we require that t be related to b according to Relation 3.18 of Theorem 3.3.
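A minimal sketch of this prime-representative computation (the parameters b and t are illustrative, and the Miller-Rabin test below is probabilistic, which suffices for a sketch):

import hashlib

def is_prime(n, _bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    # Miller-Rabin with fixed bases; adequate for a sketch
    if n < 2:
        return False
    for p in _bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2; s += 1
    for a in _bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def q_rep(x, b=256, t=9):
    gx = int.from_bytes(hashlib.sha256(str(x).encode()).digest(), 'big')
    a = gx << b                       # append b zeros: a = 2^b * g(x)
    base = a << t                     # shift by t: 2^t * a
    for d in range(1, 1 << t, 2):     # odd offsets within [2^t a, 2^t a + 2^t - 1]
        if is_prime(base + d):
            return base + d
    raise ValueError("no prime in interval (probability at most 2^-b)")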
In the following, we denote by q(x) a prime representative of x computed by the above procedure, i.e., the output of a procedure that transforms a k-bit integer into a k'-bit prime, where k' < k. Note that the above procedure (i.e., the computation of q(x)) cannot map two different integers to the same prime. This follows from the random oracle property, namely that for x_1 ≠ x_2, w.h.p. it is g(x_1) ≠ g(x_2), which implies that the intervals [2^t·2^b·g(x_1), 2^t·2^b·g(x_1) + 2^t − 1] and [2^t·2^b·g(x_2), 2^t·2^b·g(x_2) + 2^t − 1] are disjoint. Finally, we show that we can make sure that, with high probability, we will always be able to find a prime within the specified interval.

Theorem 3.3 Let x be a k-bit integer and let a = 2^b·g(x) be the output of a b-bit random oracle with b zeros appended at the end. The interval [2^t a, 2^t a + 2^t − 1] contains a prime with probability at least 1 − 2^{−b}, provided

    b ≤ ⌊log(1 + √(2^t + 4e^{2^t − 1}))⌋ − 1.   (3.18)

Proof: By the prime distribution theorem, the number of primes less than n is approximately n/ln n. Therefore, we want to compute the probability

    Pr[(2^t a + 2^t − 1)/ln(2^t a + 2^t − 1) − 2^t a/ln(2^t a) ≥ 1] = Pr[a ≤ e^{2^t − 1}/2^t],

by assuming ln(2^t a + 2^t − 1) ≈ ln(2^t a), since a > 2^b ≫ 2^t. By the random oracle property we have that

    Pr[a ≤ e^{2^t−1}/2^t] = Pr[2^b g(x) ≤ e^{2^t−1}/2^t] = (e^{2^t−1}/2^{b+t}) · (1/2^b).

Note that

    (e^{2^t−1}/2^{b+t}) · (1/2^b) ≥ 1 − 1/2^b ⟺ (1 − √(2^t + 4e^{2^t−1}))/2 ≤ 2^b ≤ (1 + √(2^t + 4e^{2^t−1}))/2,

which gives b ≤ ⌊log(1 + √(2^t + 4e^{2^t−1}))⌋ − 1, since b is a positive integer. This completes the proof. □

Using Theorem 3.3, we can pick the length of the output of the random oracle so as to ensure hitting a prime with high probability. For example, for t = 9 we get b ≤ 368, which holds for most practical hash functions used today (e.g., SHA-256). Using the above method, we can still accumulate primes in the exponent, but this time without having to increase the size of the RSA moduli at any level of the tree. The only conditions we need in order to securely use the RSA accumulator are:

1. the safe accumulation of primes that map to unique integers (i.e., each accumulated prime can represent only one integer); and
2. the bit-length of the accumulated primes is smaller than the bit-length of the RSA modulus used.

Thus, we can apply our new procedure for computing prime representatives to all of the constructions of Section 3.2, with one important efficiency improvement: the same RSA modulus and exponentiation bases are used at all levels of the accumulation tree. With this heuristic we obtain overall the same security and complexity results as before, but now we have a more practical accumulator whose security is based on both the strong RSA and the random oracle assumptions.

3.2.6 Protocols

Three-party protocol. By using Theorem 2.1, we can easily derive the following corollary, which describes the use of the authenticated data structure scheme RHT of Theorem 3.2 by Protocol 2.1.

Corollary 3.1 Let k be the security parameter and assume that the strong RSA assumption holds. Then there exists a three-party authenticated data structures protocol (see Protocol 2.1) for verifying (non-)membership queries q on a dynamic hash table storing n elements such that:

1. The setup at the source has O(n) access complexity;
2. The update at the source has O(1) expected amortized access complexity;
3. The space needed at the source has O(n) group complexity;
4. The communication between the source and the server has O(1) group complexity;
5. The update at the server has O(n^ε log n) (or O(1)) expected amortized access complexity;
6. The query at the server has O(1) (or O(n^ε)) expected access complexity;
7. The space needed at the server has O(n) group complexity;
8. The communication between the server and the client has O(1) expected group complexity;
9. The verification at the client has O(1) expected access complexity;
10. For a query q sent by the client to the server at any time (even after updates), let α be an answer and let π be a proof returned by the server. With probability Ω(1 − neg(k)), the client accepts the answer α if and only if α is correct.

Two-party protocol. In order to use the authenticated data structure scheme RHT of Theorem 3.2 in a black-box way with Theorem 2.2—and derive a two-party authenticated data structures protocol—we have to ensure that Assumption 2.1 holds for the authenticated data structure scheme RHT:

Lemma 3.12 Assumption 2.1 is true for the authenticated data structure scheme RHT. Moreover, for every update u, |Q_u| has O(1) amortized complexity.

Proof: Let an update u refer to element e, i.e., either insert element e into the hash table or delete element e from the hash table. We distinguish two cases:

1. The hash table is not rebuilt (see Definition 3.2) due to update u. In this case, the respective set of queries Q_u required for Assumption 2.1 simply contains one query for element e, i.e., q_u = e. Let {Π(e), α(e)} ← query(e, D_h, auth(D_h), pk). We now describe the function z(·) from Assumption 2.1. Let q_u = e. Function z(·) first computes δ_u(D_h) as H(e) = j, since update() needs to access bucket j in order to perform any operation on element e, where H(e) = j. [Footnote: We recall that update() does not update the bucket j itself, but receives the updated bucket from update()—see Definition 2.3.] To compute δ_u(auth(D_h)), z(·) processes the proof Π(q_u) as follows. We recall that both a membership and a non-membership proof contain the ordered sequence π_0, π_1, ..., π_l, where π_i is a tuple of a prime representative and a witness that authenticates every digest of the path v_0, v_1, ..., v_l from the bucket in question, H(e) = j, to the root of the tree v_l. Thus, item π_i of proof Π(e) (i = 0, 1, ..., l) is defined as (see Relation 3.14)

    π_i = (r_i(χ(v_{i−1})), W_{v_{i−1}(v_i)}).

Note now that update() needs to access χ(v_i), for i = 0, 1, ..., l, in order to perform the update (see Relation 3.10). All these values are easily computed from the π_i: Just set χ(v_i) = W_{v_{i−1}(v_i)}^{r_i(χ(v_{i−1}))}—see Relation 3.7. Therefore the function z(·) computes such exponentiations and outputs δ_u(auth(D_h)) with O(1) complexity, the same as the complexity of algorithm verify().

2. The hash table is rebuilt (see Definition 3.2) due to update u. Then the set of queries Q_u consists of queries for all the elements contained in the hash table; therefore its size is O(n), where n is the number of elements stored in the table. In this case z(·) is just a call to algorithm setup(). Because the hash table has to be rebuilt, and in this case |Q_u| = O(n), it follows (with a similar analysis as in Lemma 3.5) that |Q_u| has O(1) amortized complexity. This completes the proof. □

By Theorems 2.2 and 3.2 and Lemma 3.12, we can now state the final result for the two-party model:

Corollary 3.2 Let k be the security parameter and assume that the strong RSA assumption holds. Then there exists a two-party authenticated data structures protocol (see Protocol 2.2)
for verifying (non-)membership queries q on a dynamic hash table storing n elements such that:

1. When precomputed witnesses are used, the protocol is non-interactive; otherwise, it requires one round of interaction during updates;
2. The setup at the client has O(n) access complexity;
3. The update at the client has O(1) expected amortized access complexity;
4. The verification at the client has O(1) expected access complexity;
5. The space needed at the client has O(1) group complexity;
6. The communication between the client and the server has O(1) expected amortized group complexity during updates and O(1) expected group complexity during queries;
7. The update at the server has O(n^ε log n) (or O(n^ε)) expected amortized access complexity;
8. The query at the server has O(1) (or O(n^ε)) expected access complexity;
9. The space needed at the server has O(n) group complexity;
10. For a query q sent by the client to the server at any time (even after updates), let α be an answer and let π be a proof returned by the server. With probability Ω(1 − neg(k)), the client accepts the answer α if and only if α is correct.

3.3 Scheme based on the bilinear-map accumulator

In this section we use the bilinear-map accumulator and present a new authenticated data structure scheme BHT = {genkey, setup, update, refresh, query, verify} for dynamic hash tables. We use exactly the same methodology as in Section 3.2, that is, nested invocations of accumulators in a constant-depth tree, to obtain overall similar complexity and security results to the solution presented before. Accordingly, we use the same structure in presenting and proving our results. Note, however, that there are significant differences (both in complexity and in cryptography) imposed by the use of the different cryptographic primitive. For example, the underlying algebraic groups are fundamentally different (the RSA accumulator uses Z_N, while the bilinear-map accumulator uses groups defined over elliptic curves). This imposes certain differences in the complexity of some algorithms, which will be discussed in the following sections. We begin with algorithms genkey() and setup(). Again, the underlying (plain) data structure is a dynamic hash table T(X) storing n elements X = {x_1, x_2, ..., x_n}. As in the RSA accumulator case, the elements are distributed into m buckets L_1, L_2, ..., L_m (using a two-universal hash function H), where m = O(n).

Algorithm {sk, pk} ← genkey(1^k): The algorithm chooses a k-bit prime p and an exponentiation base g that is a generator of a multiplicative cyclic group G of prime order p, for which there is a bilinear map e(·,·): G × G → 𝒢. [Footnote: The generator g is used as the exponentiation base at all levels of the accumulation tree T(ε).] All of the above are chosen uniformly at random, as indicated by Assumption 3.2; basically the algorithm has to generate the tuple t = (p, G, 𝒢, e, g). Then it randomly picks a number s ∈ Z*_p (s is the trapdoor). An upper bound q on the total number of elements that will be accumulated is decided, and the algorithm also computes the elements g^s, g^{s^2}, ..., g^{s^q} of G. Finally, a function h: G → Z_p that outputs the bit description of the elements of G is used. [Footnote: In this way we make sure that the accumulation value output at some level can be used as input to the next level of accumulation, since we can only accumulate elements of Z_p and not elements of G.] Note that since G has exactly p elements, the function maps each element of G to an integer in Z_p. In order not to overload notation, we assume that when the input to the function h(·) is an element x ∈ Z_p, it just outputs x. The algorithm outputs s ∈ Z*_p as sk and everything else as pk.
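With the trapdoor s, an accumulation value g^{(x_1+s)(x_2+s)···(x_n+s)} costs a single exponentiation, since the exponent can be reduced modulo p first. A minimal sketch in a toy prime-order subgroup (the pairing e(·,·) needed for verification is not modelled; a real implementation would use a pairing-friendly curve library):

p, q, g = 11, 23, 2                 # toy: <g> has prime order p = 11 in Z_23*

def g_pow(e):                       # model of exponentiation in G
    return pow(g, e % p, q)

def acc(elements, s):               # accumulation value, with the trapdoor s
    exp = 1
    for x in elements:
        exp = (exp * (x + s)) % p   # one pass of Z_p arithmetic...
    return g_pow(exp)               # ...then a single exponentiation

def witness(elements, j, s):        # the leaf-level analogue of Relation 3.21
    exp = 1
    for x in elements:
        if x != j:
            exp = (exp * (x + s)) % p
    return g_pow(exp)

s = 7
A = acc([2, 3, 9], s)
W = witness([2, 3, 9], 3, s)
assert pow(W, (3 + s) % p, q) == A  # W^{x_j + s} recovers the accumulation value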
Algorithm {auth(D_0), d_0} ← setup(D_0, sk, pk): The algorithm builds the accumulation tree T(ε) on top of the m buckets L_1, L_2, ..., L_m. For every leaf node v in tree T(ε) that lies at level 0 and corresponds to a bucket L_j, the algorithm sets

    χ(v) = g^{∏_{x∈L_j}(x+s)} ∈ G,   (3.19)

while for every non-leaf node v in T(ε) that lies at level 1 ≤ i ≤ l, the algorithm sets

    χ(v) = g^{∏_{u∈N(v)}(h(χ(u))+s)} ∈ G,   (3.20)

where h(χ(u)) is an element of Z_p, computed using function h(·). The authenticated data structure auth(D_0) output by the algorithm consists of the following components:

1. The accumulation tree T(ε);
2. For every node v ∈ T(ε) (at some level i), the accumulation value χ(v).

Let r be the root of the tree T(ε). The algorithm also outputs d_0 = χ(r), i.e., the digest of the authenticated data structure is the χ(·) value of the root of the accumulation tree.

Precomputed witnesses. The precomputed witnesses are defined in this case as follows: For every j ∈ N(v) we store at node v the witness

    W_{j(v)} = g^{∏_{u∈N(v)−{j}}(h(χ(u))+s)}.   (3.21)

When the construction with precomputed witnesses is used, auth(D_0) also includes W_{j(v)}, for all v ∈ T(ε) and all j ∈ N(v).

Lemma 3.13 Algorithm setup() of the authenticated data structure scheme BHT has O(n) access complexity both with and without precomputed witnesses. Moreover, the authenticated data structure auth(D_0) output by setup() always has O(n) group complexity.

Proof: Same as Lemma 3.4, with the difference that the efficient computation of the exponent expressions is now feasible because sk contains the trapdoor s. □

We continue with the algorithms used for updates:

Algorithm {D_{h+1}, auth(D_{h+1}), d_{h+1}, upd} ← update(u, D_h, auth(D_h), d_h, sk, pk): Let m be the current number of buckets of D_h and n be the number of elements stored in D_h, after the update has been performed. We distinguish two cases:

Case 1. m/4 < n < m: In this case there is no need to rebuild the table and the update is performed as follows. Suppose the update we consider is the insertion of an element e ∈ Z_p. The algorithm computes the bucket j = H(e) (see Relation 3.1) and inserts e in bucket j. Let v_0 be the node of T(ε) referring to bucket j. Let v_0, v_1, ..., v_l be the path in T(ε) from node v_0 to the root of the tree. The algorithm initially sets χ'(v_0) = χ(v_0)^{e+s}, i.e., it updates the accumulation value that corresponds to the updated bucket. Note that if the update we consider is the deletion of an element e, the algorithm sets

    χ'(v_0) = χ(v_0)^{(e+s)^{−1}}.   (3.22)

Subsequently, for j = 1, ..., l the algorithm sets

    χ'(v_j) = χ(v_j)^{(h(χ'(v_{j−1}))+s)(h(χ(v_{j−1}))+s)^{−1}},   (3.23)

where χ(v_{j−1}) is the previous accumulation value and χ'(v_{j−1}) is the updated accumulation value. All these values are stored by the algorithm after they have been computed. The algorithm also outputs the new accumulation values χ'(v_{j−1}) (j = 1, ..., l) as the information upd along the path from the updated bucket to the root of the tree. Information upd also includes e and χ'(v_l). It also sets d_{h+1} = χ'(v_l), i.e., the updated digest is the updated χ(·) value of the root of T(ε).
Finally, the new authenticated data structure auth(D_{h+1}) is computed as follows. Let auth(D_h) be the previous authenticated data structure that is input to the algorithm: Overwrite the values χ(v_{j−1}) (j = 1, ..., l) with the new values χ'(v_{j−1}) (j = 1, ..., l) and output the updated structure. The behavior of the algorithm in the precomputed witnesses case is the same, with the difference that upd = ∅.

Case 2. n = m/4 or n = m: In this case the hash table is rebuilt according to Definition 3.2: If n = m/4, the algorithm builds a data structure D_{h+1} with m/2 buckets. Otherwise, i.e., when n = m, the algorithm builds a data structure D_{h+1} with 2m buckets. Subsequently, it outputs auth(D_{h+1}) and d_{h+1} by calling algorithm setup(D_{h+1}, sk, pk). However, instead of setting upd = ∅, it sets upd = {auth(D_{h+1}), d_{h+1}}.

Lemma 3.14 By using the rebuilding policy of Definition 3.2, algorithm update() of the authenticated data structure scheme BHT has O(1) expected amortized access complexity. Moreover, the update information upd output by update() has O(1) amortized group complexity.

Proof: Same as Lemma 3.5. However, the group complexity of upd is amortized, because when the hash table is rebuilt, upd contains the new authenticated data structure (of group complexity O(n)). □

Before presenting the remaining algorithms, we provide some necessary complexity results. The following result is derived by using an FFT algorithm (e.g., see Preparata and Sarwate [96]) that computes the DFT in a finite field (e.g., Z_p), for arbitrary n, with O(n log n) field operations. Note that the algorithm does not require the existence of an n-th root of unity in Z_p.

Lemma 3.15 (Polynomial interpolation with FFT [96]) Let ∏_{i=1}^{n}(s + x_i) = Σ_{i=0}^{n} a_i s^i be a degree-n polynomial. The coefficients a_n, a_{n−1}, ..., a_0 can be computed with O(n log n) complexity, given x_1, x_2, ..., x_n.

Algorithm {D_{h+1}, auth(D_{h+1}), d_{h+1}} ← refresh(u, D_h, auth(D_h), d_h, upd, pk): Let m be the current number of buckets of D_h and n be the number of elements stored in D_h, after the update has been performed. We distinguish two cases:

Case 1. m/4 < n < m: Suppose the update is the insertion of an element e. The algorithm computes the bucket j = H(e) (see Relation 3.1) and inserts e in bucket j. Let v_0 be the node of T(ε) referring to bucket j. Let v_0, v_1, ..., v_l be the path in T(ε) from node v_0 to the root of the tree. The algorithm, for j = 0, ..., l, sets χ(v_j) = χ'(v_j), i.e., it updates the accumulation values that correspond to the updated path by using the information upd. [Footnote: Note that information upd is not required for refresh() to perform this task; algorithm refresh() uses upd for efficiency. Namely, algorithm refresh() could compute the updated values χ(v_j) by doing polynomial interpolation, which would have O(n^ε log n) complexity (see Lemma 3.15).] Finally it outputs the updated hash table as D_{h+1}, the updated accumulation values χ(v_j) (along with the ones that belong to the nodes that are not updated) as auth(D_{h+1}), and χ'(v_l) (contained in upd) as d_{h+1}.

Precomputed witnesses. When precomputed witnesses are used, the algorithm must update W_{j(v)} for v = v_0, v_1, ..., v_l and for all j ∈ N(v) (see Relation 3.21). To achieve this efficiently, the following result, derived in part from [83], is used (a sketch of the modification formula is given at the end of this subsection):

Lemma 3.16 (Witness update formulas) Suppose we are given the elements collection X = {x_1, x_2, ..., x_n}. Let W_i be the witness of x_i, i.e., W_i = g^{∏_{j≠i}(x_j+s)}. Then the following hold:

1. (Element addition) If X' = X ∪ {x_{n+1}}, then for all i = 1, ..., n + 1 it is

    W'_i = acc(X) · W_i^{x_{n+1}−x_i}.   (3.24)

2. (Element deletion) If X' = X − {x_j}, then for all i ≠ j it is

    W'_i = (W_i / W_j)^{1/(x_j−x_i)}.   (3.25)
3. (Element modification) If X' = X − {x_j} ∪ {x'_j}, then for all i ≠ j it is

    W'_i = W_j · (W_i / W_j)^{(x'_j−x_i)/(x_j−x_i)}.   (3.26)

For i = j, it is W'_i = W_i.

Proof: Relations 3.24 and 3.25 are given in the original work of Nguyen [83]. Relation 3.26 is derived as a corollary of Relations 3.24 and 3.25. Indeed, for all i ≠ j:

    W_j · (W_i/W_j)^{(x'_j−x_i)/(x_j−x_i)}
      = g^{∏_{x∈X−{x_j}}(x+s)} · ( g^{∏_{x∈X−{x_i}}(x+s)} / g^{∏_{x∈X−{x_j}}(x+s)} )^{(x'_j−x_i)/(x_j−x_i)}
      = g^{∏_{x∈X−{x_j}}(x+s)} · ( g^{(x_j−x_i)∏_{x∈X−{x_j},x_i}}(x+s)} )^{(x'_j−x_i)/(x_j−x_i)}
      = g^{∏_{x∈X−{x_j}}(x+s)} · g^{(x'_j−x_i)∏_{x∈X−{x_j,x_i}}(x+s)}
      = g^{(s+x_i)∏_{x∈X−{x_j,x_i}}(x+s)} · g^{(x'_j−x_i)∏_{x∈X−{x_j,x_i}}(x+s)}
      = g^{(s+x'_j)∏_{x∈X−{x_j,x_i}}(x+s)}
      = g^{∏_{x∈X'−{x_i}}(x+s)} = W'_i.

For i = j, the witness W_j does not change since, by definition, W_j is not a function of the value of j (x_j or x'_j). This completes the proof. □

Corollary 3.3 (Updating precomputed witnesses) Given the collection of elements X = {x_1, x_2, ..., x_n} and the witnesses W_i for all i = 1, ..., n, computing the updated witnesses W'_i of either X ∪ {x_{n+1}} or X − {x_j} or X − {x_j} ∪ {x'_j}, without the knowledge of the trapdoor s, has O(n) complexity.

Algorithm refresh() computes the updated witnesses as follows: Since the previous witnesses W_i are stored at each node v_i, for i = 1, ..., l, the algorithm uses Relations 3.24 and 3.25 to update the witnesses within the bucket of the update (depending on whether an element is added or deleted) and Relation 3.26 to update the witnesses that correspond to every internal node of the tree. Specifically, for an internal node v that has children v_1, v_2, ..., v_t, suppose the accumulation value χ(v_j) of v_j is modified. Then the element collections X and X' used in Relation 3.26 are

    X = {h(χ(v_1)), h(χ(v_2)), ..., h(χ(v_t))}

and

    X' = {h(χ(v_1)), h(χ(v_2)), ..., h(χ(v_{j−1})), h(χ'(v_j)), h(χ(v_{j+1})), ..., h(χ(v_t))}.

Case 2. n = m/4 or n = m: In this case the hash table is rebuilt according to Definition 3.2: If n = m/4, the algorithm builds a data structure D_{h+1} with m/2 buckets. Otherwise, i.e., when n = m, the algorithm builds a data structure D_{h+1} with 2m buckets. Subsequently, it outputs auth(D_{h+1}) and d_{h+1} from the information upd output by update(). We recall that upd includes the new witnesses. By using the same amortized analysis as in Lemma 3.7 (noting that the work refresh() does when rebuilding the hash table is O(m)—copying information from upd—and not O(m log m)), but with Corollary 3.3 in place of Lemma 3.6 in the proof, we can derive the following result:

Lemma 3.17 By using the rebuilding policy of Definition 3.2, algorithm refresh() of the authenticated data structure scheme BHT has O(1) expected amortized access complexity without precomputed witnesses. With precomputed witnesses, algorithm refresh() has O(n^ε) expected amortized access complexity.
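Before turning to queries, the modification formula (Relation 3.26) can be checked numerically in the same toy prime-order subgroup used earlier; note that the update itself uses only public values and x_j, x'_j, never the trapdoor s:

p, q, g = 11, 23, 2                 # toy: <g> has prime order p = 11 in Z_23*
s = 7                               # trapdoor: used only for the initial setup

def witness(xs, i):                 # W_i = g^{prod_{t != i}(x_t + s)} (setup side)
    exp = 1
    for t, x in enumerate(xs):
        if t != i:
            exp = (exp * (x + s)) % p
    return pow(g, exp, q)

xs, xs_new = [2, 3, 9], [2, 5, 9]   # modify x_j = 3 into x'_j = 5
Wi, Wj = witness(xs, 0), witness(xs, 1)
# Relation 3.26 WITHOUT s: W'_i = W_j * (W_i / W_j)^{(x'_j - x_i)/(x_j - x_i)}
expo = ((5 - 2) * pow((3 - 2) % p, p - 2, p)) % p   # division in the exponent
Wi_new = (Wj * pow(Wi * pow(Wj, q - 2, q) % q, expo, q)) % q
assert Wi_new == witness(xs_new, 0)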
3.3.1 Queries and verification

We now show how a proof for an element e ∈ X (or an element e ∉ X) can be constructed. As in the RSA accumulator case, let H(e) = j (the bucket assignment for e) and let v_0, v_1, ..., v_l be the path from the node that corresponds to bucket j to the root of T(ε). We recall that v_{−1} is a fictitious node that stores element e within bucket j, such that v_{−1}, v_0, v_1, ..., v_l is the path in T(ε) from the node that corresponds to element e to the root of T(ε). We consider two cases, i.e., membership and non-membership proofs:

• Element e is contained in the hash table. The proof is the ordered sequence π_0, π_1, ..., π_l, where π_i is a tuple of an accumulation value χ(·) and a witness that authenticates every node of the path v_{−1}, v_0, ..., v_l from the element in question e to the root of the tree v_l. Thus, item π_i of proof Π(e) (i = 0, ..., l) is defined as

    π_i = (χ(v_{i−1}), W_{v_{i−1}(v_i)}),   (3.27)

where W_{v_{i−1}(v_i)} is defined in Relation 3.21. For simplicity, we set

    α_i = χ(v_{i−1}) (note that χ(v_{−1}) = e) and β_i = W_{v_{i−1}(v_i)}.   (3.28)

For example, in Figure 3.1, the proof for an element that belongs to the bucket of node a (e.g., element 2) consists of the following tuples:

    π_0 = (2, g^{(s+3)(s+7)(s+9)}),
    π_1 = (χ(a), g^{(h(χ(b))+s)(h(χ(c))+s)(h(χ(d))+s)}),
    π_2 = (χ(f), g^{(h(χ(e))+s)(h(χ(g))+s)(h(χ(p))+s)}).

• Element e is not contained in the hash table. Let y_1, y_2, ..., y_u be the elements contained in bucket j (all different from e). First, output a membership proof (as above) for an element y_i in bucket j (note that H(y_i) = H(e)). Then, by running the extended Euclidean algorithm for polynomials, output a non-membership witness

    π_ν = (A_e, B_e, e),   (3.29)

where A_e, B_e are elements of G defined in Relation 3.5. Note that A_e, B_e have group complexity O(1) (and not expected O(1) as in the RSA accumulator case—see Relation 3.3—since they consist of just one group element each), and they are used to prove non-membership of e in the set {y_1, y_2, ..., y_u}. We now describe the algorithm formally:

Algorithm {Π(q), α(q)} ← query(q, D_h, auth(D_h), pk): Let e = q be the queried element. If e is contained in D_h, set Π(q) = (π_0, π_1, ..., π_l), as in Relation 3.27, and output α(q) = true. If e is not contained in D_h, output a membership proof for some other element y_i in bucket j, such that H(e) = H(y_i). Then output a non-membership proof π_ν for e in bucket j, as defined in Relation 3.29. Set Π(q) = (Π(y_i), π_ν) and output α(q) = false.

Lemma 3.18 Without precomputed witnesses, algorithm query() of the authenticated data structure scheme BHT has O(n^ε log n) expected access complexity. With precomputed witnesses, algorithm query() has O(1) expected access complexity. Moreover, it outputs a proof Π(q) of O(1) group complexity.

Proof: The proof is the same as that of Lemma 3.8, with the following differences:

1. Without precomputed witnesses, a witness cannot be constructed by direct exponentiation, since the trapdoor s is not known. It is constructed with polynomial interpolation as follows: Suppose the witness is g^{(y_1+s)(y_2+s)···(y_t+s)} (where t = O(n^ε)). Compute a_0, a_1, ..., a_t by using Lemma 3.15 (O(n^ε log n) complexity). Then output the witness as

    g^{a_0} × (g^s)^{a_1} × (g^{s^2})^{a_2} × ... × (g^{s^t})^{a_t},

where g, g^s, ..., g^{s^t} are contained in the public key. This final task has O(n^ε) access complexity. A sketch of this computation is given after the proof.

2. The proof has group complexity O(1), and not expected O(1), due to the compactness of the non-membership proof in the bilinear-map accumulator construction (see Relation 3.5).

This completes the proof. □
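A minimal sketch of this interpolation-based witness construction in the toy subgroup used earlier (naive O(t^2) coefficient expansion stands in for the FFT of Lemma 3.15):

p, q, g = 11, 23, 2                 # toy: <g> has prime order p = 11 in Z_23*
s = 7                               # trapdoor, used ONLY to publish the powers
powers = [pow(g, pow(s, i, p), q) for i in range(8)]   # g^{s^i}: public key

def poly_coeffs(ys):
    """Coefficients a_0..a_t of prod_i (s + y_i), as a polynomial in s over Z_p."""
    coeffs = [1]
    for y in ys:
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] = (new[i] + y * c) % p      # the y * s^i term
            new[i + 1] = (new[i + 1] + c) % p  # the s^{i+1} term
        coeffs = new
    return coeffs

def witness_from_powers(ys):
    w = 1
    for i, a in enumerate(poly_coeffs(ys)):    # g^{sum a_i s^i} = g^{prod (y_i+s)}
        w = (w * pow(powers[i], a, q)) % q
    return w

exp = 1
for y in (3, 9):
    exp = (exp * (y + s)) % p
assert witness_from_powers([3, 9]) == pow(g, exp, q)   # matches trapdoor route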
The algorithm outputs reject if one of the following is true (note that the verification algorithm uses the bilinear map function $e(\cdot,\cdot)$):

(a) $\alpha_0 \neq e$ (element $\alpha_0$ is not correct);
(b) $e(\alpha_i, g) \neq e\left(\beta_{i-1}, g^s g^{h(\alpha_{i-1})}\right)$ for some $1 \leq i \leq l$ (false witness);
(c) $e(d_h, g) \neq e\left(\beta_l, g^s g^{h(\alpha_l)}\right)$ (final digest mismatch).

2. Non-membership proof: In this case it is $\alpha = $ false. The proof $\Pi$ in this case contains $\Pi(y) = (\pi_0, \pi_1, \ldots, \pi_l)$, i.e., the membership proof for an element $y \neq e$, where $\pi_i = (\alpha_i, \beta_i)$ for $i = 0, \ldots, l$. It also contains $\pi_\nu = (A, B, r)$, the non-membership proof for $e$. The algorithm outputs reject if one of the following is true:

(a) $H(e) \neq H(y)$ ($e$ and $y$ do not belong in the same bucket);
(b) The membership proof for $y$ does not verify, i.e., it is reject $\leftarrow$ verify($y$, true, $\Pi(y)$, $d_h$, pk);
(c) $r \neq e$ (the data element contained in $\pi_\nu$ for element $e$ is not correct);
(d) $e(\alpha_1, A)\, e(g^s g^r, B) \neq e(g, g)$ (the verification test for the non-membership proof of $e$ does not succeed; see Lemma 3.3).

If all the above tests are successful, the algorithm outputs accept.

Lemma 3.19 Algorithm verify() of the authenticated data structure scheme BHT has $O(1)$ access complexity.

Proof: Same as in Lemma 3.9, with the difference that the complexity is not expected any more, due to the compactness of the non-membership proof. □

We finally give the results for correctness and security of the scheme BHT:

Lemma 3.20 The authenticated data structure scheme BHT = {genkey, setup, update, refresh, query, verify} is correct according to Definition 2.4.

Proof: The proof follows the same logic as the proof of Lemma 3.10. As such, we only show correctness for the non-membership proof case. Let $y_1, y_2, \ldots, y_u$ be the elements contained in the bucket where $e$ should belong. The non-membership proof, as computed by query(), that is needed for verification is $(A_e, B_e, e)$. Therefore verify() does not reject at Item 2c, since $r = e$. Also it does not reject at Item 2d, since
\[
e(\alpha_1, A_e)\, e(g^s g^e, B_e) = e\left(g^{\prod_{j=1}^{u}(y_j+s)}, A_e\right) e(g^s g^e, B_e) = e(g, g),
\]
since, by Relation 3.5, $A_e = g^{\alpha(s)}$ and $B_e = g^{\beta(s)}$ such that $\left[\prod_{j=1}^{u}(y_j+s)\right]\alpha(s) + (e+s)\beta(s) = 1$. □

Lemma 3.21 The authenticated data structure scheme BHT = {genkey, setup, update, refresh, query, verify} is secure according to Definition 2.5 and under the q-strong Diffie-Hellman assumption.

Proof: The proof follows exactly the same logic as the proof of Lemma 3.11. Let $k$ be the security parameter. Output $pk = \{h(\cdot), (p, G, \mathcal{G}, e, g), \{g^s, g^{s^2}, \ldots, g^{s^q}\}\}$ and $sk = s \in \mathbb{Z}_p^*$ by calling algorithm genkey(). Let Adv be a polynomially-bounded adversary. Adv picks an initial collection of $n$ elements $X$, stored in hash table $D_0$. Adv outputs an authenticated data structure auth($D_0$), by calling algorithm setup() through oracle access. Then Adv picks a polynomial number of updates, namely a polynomial number of elements for insertion or deletion. Let $D_h$ be the final hash table, let the updated final element collection be $X$, and let $d_h$ be the final digest as produced by the adversary through oracle access to algorithm update(). We will compute the probability that check() rejects while verify() accepts, as required by Definition 2.5. For the case of a membership proof, the adversary Adv outputs an incorrect answer $e \notin X$ and also a proof $\Pi(e) = (\pi_0, \pi_1, \ldots, \pi_l)$ ($l = \lceil 1/\epsilon \rceil$), where $\pi_i = (\alpha_i, \beta_i)$ (see algorithm query()).
Let $v_0, v_1, \ldots, v_l$ be the path of nodes in $T(\epsilon)$ from the bucket referring to $e$ to the root of the tree. We now define the following events, related to the above choice of proof made by the adversary. Our goal will be to express the probability that verify($e$, true, $\Pi(e)$, $d_h$, pk) accepts and $e \notin X$ as a function of the following events. Note that $d_h$ is the correct digest of the authenticated data structure:

1. $E_{0,0}$: The value $\alpha_0$ picked by Adv is such that $\alpha_0 = e \notin X$;

2. $E_j$: For $j = 1, \ldots, l$, the values $\alpha_j, \alpha_{j-1}$ and $\beta_{j-1}$ picked by Adv are such that
\[
e(\alpha_j, g) = e\left(\beta_{j-1}, g^s g^{h(\alpha_{j-1})}\right) \quad \text{for all } 1 \leq j \leq l.
\]
This event can be partitioned into two mutually exclusive events, i.e., $E_j = E_{j,0} \cup E_{j,1}$, such that

• $E_{j,0}$: Value $\alpha_j$ is not the correctly formed digest (i.e., an accumulation of the digests of its children) of some node $v_{j-1} \in N(v_j)$, as defined in Relation 3.20;
• $E_{j,1}$: Value $\alpha_j$ is the correctly formed digest of a node $v_{j-1} \in N(v_j)$, as defined in Relation 3.20.

3. $E_{l+1,1}$: The values $\alpha_l$ and $\beta_l$ picked by Adv are such that $e\left(\beta_l, g^s g^{h(\alpha_l)}\right) = e(d_h, g)$.

The probability that verify() accepts while $e \notin X$ is the probability
\[
\begin{aligned}
\Pr[E_{0,0} \cap E_1 \cap E_2 \cap \ldots \cap E_{l+1,1}]
&= \Pr[E_{0,0} \cap (E_{1,0} \cup E_{1,1}) \cap (E_{2,0} \cup E_{2,1}) \cap \ldots \cap E_{l+1,1}] \\
&\leq \Pr[E_{1,1}\,|\,E_{0,0}] + \Pr[E_{2,1}\,|\,E_{1,0}] + \Pr[E_{3,1}\,|\,E_{2,0}] + \ldots + \Pr[E_{l+1,1}\,|\,E_{l,0}] \\
&= \Pr[E_{1,1}\,|\,E_{0,0}] + \sum_{j=2}^{l+1} \Pr[E_{j,1}\,|\,E_{j-1,0}].
\end{aligned} \tag{3.30}
\]

First we examine the event $E_{1,1}\,|\,E_{0,0}$. This event implies that the adversary has found a value $\alpha_0 = e \notin X$ and a value $\beta_0$ such that
\[
e\left(\beta_0, g^s g^{h(\alpha_0)}\right) = e\left(g^{\prod_{t=1,\ldots,l_0}(s+x_t)}, g\right),
\]
where $x_1, x_2, \ldots, x_{l_0}$ is a subset of the set $X$. Since $e = h(\alpha_0) \notin X$, it is $e \notin \{x_1, x_2, \ldots, x_{l_0}\}$. By Lemma 3.3 and Assumption 3.2, this probability is neg($k$). Therefore $\Pr[E_{1,1}\,|\,E_{0,0}] \leq$ neg($k$). For the remaining events $E_{j,1}\,|\,E_{j-1,0}$ ($2 \leq j \leq l+1$), we have:

• $E_{j-1,0}$ implies that value $\alpha_{j-1}$ is not the correctly formed digest of some node $v_{j-2} \in N(v_{j-1})$, as defined in Relation 3.20, namely that $\alpha_{j-1} \notin \{\chi(v_t) : v_t \in N(v_{j-1})\}$, which gives $h(\alpha_{j-1}) \notin \{h(\chi(v_t)) : v_t \in N(v_{j-1})\}$ by the one-to-one property of $h(\cdot)$;

• However, the event $E_{j,1}$ implies that (1) digest $\alpha_j$ (for $j = l+1$ this is just $d_h$) is the correctly formed digest of node $v_{j-1}$; and (2)
\[
e\left(\beta_{j-1}, g^s g^{h(\alpha_{j-1})}\right) = e\left(g^{\prod_{v_t \in N(v_{j-1})}(s+h(\chi(v_t)))}, g\right),
\]
where the $\chi(v_t)$ are the correctly formed digests of the set of neighbors of $v_{j-1}$. Since $h(\alpha_{j-1}) \notin \{h(\chi(v_t)) : v_t \in N(v_{j-1})\}$, by Lemma 3.3 and Assumption 3.2, this probability is neg($k$).

Therefore for all $j = 1, \ldots, l+1$, $\Pr[E_{j,1}\,|\,E_{j-1,0}]$ is neg($k$). Since $l = O(1)$, the total probability is also neg($k$). This concludes the proof for the membership proof. For the case of a non-membership proof, the proof follows exactly the same logic as Lemma 3.11, so it is omitted. □

We continue with the following corollary, which is useful in Chapter 5:

Corollary 3.4 Let $H(e) = j$ and $\Pi(e) = \{(\alpha_i, \beta_i) : i = 0, \ldots, l\}$ be a membership proof for element $e$. The probability that verify($e$, true, $\Pi(e)$, $d_h$, pk) accepts and $\beta_0 \neq g^{\prod_{x \in L_j - \{e\}}(s+x)}$ is neg($k$).

Proof: The event that $\beta_0 \neq g^{\prod_{x \in L_j - \{e\}}(s+x)}$ and verify() accepts implies the event that $\alpha_1 \neq g^{\prod_{x \in L_j}(s+x)}$ and verify() accepts. Therefore the probability in question is less than or equal to the probability $\Pr[E_{1,0} \cap E_2 \cap \ldots \cap E_{l+1,1}]$, since $E_{1,0}$ is exactly the event $\alpha_1 \neq g^{\prod_{x \in L_j}(s+x)}$. By following the same proof procedure as in Relation 3.30, this can be proved to be neg($k$) as well. □
Theorem 3.4 Let $k$ be the security parameter and $0 < \epsilon < 1$. Then there exists a publicly-verifiable authenticated data structure scheme BHT = {genkey, setup, update, refresh, query, verify} for a data structure scheme defined for a dynamic hash table $D$ storing $n$ elements such that:

1. It is correct according to Definition 2.4 and secure according to Definition 2.5 and under the bilinear q-strong Diffie-Hellman assumption;
2. The access complexity of setup() is $O(n)$, outputting an authenticated data structure auth($D$) of $O(n)$ group complexity;
3. The expected amortized access complexity of update() is $O(1)$, outputting update information upd of $O(1)$ amortized group complexity;
4. The expected amortized access complexity of refresh() is $O(n^\epsilon)$ (or $O(1)$);
5. The expected access complexity of query() is $O(1)$ (or $O(n^\epsilon \log n)$), outputting a proof $\Pi(q)$ for a query $q$ of $O(1)$ group complexity;
6. The access complexity of verify() is $O(1)$.

Proof: This result follows directly from Lemmata 3.13, 3.14, 3.17, 3.18, 3.19, 3.20 and 3.21. The complexities in brackets ($O(1)$ for refresh() and $O(n^\epsilon \log n)$ for query()) refer to the case when no precomputed witnesses are used. □

Finally, we note here that both constructions (RSA accumulator and bilinear-map accumulator) use the same algorithmic ideas, i.e., the accumulation tree. We could have described a scheme by using an abstract notion of an accumulator and then derived our results by instantiating the abstract solution with the RSA accumulator and the bilinear-map accumulator. However, we chose not to do that because we feel that this would add more complexity to the presentation, for something that can be derived and described a lot more easily with no abstraction.

3.3.2 Protocols

Three-party protocol. By using Theorem 2.1 we can easily derive the following corollary that describes the use of the authenticated data structure scheme BHT of Theorem 3.4 in the three-party model:

Corollary 3.5 Let $k$ be the security parameter and assume that the bilinear q-strong Diffie-Hellman assumption holds. Then there exists a three-party authenticated data structures protocol (see Protocol 2.1) for verifying (non-)membership queries $q$ on a dynamic hash table storing $n$ elements such that:

1. The setup at the source has $O(n)$ access complexity;
2. The update at the source has $O(1)$ expected amortized access complexity;
3. The space needed at the source has $O(n)$ group complexity;
4. The communication between the source and the server has $O(1)$ amortized group complexity;
5. The update at the server has $O(n^\epsilon)$ (or $O(1)$) expected amortized access complexity;
6. The query at the server has $O(1)$ (or $O(n^\epsilon \log n)$) expected access complexity;
7. The space needed at the server has $O(n)$ group complexity;
8. The communication between the server and the client has $O(1)$ group complexity;
9. The verification at the client has $O(1)$ access complexity;
10. For a query $q$ sent by the client to the server at any time (even after updates), let $\alpha$ be an answer and let $\pi$ be a proof returned by the server. With probability $\Omega(1 - \text{neg}(k))$, the client accepts the answer $\alpha$ if and only if $\alpha$ is correct.

Two-party protocol. As a corollary of Lemma 3.12 (the proof follows exactly the same techniques), we can state a similar result for the authenticated data structure scheme BHT:

Corollary 3.6 Assumption 2.1 is true for the authenticated data structure scheme BHT. Moreover, for every update $u$, $|Q_u|$ has $O(1)$ amortized complexity.
By Theorems 2.2 and 3.4 and Corollary 3.6, we can now state the final result for the two-party model:

Corollary 3.7 Let $k$ be the security parameter and assume that the bilinear q-strong Diffie-Hellman assumption holds. Then there exists a two-party authenticated data structures protocol (see Protocol 2.2) for verifying (non-)membership queries $q$ on a dynamic hash table storing $n$ elements such that:

1. When precomputed witnesses are used, the protocol requires one round of interaction during updates that cause the hash table to be rebuilt (see Definition 3.2); when no precomputed witnesses are used, it requires one round of interaction during updates;
2. The setup at the client has $O(n)$ access complexity;
3. The update at the client has $O(1)$ expected amortized access complexity;
4. The verification at the client has $O(1)$ access complexity;
5. The space needed at the client has $O(1)$ group complexity;
6. The communication between the client and the server has $O(1)$ amortized group complexity during updates and $O(1)$ group complexity during queries;
7. The update at the server has $O(n^\epsilon)$ (or $O(n^\epsilon \log n)$) expected amortized access complexity;
8. The query at the server has $O(1)$ (or $O(n^\epsilon \log n)$) expected access complexity;
9. The space needed at the server has $O(n)$ group complexity;
10. For a query $q$ sent by the client to the server at any time (even after updates), let $\alpha$ be an answer and let $\pi$ be a proof returned by the server. With probability $\Omega(1 - \text{neg}(k))$, the client accepts the answer $\alpha$ if and only if $\alpha$ is correct.

3.4 Complexity limitations

In this chapter, we proposed a new, provably secure, cryptographic construction for verifying hash table queries over a dynamic set. We use nested cryptographic accumulators on a tree of constant depth to achieve constant query and verification costs and sublinear update costs. Our results are applicable to both the two-party and three-party data authentication models. We use our method to authenticate general set-membership queries and overall improve over previous techniques that use cryptographic accumulators, reducing the main complexity measures to constant, yet keeping sublinear update complexity.

An important open problem is whether one can achieve logarithmic update cost and still keep the communication complexity constant. There has been no such solution to date. In particular, no method is known that can construct constant-size accumulator proofs (witnesses) in logarithmic time. In Chapter 6, however, which is the full version of the work of Papamanthou et al. [91], we describe a solution for this problem that uses a cryptographic primitive that, unfortunately, is not known to exist yet. On the other hand, we believe that doing even better, i.e., achieving constant complexity for all the complexity measures, seems to be infeasible due to the $\Omega(\log n / \log\log n)$ memory checking lower bound [35] on query complexity (the sum of read and write complexity). This result, however, motivates seeking more general lower bounds for authenticated data structures (similar directions have been followed in the lower bound works of Dwork et al. [35] and Tamassia and Triandopoulos [106]): given any cryptographic primitive, what is the best we can do in terms of complexity? Finally, it would be interesting to modify our schemes to obtain non-amortized bounds for updates, using for example Overmars' global rebuilding technique [87].
Chapter 4

Authenticated structures based on lattices

Lattices, infinite sets of specially constructed vectors, are a mathematical tool that made its first appearance in cryptography with Ajtai's seminal result [3], showing the construction of one-way functions based on hard lattice problems. Since then, lattices have been proven to enjoy appealing properties that have made their application to cryptography very promising. Such properties include the apparent resistance of lattice-based assumptions to quantum algorithms [98] (as opposed to other assumptions, such as factoring), as well as their worst-case to average-case reductions [99], namely the existence of polynomial-time algorithms that can transform a solution to a random instance of a certain problem into a solution to any (worst-case) instance of another lattice problem. As such, many cryptographic primitives based on lattice assumptions have been derived during the last decade, such as public-key encryption schemes (e.g., see the work of Peikert [95]) and collision-resistant hash functions (e.g., see the work of Lyubashevsky and Micciancio [71]). Even more significantly, the long-standing open problem of fully-homomorphic encryption was settled with a lattice-based construction in 2009 by Gentry [43]. Finally, lattice-based constructions appear to be efficient in practice due to the extensive use of linear algebra (and are therefore also easily parallelizable), and have also led to the deployment of lattice-based cryptographic systems (e.g., see the NTRU system by Hoffstein et al. [57]).

In this chapter we present the first authenticated data structure based on lattices, and specifically a lattice-based authenticated table with highly desirable complexity features, such as update optimality and parallelism (i.e., the constructed authenticated table admits parallel algorithms). Specifically, we design the first authenticated data structure based on lattices the update complexity of which is $O(1)$, improving in this way the $O(\log n)$ update bounds of previous constructions, such as the Merkle tree, while retaining efficient $O(\log n)$ proof complexity. Moreover, the used lattice-based cryptographic primitive lends itself to a natural notion of parallelism: as such, we describe parallel versions of our authenticated data structure algorithms, yielding the first parallel online memory checker [15] with $O(1)$ query complexity using $O(\log n)$ checkers in the CREW model (a parallel model of computation where processors can read concurrently but can write only exclusively) and without using a secret key setting, i.e., there is only need for small reliable but not secret memory (as opposed to [54]). We base the security of our constructions on the difficulty of approximating the gap version of the shortest vector problem in lattices (GAPSVP) within polynomial factors.

The key idea used here is to combine the simplicity of a Merkle tree [77] with a special property of lattice-based hash functions, which we establish and call repeated linearity. Roughly speaking, this property allows using the output of one invocation of the hash function as an input to another invocation of the function, without losing "structure". This observation, in the authenticated data structures setting, turns out to be crucial in achieving constant update complexity (as well as parallel algorithms), while keeping all the remaining complexity bounds logarithmic.
This is a trade-off that, to the best of our knowledge, has not been achieved so far in the literature, and it is feasible due to the use of lattices: for example, for a table data structure of $n$ entries, the constructions of Bellare and Micciancio [11], the authenticated data structure of Papamanthou et al. [90] and the memory checker of Dwork et al. [35] have $O(1)$ update but $\Omega(n^\epsilon)$ proof (or query) complexity, whereas hierarchical hashing constructions such as the one of Blum et al. [15] and the one of Goodrich and Tamassia [48] impose $O(\log n)$ bounds on all the complexity measures, which is to be expected given the lower bound for hash-based authenticated data structures by Tamassia and Triandopoulos [106].

The data structure we are considering in this chapter is a dynamic table of size $n$, read and written through indices $0, \ldots, n-1$. We base the security of our construction on the hardness of the GAPSVP problem in lattices [78], which has its own significance given recent attacks on collision-resistant functions such as MD-5 [103]. We note that our construction requires a one-time $O(n \log n)$ preprocessing, which is however amortized, in comparison with other works (see Table 4.1), after $\Omega(n \log n)$ updates.

Overview of the solution. Our authenticated data structure scheme, denoted with LBT in Table 4.1, can be seen as a generalization of the Merkle tree and related hierarchical hashing constructions [15, 48, 81]. By exploiting a property of lattice-based hash functions (which we call repeated linearity) over a typical Merkle tree, we depart from black-box use of generic collision-resistant hash functions (e.g., MD-5 or SHA-256) in the authenticated data structures setting. As a consequence, and in the Merkle tree paradigm, the digest of a tree node $v$ can be expressed as the "sum" of well-defined functions (called partial digests) applied to the data stored at the leaves of $v$'s subtree (Theorem 4.3). Exploiting this property enables constant update complexity as well as the derivation of parallel algorithms. It may also be of general interest and have other applications. A comparison of our solution with existing work is given in Table 4.1.

             [15,48,75,81]  [11]     [83]      [23,101]    [51]        [90]     LBT
setup()      n              n        n         n           n           n        n log n
update()     log n          1        1         1           n^ε         1        1
refresh()    log n          1        n         n log n     n^ε         1        log n
query()      log n          n^ε      1         1           n^ε         n^ε      log n
verify()     log n          n^ε      1         1           1           1        log n
proof Π(q)   log n          n^ε      1         1           1           1        log n
info. upd    1              1        1         1           n^ε         1        1
assumption   Generic CR     D. Log   B. q-DH   Strong RSA  Strong RSA  B. q-DH  GAPSVP

Table 4.1: Asymptotic access and group complexities of various authenticated data structure schemes (see Definition 2.3) for a dynamic table of $n$ entries. Parameter $0 < \epsilon < 1$ is a constant and GAPSVP is the gap shortest vector problem in lattices (Definition 4.1). In all schemes, the authenticated structure has group complexity $O(n)$ and genkey() has $O(1)$ complexity. Note that [90] is the published conference version of Chapter 3. The acronyms of the other assumptions can be found in Table 3.1. All schemes presented in the table are publicly verifiable.

We now give the formal definition of the underlying data structure scheme for which the authenticated data structure scheme LBT is designed.

The data structure scheme. Let T be a dynamic table of $n$ indices, storing values T[1], T[2], ..., T[$n$]. The data structure scheme {query(), update(), check()} (Definition 2.2) for a dynamic table T is as follows (a small sketch of this interface is given below):

1. T[$i$] $\leftarrow$ query($i$, T): Given an index $1 \leq i \leq n$, return T[$i$]. Answering this query has $O(1)$ complexity;
2. T$'$ $\leftarrow$ update($i, y$, T): Given an index $1 \leq i \leq n$, set T[$i$] := $y$. The complexity for this task is $O(1)$;
3. {accept, reject} $\leftarrow$ check($i, y$, T): If T[$i$] $\neq y$, return reject; else return accept.
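As a reference point for the algorithms that follow, here is a minimal Python transcription of this (unauthenticated) data structure scheme; the class name and the 0-based indexing are our own choices for the sketch.

```python
# Minimal sketch of the plain table data structure scheme {query, update, check};
# the scheme LBT of Section 4.2 authenticates exactly this interface.
class Table:
    def __init__(self, n):
        self.T = [0] * n          # n entries (0-indexed in this sketch)

    def query(self, i):           # T[i] <- query(i, T): O(1)
        return self.T[i]

    def update(self, i, y):       # T' <- update(i, y, T): O(1)
        self.T[i] = y

    def check(self, i, y):        # accept iff T[i] = y
        return "accept" if self.T[i] == y else "reject"

t = Table(8)
t.update(3, 42)
assert t.query(3) == 42 and t.check(3, 42) == "accept"
```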
4.1 Lattice definitions

We start with some basic definitions related to lattices. We use upper-case bold letters to denote matrices, e.g., $\mathbf{B}$, lower-case bold letters to denote vectors, e.g., $\mathbf{b}$, and lower-case italic letters to denote scalars. Finally, for a vector $\mathbf{x} = [x_1\ x_2\ \ldots\ x_k]^T$ (note that $T$ as an exponent in the vector notation denotes the transpose), $\|\mathbf{x}\|$ denotes the Euclidean norm of $\mathbf{x}$, i.e., $\|\mathbf{x}\| = (x_1^2 + x_2^2 + \ldots + x_k^2)^{1/2}$.

4.1.1 What is a lattice?

Given the security parameter $k$, a full-rank $k$-dimensional lattice is defined as the infinite-sized set of all vectors produced as the integer combinations
\[
\left\{\sum_{i=1}^{k} x_i \mathbf{b}_i : x_i \in \mathbb{Z},\ 1 \leq i \leq k\right\},
\]
where $B = \{\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_k\}$ is the basis of the lattice and $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_k$ are linearly independent, all belonging to $\mathbb{R}^k$. We denote the lattice produced by $B$ (i.e., the set of vectors) with $L(B)$.

A well-known difficult problem in lattices is the approximation within a polynomial factor of the shortest vector in a lattice (the SVP problem): given a lattice $L(B)$ produced by a basis $B$, approximate up to a polynomial factor in $k$ the shortest (in the Euclidean sense) vector in $L(B)$, the length of which we denote with $\lambda(B)$. A similar problem in lattices is the "gap" version of the shortest vector problem (GAPSVP$_\gamma$), the difficulty of which is useful in our context:

Definition 4.1 (Problem GAPSVP$_\gamma$) An input to GAPSVP$_\gamma$ is a $k$-dimensional lattice basis $B$ and a number $d$, where $k$ is the security parameter. In YES inputs $\lambda(B) \leq d$ and in NO inputs $\lambda(B) > \gamma \times d$, where $\gamma \geq 1$.

We note that, for exponential values of $\gamma$, i.e., $\gamma = 2^{O(k)}$, one can use the LLL algorithm [65] and decide the above problem in polynomial time. The difficult version of the problem arises for polynomial $\gamma$, for which no efficient algorithm is known to date, even for factors slightly smaller than exponential [99], i.e., very big polynomials. Moreover, for polynomial factors, there is no proof that this problem is NP-hard (specifically, as outlined in [99], the current state of knowledge indicates that for $\gamma > \sqrt{k/\log k}$ it is unlikely that this problem is NP-hard, and no efficient algorithm is known to date), which makes the polynomial approximation cryptographically interesting as well. Therefore, a well-accepted assumption on which the security of our scheme is based is the following:

Assumption 4.1 (Hardness of GAPSVP$_\gamma$) Let GAPSVP$_\gamma$ be an instance of the gap version of the shortest vector problem in lattices, as defined in Definition 4.1, and let $k$ be the security parameter. There is no polynomial-time algorithm for solving GAPSVP$_\gamma$ for $\gamma = \text{poly}(k)$, except with negligible probability neg($k$).

4.1.2 Reductions

After Ajtai's seminal work [3], where a one-way function based on hard lattice problems is presented, Goldreich et al. [44] presented a variation of the function, providing at the same time collision resistance. Based on this collision-resistant hash function, Micciancio and Regev [78] described a generalized version of it, a modification of which we are using in our construction. The security of the hash function is based on the difficulty of the small integer solution problem (SIS):

Definition 4.2 (Problem SIS$_{q,m,\beta}$) Given an integer $q$, a matrix $M \in \mathbb{Z}_q^{k \times m}$ and a real $\beta$, find a non-zero integer vector $\mathbf{z} \in \mathbb{Z}^m \setminus \{0\}$ such that $M\mathbf{z} = 0 \bmod q$ and $\|\mathbf{z}\| \leq \beta$.

We continue with the definition of SIS$'$, where the solution vector is required to have at least one odd coordinate:

Definition 4.3 (Problem SIS$'_{q,m,\beta}$) Given an integer $q$, a matrix $M \in \mathbb{Z}_q^{k \times m}$ and a real $\beta$, find an integer vector $\mathbf{z} \in \mathbb{Z}^m \setminus 2\mathbb{Z}^m$ such that $M\mathbf{z} = 0 \bmod q$ and $\|\mathbf{z}\| \leq \beta$.
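To make Definitions 4.2 and 4.3 concrete, the following Python fragment checks whether a candidate vector solves a toy SIS (or SIS$'$) instance. The dimensions are illustrative and far too small for security; actually finding such a $\mathbf{z}$ for properly chosen parameters is exactly the problem assumed to be hard.

```python
# Toy checker for the SIS and SIS' conditions of Definitions 4.2 and 4.3
# (hypothetical small parameters; real instances use k, m, q as in Section 4.1.3).
import math
import random

k, m, q, beta = 3, 10, 97, 6.0
random.seed(0)
M = [[random.randrange(q) for _ in range(m)] for _ in range(k)]

def is_sis_solution(z):
    nonzero = any(z)
    in_kernel = all(sum(a * b for a, b in zip(row, z)) % q == 0 for row in M)
    short = math.sqrt(sum(t * t for t in z)) <= beta
    return nonzero and in_kernel and short

def is_sis_prime_solution(z):
    # SIS' additionally requires at least one odd coordinate (z not in 2Z^m)
    return is_sis_solution(z) and any(t % 2 for t in z)

print(is_sis_solution([0] * m))   # False: the zero vector never counts
```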
Note that at least one solution to problem SIS$_{q,m,\beta}$ exists when $\beta \geq \sqrt{m}\, q^{k/m}$ and $m > k$ [78]. Moreover, if $q \geq 4\sqrt{m}\, k^{1.5}\beta$, we will see that such a solution is difficult to find. For odd $q$, there is a polynomial-time reduction from SIS$'_{q,m,\beta}$ to SIS$_{q,m,\beta}$ [78]:

Lemma 4.1 (Reduction from SIS$'_{q,m,\beta}$ to SIS$_{q,m,\beta}$ [78]) For any odd integer $q \in 2\mathbb{Z}+1$ and any SIS$'$ instance $I = (q, M, \beta)$, if $I$ has a solution as an instance of SIS, then it has a solution as an instance of SIS$'$. Moreover, there is a polynomial-time algorithm that, on input a solution to a SIS instance $I$, outputs a solution to the same SIS$'$ instance $I$.

As proved by Micciancio and Regev [78], by choosing certain parameters, GAPSVP$_\gamma$ can be reduced to SIS$'$ (derived by combining Lemma 5.22 and Theorem 5.23 from the work of Micciancio and Regev [78]):

Lemma 4.2 (Reduction from GAPSVP$_\gamma$ to SIS$'_{q,m,\beta}$ [78]) Let $\beta, m, q = k^{O(1)}$ be polynomially-bounded values, with $q \geq 4\sqrt{m}\, k^{1.5}\beta$ and $\gamma = 14\pi\sqrt{k}\beta$. Then there is a probabilistic polynomial-time reduction from solving GAPSVP$_\gamma$ in the worst case to solving SIS$'_{q,m,\beta}$ on the average with non-negligible probability.

A direct application of Lemma 4.1 and Lemma 4.2 gives the following result:

Theorem 4.1 Let $q = k^{O(1)}$ be an odd positive integer. For any polynomially-bounded values $\beta, m = k^{O(1)}$, with $q \geq 4\sqrt{m}\, k^{1.5}\beta$ and $\gamma = 14\pi\sqrt{k}\beta$, there is a probabilistic polynomial-time reduction from solving GAPSVP$_\gamma$ in the worst case to solving SIS$_{q,m,\beta}$ on the average with non-negligible probability.

Theorem 4.1 states that if there is an algorithm that solves an average instance of SIS$_{q,m,\beta}$ (i.e., one where $M \in \mathbb{Z}_q^{k \times m}$ is chosen uniformly at random), for an odd $q$ with $q \geq 4\sqrt{m}\, k^{1.5}\beta$ and $\gamma = 14\pi\sqrt{k}\beta$, then this algorithm can be used to solve any instance of GAPSVP$_\gamma$.

4.1.3 Lattice-based hash function

Let $m = 2k\log q$ and $\beta = \delta\sqrt{m}$, where $\delta$ is poly($k$). Note that $\log\delta = O(\log k)$. We also require $q \geq 4\sqrt{m}\, k^{1.5}\beta = 8k^{2.5}\delta\log q$. It is easy to see that, given $k$ and $\delta$, there is always a $q = O(k^{2.5}\delta\log k)$ satisfying the above constraints; since $\delta$ is poly($k$), the bit-size of $q$ is $O(\log k)$. The collision-resistant hash function that we are using is a generalization of the function presented by Micciancio and Regev [78], where $\delta = O(1)$ (in the security parameter) is used instead. In our construction we use bigger values for $\delta$: namely, the value that we use to bound the norm of the solution vector can be up to poly($k$). This was observed in the original definition of Ajtai's one-way function [3], i.e., that the input vector can contain larger values (but not too large), and was also noted in its extension that achieves collision resistance [44]. This remark is very useful in our context and implies that the larger the value one picks for $\beta$, the larger the modulus $q$ should be so that security is guaranteed (still, $q$'s bit size is $O(\log k)$).

Let now $M \in \mathbb{Z}_q^{k \times m}$ be a $k \times m$ matrix chosen uniformly at random. We can define the function $h_M : \mathbb{Z}^m \rightarrow \mathbb{Z}_q^k$ as $h_M(\mathbf{x}) = M\mathbf{x} \bmod q$, where $\|\mathbf{x}\| \leq \beta$ and the modulo operation is taken component-wise.
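The next sketch instantiates $h_M$ with toy parameters (our own illustrative choices) and spells out in a comment why a collision would break SIS, mirroring the proof of Theorem 4.2 below.

```python
# Minimal sketch of the lattice-based hash h_M(x) = Mx mod q (toy parameters,
# nowhere near secure; k, q, delta below are illustrative stand-ins).
import random

k, delta = 4, 4
q = 257                                 # toy odd modulus
log_q = 9                               # bit-size stand-in for log q
m = 2 * k * log_q                       # m = 2k log q
random.seed(1)
M = [[random.randrange(q) for _ in range(m)] for _ in range(k)]

def h(x):
    """h_M(x) = Mx mod q, for inputs with entries in {0, ..., delta}."""
    assert len(x) == m and all(0 <= t <= delta for t in x)
    return tuple(sum(a * b for a, b in zip(row, x)) % q for row in M)

x = [random.randrange(delta + 1) for _ in range(m)]
print(h(x))

# A collision x != y in {0,...,delta}^m with h(x) == h(y) would give the short
# nonzero vector z = x - y with Mz = 0 mod q and ||z|| <= delta*sqrt(m) = beta,
# i.e., a SIS_{q,m,beta} solution; Theorem 4.2 turns that into a GAPSVP solver.
```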
The above function is collision resistant based on the difficulty of GAPSVP$_{14\pi\sqrt{k}\beta}$:

Theorem 4.2 (Strong collision resistance) Let $m = 2k\log q$, $\beta = \delta\sqrt{m}$ and let $q$ be an odd positive integer such that $q \geq 4\sqrt{m}\, k^{1.5}\beta$. Let also $M \in \mathbb{Z}_q^{k \times m}$ be a $k \times m$ matrix chosen uniformly at random. If there is a polynomial-time algorithm that finds two vectors $\mathbf{x}, \mathbf{y} \in \{0, 1, \ldots, \delta\}^m$ with $\mathbf{x} \neq \mathbf{y}$ such that $M\mathbf{x} = M\mathbf{y} \bmod q$, then there is a polynomial-time algorithm to solve any instance of GAPSVP$_{14\pi\delta\sqrt{km}}$.

Proof: Suppose there is an algorithm that finds $\mathbf{x}, \mathbf{y} \in \{0, 1, \ldots, \delta\}^m$ with $\mathbf{x} \neq \mathbf{y}$ such that $M\mathbf{x} = M\mathbf{y} \bmod q$. Then the non-zero vector $\mathbf{z} = \mathbf{x} - \mathbf{y}$, which also has norm $\|\mathbf{z}\| \leq \beta$ since its coordinates are between $-\delta$ and $+\delta$, comprises a solution to the problem SIS$_{q,m,\beta}$ (note that matrix $M$ by construction is chosen uniformly at random). By Theorem 4.1, this can be used to solve GAPSVP$_\gamma$ for $\gamma = 14\pi\sqrt{k}\beta$. Setting $\beta = \delta\sqrt{m}$, we get the desired result. □

Since $\delta = \text{poly}(k)$, $\gamma$ is also poly($k$) and therefore the presented hash function is secure by Assumption 4.1. We can now extend the function to accept two inputs as follows. Denote with $T_{\delta,+}$ the set of all $m \times 1$ ($m = 2k\log q$) vectors such that their last $k\log q$ entries are zero and the remaining entries are in $\{0, 1, \ldots, \delta\}$, and analogously with $T_{\delta,-}$ the set of all $m \times 1$ vectors such that their first $k\log q$ entries are zero and the remaining entries are in $\{0, 1, \ldots, \delta\}$:

Definition 4.4 (Lattice-based hash function with two inputs) We define the function $h_{M,\delta} : T_{\delta,+} \times T_{\delta,-} \rightarrow \mathbb{Z}_q^k$ as $h_{M,\delta}(\mathbf{x}, \mathbf{y}) = M(\mathbf{x} + \mathbf{y}) \bmod q$, where $\mathbf{x}, \mathbf{y} \in \{0, 1, \ldots, \delta\}^m$.

Note that we use both $M$ and $\delta$ as subscripts for the function. Similarly as in Theorem 4.2, this function is strongly collision resistant, i.e., if there is a polynomial-time algorithm that finds $(\mathbf{x}_1, \mathbf{y}_1) \in (T_{\delta,+} \times T_{\delta,-})$ and $(\mathbf{x}_2, \mathbf{y}_2) \in (T_{\delta,+} \times T_{\delta,-})$ with $(\mathbf{x}_1, \mathbf{y}_1) \neq (\mathbf{x}_2, \mathbf{y}_2)$ such that $M(\mathbf{x}_1 + \mathbf{y}_1) = M(\mathbf{x}_2 + \mathbf{y}_2) \bmod q$, then there is a polynomial-time algorithm that solves GAPSVP$_\gamma$ for polynomial $\gamma$. To see that, note that the vector $\mathbf{x}_1 - \mathbf{x}_2 + \mathbf{y}_1 - \mathbf{y}_2$ has coordinates in $\{-\delta, \ldots, \delta\}$, since, by the definition of $T_{\delta,+}$ and $T_{\delta,-}$, the entries of $\mathbf{x}_1 - \mathbf{x}_2$ and $\mathbf{y}_1 - \mathbf{y}_2$ do not overlap.

Time and space complexity of the hash function. In this paragraph we analyze the time and space complexity of the used hash function. Since the modulus $q$ has $O(\log k)$ bits, our hash function is described with a $k \times 2k\log q$ matrix of $O(\log k)$-bit entries. Therefore the space complexity is $O(k^2\log^2 k)$ bits. Given now an input $\mathbf{x} \in \{0, 1, \ldots, \delta\}^{2k\log q}$, we can compute $h_{M,\delta}(\mathbf{x})$ in $O(k^2\log^2 k\,\log^2\log k)$ time. To see that, an application of the hash function requires the computation of $k$ inner products between vectors of $2k\log q$ entries, and each multiplication in the inner product is a multiplication in $\mathbb{Z}_q$, which can be computed in $O(\log k\,\log^2\log k)$ time using FFT [29]. This makes the total time equal to $O(k^2\log^2 k\,\log^2\log k)$.

4.1.4 Parallel models of computation

As we mentioned in the beginning of this chapter, we also give parallel versions of our lattice-based authenticated data structure algorithms. We use the PRAM model (parallel random access machine), and specifically EREW PRAM, CREW PRAM and CRCW PRAM. We recall the definitions of these models below:

1. EREW: This model allows all processors to read and write exclusively at the same time. Therefore no conflicts need to be resolved;
2. CREW: This model allows all processors to read concurrently and write exclusively at the same time. Read conflicts are resolved with $O(1)$ complexity;
3. CRCW: This model allows all processors to read concurrently and write concurrently at the same time. Read and write conflicts are resolved with $O(1)$ complexity.
Note that EREW requires minimal assumptions, CREW requires a stronger assumption (as there is a need to resolve read conflicts) and CRCW requires the strongest assumptions, since both read and write conflicts need to be resolved. Ways to resolve conflicts in the PRAM model have been extensively studied in the literature. A great introduction to the most fundamental results related to the PRAM model of computation, as well as to parallel algorithms, is given in the book of JaJa [59].

4.2 Main construction

In this section we present our update-optimal authenticated data structure scheme for a dynamic table, i.e., the scheme LBT = {genkey, setup, update, refresh, query, verify}. We recall that the data structure for which we describe an authenticated data structure scheme is a table T that consists of $n$ indices $1, 2, \ldots, n$, supporting index queries and index updates. A direct solution for this problem would be to use a Merkle tree with some collision-resistant hash function (e.g., SHA-2; see the first column of Table 4.1), which would bear logarithmic complexities in all the complexity measures, also inherently enforcing sequential computations. Here we build an authenticated structure for this data structure that uses the lattice-based hash function introduced in Section 4.1 and also supports constant-complexity updates, allowing at the same time a great deal of parallelism.

4.2.1 Algebraic tools

We now discuss some algebraic tools to be used in our construction. Without loss of generality, assume that the modulus $q$ is a power of two:

Definition 4.5 (Binary representation) Define $f(x) = [f_0\ f_1\ \ldots\ f_{\log q - 1}]^T \in \{0,1\}^{\log q}$ to be the binary representation of $x \in \mathbb{Z}_q$. Namely, $x = \sum_{i=0}^{\log q - 1} f_i 2^i \bmod q$.

Definition 4.6 (Radix-2 representation) Define $g(x) = [f_0\ f_1\ \ldots\ f_{\log q - 1}]^T \in \mathbb{Z}_q^{\log q}$ to be some radix-2 representation of $x \in \mathbb{Z}_q$. Namely, $x = \sum_{i=0}^{\log q - 1} f_i 2^i \bmod q$.

By "some" radix-2 representation we mean that the function $g : \mathbb{Z}_q \rightarrow \mathbb{Z}_q^{\log q}$ is "one-to-many". For example, for $q = 16$ and $x = 7$, possible values for $g(x)$ are $[0\ 1\ 1\ 1]^T$ (the usual binary representation), $[0\ {-2}\ 0\ {-1}]^T$ or $[{-2}\ 2\ 0\ {-1}]^T$ (and many more). We now give an important result for our construction:

Lemma 4.3 For any $x_1, x_2, \ldots, x_t \in \mathbb{Z}_q$ there exists a radix-2 representation $g(\cdot)$ such that $g(x_1 + x_2 + \ldots + x_t \bmod q) = f(x_1) + f(x_2) + \ldots + f(x_t) \bmod q$. Moreover, it is $g(x_1 + x_2 + \ldots + x_t \bmod q) \in \{0, \ldots, t\}^{\log q}$.

Proof: Let $\mathbf{x}_i = f(x_i)$ be the binary representation of $x_i$ for $i = 1, \ldots, t$. Then
\[
\sum_{i=1}^{t} \mathbf{x}_i = \left[\sum_{i=1}^{t} x_{i0}\ \ \sum_{i=1}^{t} x_{i1}\ \ \ldots\ \ \sum_{i=1}^{t} x_{i(\log q - 1)}\right]^T \bmod q.
\]
The resulting vector is a radix-2 representation of
\[
\left(\sum_{i=1}^{t} x_{i0}\right) \times 2^0 + \left(\sum_{i=1}^{t} x_{i1}\right) \times 2^1 + \ldots + \left(\sum_{i=1}^{t} x_{i(\log q - 1)}\right) \times 2^{\log q - 1} \bmod q,
\]
which can be written as
\[
\sum_{j=0}^{\log q - 1} x_{1j} 2^j + \sum_{j=0}^{\log q - 1} x_{2j} 2^j + \ldots + \sum_{j=0}^{\log q - 1} x_{tj} 2^j = x_1 + x_2 + \ldots + x_t \bmod q.
\]
Therefore there exists a radix-2 representation $g$ such that $g(x_1 + x_2 + \ldots + x_t \bmod q) = f(x_1) + f(x_2) + \ldots + f(x_t) \bmod q$. Finally, note that since $g(\cdot)$ is the sum of $t$ binary representations, it cannot contain an entry that is greater than $t$. □

Lemma 4.3 is useful in the following sense: given two binary representations of $x_1$ and $x_2$, namely $f_1$ and $f_2$, a radix-2 representation of $x_1 + x_2$ is $f_1 + f_2$. Definitions 4.5 and 4.6 and also Lemma 4.3 (see Corollary 4.1) can be naturally extended to vectors $\mathbf{x} \in \mathbb{Z}_q^k$: for $i = 1, \ldots, k$, $x_i$ is mapped to the respective $\log q$ entries $f(x_i)$ (or $g(x_i)$) in the resulting vector $f(\mathbf{x})$ (or $g(\mathbf{x})$). Therefore we have the following:

Corollary 4.1 For any $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_t \in \mathbb{Z}_q^k$ there exists a radix-2 representation $g(\cdot)$ such that $g(\mathbf{x}_1 + \mathbf{x}_2 + \ldots + \mathbf{x}_t \bmod q) = f(\mathbf{x}_1) + f(\mathbf{x}_2) + \ldots + f(\mathbf{x}_t) \bmod q$. Moreover, it is $g(\mathbf{x}_1 + \mathbf{x}_2 + \ldots + \mathbf{x}_t \bmod q) \in \{0, \ldots, t\}^{k\log q}$.

To constrain the inputs to our hash function, we need the following definition:

Definition 4.7 Let $\mathbf{x} \in \mathbb{Z}_q^k$. We say that the radix-2 representation $g(\mathbf{x}) \in \mathbb{Z}_q^{k\log q}$ is $\delta$-admissible if and only if $g(\mathbf{x}) \in \{0, 1, \ldots, \delta\}^{k\log q}$.
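A quick numerical check of Lemma 4.3 follows, with a toy $q$ of our choosing: summing binary representations entrywise yields a radix-2 representation of the sum, with entries bounded by the number of summands, which is exactly what $t$-admissibility (Definition 4.7) captures.

```python
# Toy check of Lemma 4.3: the entrywise sum of the binary representations
# f(x_i) is a radix-2 representation of x_1 + ... + x_t mod q with entries <= t.
q, log_q = 16, 4                                  # toy modulus, a power of two

def f(x):
    """Binary representation of x in Z_q, least-significant bit first."""
    return [(x >> i) & 1 for i in range(log_q)]

def value(rep):
    """The element of Z_q that a radix-2 representation stands for."""
    return sum(r << i for i, r in enumerate(rep)) % q

xs = [7, 9, 13]
g = [sum(col) for col in zip(*map(f, xs))]        # entrywise sum of the f(x_i)
assert value(g) == sum(xs) % q                    # g represents the sum
assert all(0 <= t <= len(xs) for t in g)          # 3-admissible (Definition 4.7)
print(g, "represents", value(g))
```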
4.2.2 Algorithms of the scheme

We now describe the algorithms of the scheme LBT (see Definition 2.3). All expressions below are reduced modulo $q$, i.e., we work in $\mathbb{Z}_q$.

Algorithm $\{sk, pk\} \leftarrow$ genkey($1^k$): On input the security parameter $k$, this algorithm computes an odd number $q = O(k^{2.5}\delta\log k)$, for some $\delta = \text{poly}(k)$; namely, we set $\delta$ equal to the size of the table, $n$. Then it samples $M \in \mathbb{Z}_q^{k \times m}$ uniformly at random, where $m = 2k\log q$. It sets $sk = \emptyset$ and $pk = \{M, q\}$, i.e., there is no secret (trapdoor) information in our scheme. The access complexity of this algorithm is $O(1)$.

Lattice-based digests. Before we describe algorithm setup(), we describe how we define the lattice-based digests on the table T, by using the hash function of Definition 4.4. Let $D_0$ be the initial state of our table, storing values $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n \in \mathbb{Z}_q^k$. Let $T$ be the binary tree of $\ell$ levels on top of the values $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ (recall we have assumed that $n = 2^\ell$), and let $r$ be the root of tree $T$. By convention, the root of the tree lies at level 0 and the leaves of the tree lie at level $\ell$. For every leaf node $v_i$ of the tree, $i = 1, \ldots, n$, the digest $d(v_i)$ is defined as $d(v_i) = \mathbf{x}_i$. Then, for any internal node $u$ with left child $v$ and right child $w$, by using the hash function $h_{M,n}(\mathbf{x}, \mathbf{y})$ given in Definition 4.4 in a recursive way, the digest $d(u)$ of node $u$ can be defined as
\[
d(u) = h_{M,n}(Ug(d(v)), Dg(d(w))) = M\left[Ug(d(v)) + Dg(d(w))\right], \tag{4.1}
\]
where $g(d(v))$ and $g(d(w))$ are some $n$-admissible radix-2 representations of $d(v)$ and $d(w)$, i.e., by Definition 4.4, it must be $g(d(v)), g(d(w)) \in \{0, 1, \ldots, n\}^{k\log q}$.

In the above relation, $U$ and $D$ are special matrices such that multiplying $U$ or $D$ with a vector in $\{0, 1, \ldots, n\}^{k\log q}$ doubles the dimension of the vector by shifting its entries accordingly and filling the vacant entries with zeros. This operation is used to prepare the vectors in the appropriate input format for the hash function. More formally, $U = [I_{k\log q}\ O_{k\log q}]^T$ and $D = [O_{k\log q}\ I_{k\log q}]^T$, where $I_l$ denotes the square identity matrix of dimension $l$ and $O_l$ denotes the square zero matrix of dimension $l$. Indeed, it is easy to see that for all $\mathbf{x} \in \{0, 1, \ldots, n\}^{k\log q}$ it is $U\mathbf{x} \in T_{n,+}$ and $D\mathbf{x} \in T_{n,-}$, where $T_{n,+}$ and $T_{n,-}$ are defined in Section 4.1.

Figure 4.1: Tree $T$ built on top of a table with 8 values $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_8$. After producing an $n$-admissible radix-2 $g(\cdot)$ representation of the children's digests, we multiply with either $U$ or $D$, then we add the two resulting digests and we compute the hash function on them by multiplying with $M$.
At the leaves of the tree, the figure shows the terms that correspond to each index, as computed by Theorem 4.3 (i.e., the partial digests of the root $r$ with reference to every value in the table); e.g., the term for $\mathbf{x}_1$ is $MUf(MUf(MUf(\mathbf{x}_1)))$ and the term for $\mathbf{x}_2$ is $MUf(MUf(MDf(\mathbf{x}_2)))$. The $g(\cdot)$ representations of the internal nodes are indicated with dashed lines (see Definition 4.9). Note that the $g(\cdot)$ representations of the internal nodes are the sums of specific $f(\cdot)$ representations of the leaves; for example, $g(d(r_{12})) = f(Lf(Lf(\mathbf{x}_5))) + f(Lf(Rf(\mathbf{x}_6))) + f(Rf(Lf(\mathbf{x}_7))) + f(Rf(Rf(\mathbf{x}_8)))$, where $MU = L$ and $MD = R$.

The computation in Relation 4.1 proceeds as follows (see Figure 4.1). Suppose a node $u \in T$ has children $v$ and $w$ with digests $d(v), d(w) \in \mathbb{Z}_q^k$. Applying $g(\cdot)$ transforms $d(v), d(w)$ into vectors of $k\log q$ small entries (admissible radix-2 representations). Multiplying with $U$ and $D$ prepares $g(d(v)), g(d(w))$ to be input to the hash function. (The procedure so far is the same as a Merkle tree construction that uses a collision-resistant function such as SHA-2, i.e., recursive computation over the nodes of a tree.)

4.2.3 Partial digests

Here we show how to express the digest $d(u)$ (computed in Relation 4.1) of every node $u \in T$ somewhat differently, which is crucial for deriving our final results. To simplify notation, we set $MU = L$ and $MD = R$ (standing for left/right); note that $L, R \in \mathbb{Z}_q^{k \times k\log q}$. Let also range($u$) be the range of successive indices corresponding to the leaves of the subtree of $T$ rooted at $u$; e.g., in Figure 4.1, it is range($r_{11}$) = $\{1, 2, 3, 4\}$. For every node $u \in T$ and for every $i \in$ range($u$) we define the partial digest of $u$ with reference to $\mathbf{x}_i$:

Definition 4.8 (Partial digest of a node $u$) For a leaf node $u \in T$ storing value $\mathbf{x}_i$, the partial digest of $u$ with reference to $\mathbf{x}_i$ is defined as $d(u, \mathbf{x}_i) = \mathbf{x}_i$. Else, for every other node $u$ of $T$, with left child $v$ and right child $w$, and for every $i \in$ range($u$), the partial digest $d(u, \mathbf{x}_i)$ of $u$ with reference to $\mathbf{x}_i$ is recursively defined as $d(u, \mathbf{x}_i) = Lf(d(v, \mathbf{x}_i))$ if $\mathbf{x}_i$ belongs to the left subtree of $u$; else, $d(u, \mathbf{x}_i) = Rf(d(w, \mathbf{x}_i))$.

E.g., in Figure 4.1, the partial digests of root $r$ with reference to $\mathbf{x}_2$ and $\mathbf{x}_3$ are $d(r, \mathbf{x}_2) = Lf(Lf(Rf(\mathbf{x}_2)))$ and $d(r, \mathbf{x}_3) = Lf(Rf(Lf(\mathbf{x}_3)))$ respectively ($f(\mathbf{z})$ is $\mathbf{z}$'s binary representation). We now give the main result of this section:

Theorem 4.3 The digest $d(u)$ of node $u \in T$ in Relation 4.1 can be expressed as
\[
d(u) = \sum_{i \in \text{range}(u)} d(u, \mathbf{x}_i),
\]
where $d(u, \mathbf{x}_i)$ is the partial digest of node $u$ with reference to $\mathbf{x}_i$.

Proof: We prove the claim by induction on the levels of the tree $T$. For any internal node $u$ that lies at level $\ell - 1$, there are only two nodes in the subtree rooted at $u$ (storing, say, values $\mathbf{x}_i$ (left child) and $\mathbf{x}_j$ (right child), both of which belong to range($u$)). Therefore
\[
d(u, \mathbf{x}_i) + d(u, \mathbf{x}_j) = Lf(\mathbf{x}_i) + Rf(\mathbf{x}_j) = MUf(\mathbf{x}_i) + MDf(\mathbf{x}_j) = M[Ug(\mathbf{x}_i) + Dg(\mathbf{x}_j)] = d(u).
\]
This is due to Relation 4.1 and also due to the fact that $g(\cdot)$ can be picked to be $f(\cdot)$, which is an $n$-admissible radix-2 representation, therefore satisfying the input constraint of Definition 4.4. Hence the base case holds. Assume the theorem holds for any internal node $z$ that lies at level $0 < t + 1 \leq \ell$, i.e.,
\[
d(z) = \sum_{i \in \text{range}(z)} d(z, \mathbf{x}_i).
\]
Let $u$ be an internal node that lies at level $t$ and let $i_1, i_2, \ldots, i_u$ be the indices in range($u$) in sorted order. Let $v$ be the left child of $u$ and $w$ be the right child of $u$.
Then, by the definition of the partial digest of the node $u$ (Definition 4.8), we have
\[
d(u) = \sum_{i \in \text{range}(u)} d(u, \mathbf{x}_i) = \sum_{j=1}^{u/2} Lf(d(v, \mathbf{x}_j)) + \sum_{j=u/2+1}^{u} Rf(d(w, \mathbf{x}_j)) = MU\sum_{j=1}^{u/2} f(d(v, \mathbf{x}_j)) + MD\sum_{j=u/2+1}^{u} f(d(w, \mathbf{x}_j)).
\]
By Corollary 4.1 there exist $g(\cdot)$ representations whose entries are at most $u/2 \leq n$ such that
\[
d(u) = MUg\left(\sum_{j=1}^{u/2} d(v, \mathbf{x}_j)\right) + MDg\left(\sum_{j=u/2+1}^{u} d(w, \mathbf{x}_j)\right).
\]
By the inductive hypothesis this can be written as
\[
d(u) = M[Ug(d(v)) + Dg(d(w))],
\]
where the $g(\cdot)$ are radix-2 representations that are $n$-admissible, since they are sums of at most $u/2 \leq n/2$ binary representations. Therefore this satisfies the input constraint of Definition 4.4, and $d(u)$ is indeed the correct digest of any internal node $u$, as computed by Relation 4.1. This completes the proof. □

Algorithm $\{$auth($D_0$), $d_0\} \leftarrow$ setup($D_0$, sk, pk): Let $D_0$ be the initial table, storing values $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n \in \mathbb{Z}_q^k$. The algorithm computes the digests of the nodes: it sets $d(u) = \mathbf{x}_i$ for all leaf nodes $u$ storing value $\mathbf{x}_i$, and $d(u) = M[Ug(d(v)) + Dg(d(w))]$ (application of the hash function in Definition 4.4) for all internal nodes $u$ with left child $v$ and right child $w$, where $g(d(v))$ and $g(d(w))$, i.e., the radix-2 representations of the children's digests, are computed according to the following definition (note that the binary representations $f(d(v)), f(d(w))$ could be used instead; however, in lieu of achieving our efficiency goals, the algorithm uses Definition 4.9):

Definition 4.9 The radix-2 representation of $d(u)$ of node $u \in T$ is computed as the sum of $|\text{range}(u)|$ binary representations, i.e.,
\[
g(d(u)) = \sum_{i \in \text{range}(u)} f(d(u, \mathbf{x}_i)),
\]
where $d(u, \mathbf{x}_i)$ is the partial digest of node $u$ with reference to $\mathbf{x}_i$.

By combining Theorem 4.3 and Definition 4.9, by Corollary 4.1, we have:

Corollary 4.2 Let $u$ be an internal node of tree $T$. The $g(\cdot)$ representation of $d(u)$ defined in Definition 4.9 is an $n$-admissible radix-2 representation of $d(u)$.

This concludes the description of setup(). The algorithm outputs $d_0 = d(r)$, where $r$ is the root of $T$ (i.e., the digest of the data structure is the digest of the root of the tree), and it also outputs auth($D_0$) as a structure that contains: (a) the tree $T$; (b) $g(d(u))$ for all nodes $u$ of $T$, as computed in Definition 4.9. The complexity of the algorithm is $O(n\log n)$, since the computation of the $g(d(u))$ involves a linear number of operations per tree level, and there are $O(\log n)$ levels in total:

Lemma 4.4 Algorithm setup() of the authenticated data structure scheme LBT has $O(n\log n)$ access complexity, outputting an authenticated data structure auth($D_0$) of $O(n)$ group complexity. Moreover, it is parallelizable with $O(n)$ access complexity using $O(\log n)$ processors in the CREW model.

Proof: The algorithm needs to compute the $n$-admissible radix-2 representations $g(d(u))$ of the digests $d(u)$ for every internal node $u$ of the tree $T$. Note that by Definition 4.9, there are $n/2, n/4, n/8, \ldots, 2$ such representations that need to be computed for levels $\ell-1, \ell-2, \ell-3, \ldots, 1$ respectively, each one being the sum of $2, 4, 8, \ldots, n/2$ binary representations respectively, i.e.,
\[
g(d(u)) = \sum_{i \in \text{range}(u)} f(d(u, \mathbf{x}_i)).
\]
Since computing $f(d(u, \mathbf{x}_i))$ has $O(1)$ access complexity (they are just functions of specific values), it follows that the computation of the $g(\cdot)$ representations for all the internal nodes of the tree requires access complexity
\[
\frac{n}{2} \times 2 + \frac{n}{4} \times 4 + \frac{n}{8} \times 8 + \ldots + 2 \times \frac{n}{2} = O(n\log n).
\]
Note now that in the CREW model we can use $O(\log n)$ processors, i.e., one processor for each level of the tree. By reading the values $\mathbf{x}_i$ concurrently and writing the values $g(d(u))$ at different memory locations, it follows that each processor will have to do $O(n)$ work in the CREW model. Finally, we note that the output authenticated data structure stores with each internal node $u$ of the tree $T$ the respective $n$-admissible radix-2 representation $g(d(u))$. Therefore the group complexity of auth($D$) is $O(n)$. This completes the proof. □
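The following self-contained toy sketch (with illustrative dimensions of our choosing, and $n = 4$) implements Relation 4.1, Definitions 4.8 and 4.9, and numerically confirms Theorem 4.3. The two continuation sketches after refresh() and verify() below reuse these definitions.

```python
# Toy numerical check of Theorem 4.3 (illustrative dimensions, not secure):
# the digest computed via Relation 4.1 with the canonical representations of
# Definition 4.9 equals the sum of the partial digests.
import random

k, LOG_Q = 3, 8
q = 1 << LOG_Q                       # toy modulus; Section 4.2.1 assumes a power of two
half = k * LOG_Q                     # k log q
random.seed(7)
M = [[random.randrange(q) for _ in range(2 * half)] for _ in range(k)]
L = [row[:half] for row in M]        # L = MU: multiplying by U selects M's left half
R = [row[half:] for row in M]        # R = MD: multiplying by D selects M's right half

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) % q for row in A]

def f(v):
    """Binary representation of a vector in Z_q^k: k log q bits, LSB first."""
    return [(c >> i) & 1 for c in v for i in range(LOG_Q)]

def partial_digest(lo, hi, i, x):
    """d(u, x_i) for the node u whose subtree covers leaves lo..hi-1 (Definition 4.8)."""
    if hi - lo == 1:
        return x
    mid = (lo + hi) // 2
    if i < mid:
        return matvec(L, f(partial_digest(lo, mid, i, x)))
    return matvec(R, f(partial_digest(mid, hi, i, x)))

def g_rep(lo, hi, xs):
    """Definition 4.9: g(d(u)) = sum of f(d(u, x_i)) over i in range(u)."""
    out = [0] * half
    for i in range(lo, hi):
        out = [a + b for a, b in zip(out, f(partial_digest(lo, hi, i, xs[i])))]
    return out

def digest_rel41(lo, hi, xs):
    """d(u) via Relation 4.1: M(U g(d(v)) + D g(d(w))) = L g(d(v)) + R g(d(w))."""
    if hi - lo == 1:
        return xs[lo]
    mid = (lo + hi) // 2
    return [(a + b) % q for a, b in
            zip(matvec(L, g_rep(lo, mid, xs)), matvec(R, g_rep(mid, hi, xs)))]

n = 4
xs = [[random.randrange(q) for _ in range(k)] for _ in range(n)]
total = [0] * k
for i in range(n):                   # Theorem 4.3: d(r) = sum over i of d(r, x_i)
    total = [(a + b) % q for a, b in zip(total, partial_digest(0, n, i, xs[i]))]
assert digest_rel41(0, n, xs) == total
print("Theorem 4.3 verified on a toy instance")
```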
2 4 8 2 Note now in the CREW model, we can use O(log n) processors, i.e., one processor for each level of the tree. By reading the values xi concurrently and writing the values g(d(u)) at different memory locations, it follows that each processor will have to do O(n) work in the CREW model. Finally, we note that the output authenticated data structure stores with each internal node u of the tree T the respective n-admissible radix-2 representations g(d(u)). Therefore the group complexity of auth(D) is O(n). This completes the proof. 2 We continue by noting that Theorem 4.3 allows us to express d(r) as a sum of well-defined functions of the leaves, namely the partial digests of the root r with reference to values in the table. This allows us to achieve our desired complexity bounds: Corollary 4.3 Let x1 , x2 , . . . , xn be the values stored in our table. Then the digest d(r) of the root r of the tree T can be expressed as d(r) = n X d(r, xi ) , i=1 where d(r, xi ) is the partial digest of the root r with reference to xi . We observe that computing the partial digest d(r, xi ) requires one query to the authenticated data structure, i.e., a query for value xi , therefore yielding O(1) access complexity. Matrices L and R, both used for its computation (Definition 4.8) are not part of the authenticated 111 data structure (they are fixed by setup() as public information) and accessing them any number of times does not add to the access complexity. We continue with describing the remaining algorithms of our authenticated data structure scheme: Algorithm {dh+1 , Dh+1 , upd} ← update(u, Dh , dh , sk, pk): Let the update u be set T[i] = x0i and let the value of T[i] before the update be xi . Then the algorithm sets dh+1 = dh − d(r, xi ) + d(r, x0i ) , where d(r, xi ) and d(r, x0i ) are the partial digests of r with reference to xi and x0i , defined in Definition 4.8. Due to Corollary 4.3, dh+1 is the correct updated digest. Since the computation of partial digests has constant access complexity, algorithm update() has O(1) access complexity, since it involves two operations in Zkq . The algorithm outputs dh+1 as well as the updated table Dh+1 (note that the algorithm does not need to access the authenticated data structure at all—see Definition 2.3—and does not output anything as upd): Lemma 4.5 Algorithm update() of the authenticated data structure scheme LBT has O(1) access complexity. Moreover, the update information upd output by update() is empty. Algorithm {Dh+1 , auth(Dh+1 ), dh+1 } ← refresh(u, Dh , auth(Dh ), dh , upd, pk): This algorithm should update the authenticated data structure auth(Dh ). Let the update u be set T[i] = x0i and let the value of T[i] before the update be xi . Suppose v` , v`−1 , . . . , v1 is the path from the node of index i to the child v1 of the root of the tree. The algorithm should update the values g(d(vj )) for j = `, ` − 1, . . . , 1. This is achieved via Definition 4.9, by setting g(d0 (vj )) = g(d(vj )) − f (d(vj , xi )) + f (d(vj , x0i )) for j = `, ` − 1, . . . , 1 , namely the invariant of Definition 4.9 must be maintained, and where d(vj , xi ), d(vj , x0i ) are the partial digests of node vj with reference to xi and x0i . The algorithm outputs Dh+1 , the updated g(d0 (.)) representations as auth(Dh+1 ) and dh+1 as in update(), i.e., dh+1 = dh − d(r, xi ) + d(r, x0i ). 112 Lemma 4.6 Algorithm refresh() of the authenticated data structure scheme LBT has O(log n) access complexity. 
Lemma 4.6 Algorithm refresh() of the authenticated data structure scheme LBT has $O(\log n)$ access complexity. Moreover, it is parallelizable with $O(1)$ access complexity using $O(\log n)$ processors in the CREW model.

Proof: For each update from $\mathbf{x}_i$ to $\mathbf{x}'_i$, the algorithm should update the values $g(d(v_j))$ for $j = \ell, \ell-1, \ldots, 1$. This is achieved via Definition 4.9, by setting
\[
g(d'(v_j)) = g(d(v_j)) - f(d(v_j, \mathbf{x}_i)) + f(d(v_j, \mathbf{x}'_i)) \tag{4.2}
\]
(the invariant of Definition 4.9 must be maintained) for $j = \ell, \ell-1, \ldots, 1$, where $d(v_j, \mathbf{x}_i), d(v_j, \mathbf{x}'_i)$ are the partial digests of node $v_j$ with reference to $\mathbf{x}_i$ and $\mathbf{x}'_i$ respectively. Since $\ell = O(\log n)$, the result follows. Note also that the update relations (4.2) are independent from one another. Therefore we can take advantage of that and, in the CREW model, use $O(\log n)$ processors, i.e., one processor for each level of the tree. By reading the values $\mathbf{x}_i$ concurrently and writing the values $g(d(u))$ at different memory locations (as required), it follows that each processor will have to do $O(1)$ work in the CREW model. □

Algorithm $\{\Pi(q), \alpha(q)\} \leftarrow$ query($q, D_h$, auth($D_h$), pk): Let the query $q$ be "return the value stored at index $i$ of table T". Suppose $v_\ell, v_{\ell-1}, \ldots, v_1$ is the path from the node of index $i$ to the child $v_1$ of the root of the tree $T$. The algorithm sets $\alpha(q) = $ T[$i$] and sets the proof $\Pi(q)$ to be the array $\pi$ of $g(\cdot)$ representations such that
\[
\pi_i = (g(d(v_i)), g(d(\text{sib}(v_i)))) \tag{4.3}
\]
for $i = \ell, \ell-1, \ldots, 1$, where sib($u$) denotes the sibling of a node $u$ in tree $T$.

Lemma 4.7 Algorithm query() of the authenticated data structure scheme LBT has $O(\log n)$ access complexity. Moreover, it is parallelizable with $O(1)$ access complexity using $O(\log n)$ processors in the EREW model. Finally, it outputs a proof $\Pi(q)$ of $O(\log n)$ group complexity.

Proof: Since $\ell = O(\log n)$ values have to be collected to construct the proof, the result follows. Moreover, with $O(\log n)$ processors (one processor per node), this algorithm is parallelizable in the EREW model with $O(1)$ complexity: for $p = 1, \ldots, \ell$, processor $p$ outputs $\pi_p = (g(d(v_p)), g(d(\text{sib}(v_p))))$, as defined in Relation 4.3. □

Algorithm $\{$accept, reject$\} \leftarrow$ verify($q, \alpha, \Pi, d_h$, pk): Let the query $q$ be "return the value at index $i$", let $\mathbf{y} = \alpha$, and let $\Pi = \pi$ such that $\pi_j = (\alpha_j, \beta_j)$ ($j = \ell, \ell-1, \ldots, 1$). For $j = \ell, \ell-1, \ldots, 1$ the algorithm performs the following:

1. If $\alpha_j$ is not a $g(\cdot)$ representation of $\mathbf{y}$, or $\alpha_j, \beta_j$ are not $n$-admissible $g(\cdot)$ representations, output reject;
2. Set $\mathbf{y} = M(U\alpha_j + D\beta_j)$ if $v_j$ is $v_{j-1}$'s left child, or $\mathbf{y} = M(D\alpha_j + U\beta_j)$ otherwise.

After the loop terminates, if $\mathbf{y} \neq d_h$, reject is output; else, accept is output.

Lemma 4.8 Algorithm verify() of the authenticated data structure scheme LBT has $O(\log n)$ access complexity. Moreover, it is parallelizable with $O(1)$ access complexity using $O(\log n)$ processors in the CRCW model.

Proof: Since $\ell = O(\log n)$ values have to be processed to perform the verification of the proof, the result follows. The parallel algorithm that a processor $p = \ell, \ell-1, \ldots, 1$ executes is the following (assume that $\alpha_0$ is defined as a $g(\cdot)$ representation of the digest $d_h$): if $p < \ell$ then $\mathbf{y} = M(U\alpha_p + D\beta_p)$ (or $\mathbf{y} = M(D\alpha_p + U\beta_p)$), else $\mathbf{y} = \alpha$; if $\alpha_{p-1}$ is not a $g(\cdot)$ representation of $\mathbf{y}$, or $\alpha_{p-1}, \alpha_p, \beta_p$ are not $n$-admissible, then output reject, else output accept. Note that the algorithm requires concurrent write, since all the processors need to write to a location storing the "reject" bit concurrently. Therefore the algorithm is parallelizable in the CRCW model with $O(1)$ access complexity using $O(\log n)$ processors. □
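Completing the walkthrough (again reusing the definitions and the updated xs from the sketches above), this fragment produces a proof as in Relation 4.3 and runs the two verification steps, including a failing check for a forged answer.

```python
# query(): pairs (g(d(v_j)), g(d(sib(v_j)))) from the leaf up; verify(): check
# that alpha_j represents the running digest, then rehash via Relation 4.1.
def query(i, xs):
    proof, lo, hi = [], 0, len(xs)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if i < mid:                               # queried child is a left child
            proof.append((g_rep(lo, mid, xs), g_rep(mid, hi, xs), "L"))
            hi = mid
        else:
            proof.append((g_rep(mid, hi, xs), g_rep(lo, mid, xs), "R"))
            lo = mid
    return list(reversed(proof))                  # leaf level first

def rep_value(rep):
    """The vector in Z_q^k that a k log q radix-2 representation stands for."""
    return [sum(rep[c * LOG_Q + i] << i for i in range(LOG_Q)) % q
            for c in range(k)]

def verify(i, answer, proof, d_root):
    y = answer                                    # running digest, starts at T[i]
    for alpha, beta, side in proof:
        if rep_value(alpha) != y or not all(0 <= t <= n for t in alpha + beta):
            return False                          # step 1: representation checks
        left, right = (alpha, beta) if side == "L" else (beta, alpha)
        y = [(a + b) % q for a, b in              # step 2: y = M(U. + D.)
             zip(matvec(L, left), matvec(R, right))]
    return y == d_root

d_root = digest_rel41(0, n, xs)
assert verify(2, xs[2], query(2, xs), d_root)
assert not verify(2, [(c + 1) % q for c in xs[2]], query(2, xs), d_root)
print("verification accepts the true answer and rejects a forged one")
```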
4.2.4 Correctness and security

Lemma 4.9 The authenticated data structure scheme LBT = {genkey, setup, update, refresh, query, verify} is correct according to Definition 2.4.

Proof: Let T = $D_0$ be any table of $n$ entries. Fix the security parameter $k$ and output sk and pk = $(M, q)$ by calling algorithm genkey(). Then output an authenticated data structure auth($D_0$) and the respective digest $d_0$ by calling algorithm setup(). Pick a polynomial number of updates, namely a polynomial number of pairs of indices and values to be written at the respective indices, and update auth($D_0$) and $d_0$ by calling algorithm refresh(). Let $D_h$ be the final table T, auth($D_h$) be the produced authenticated data structure and $d_h$ be the final digest. Let $i$ be an index and let $\mathbf{y} = $ T[$i$]. Output a proof $\Pi(q)$ for index $i$ and answer $\mathbf{y}$ by calling query(). $\Pi(q)$ contains pairs $(g(d(v_j)), g(d(\text{sib}(v_j))))$ ($j = \ell, \ell-1, \ldots, 1$) of $n$-admissible representations, where $v_\ell, v_{\ell-1}, \ldots, v_1$ are the nodes on the path from index $i$ (i.e., node $v_\ell$) to the child $v_1$ of the root of the tree $T$. For the elements of the proof, the following are true:

1. $g(d(v_\ell)) = f(\mathbf{y})$ (definition of a leaf digest);
2. $d(v_{j-1}) = M(Ug(d(v_j)) + Dg(d(\text{sib}(v_j))))$ or $d(v_{j-1}) = M(Dg(d(v_j)) + Ug(d(\text{sib}(v_j))))$, according to the left-child or right-child relation, for $j = \ell, \ell-1, \ldots, 1$, where $v_0$ is the root of the tree (by Relation 4.1);
3. The $g(\cdot)$ representations in $\Pi(q)$ are always $n$-admissible, i.e., they are maintained to be $n$-admissible during updates, since refresh() always updates the $g(\cdot)$ representations so that Definition 4.9 is satisfied, which by Corollary 4.2 gives $n$-admissible representations.

Based on the above and the code of verify(), we conclude that verify() always accepts a proof for index $i$ (of answer $\mathbf{y} = $ T[$i$]) computed by query(). □

Lemma 4.10 The authenticated data structure scheme LBT = {genkey, setup, update, refresh, query, verify} is secure according to Definition 2.5 and assuming the hardness of GAPSVP$_\gamma$ for $\gamma = O\!\left(nk\sqrt{\log n + \log k}\right)$.

Proof: Fix the security parameter $k$ and output sk and pk = $(M, q)$ by calling algorithm genkey(). Let Adv be a polynomially-bounded adversary. Adv picks an initial table T = $D_0$ of $n$ entries and outputs the authenticated data structure auth($D_0$), the respective digest $d_0$, and the tree $T$ of $\ell$ levels, by calling algorithm setup() through oracle access. Then Adv picks a polynomial number of updates, namely a polynomial number of pairs of indices and values to be written at the respective indices. Let $D_h$ be the final table T, and let $d_h$ be the final digest as produced by the adversary through oracle access to algorithm update(). Let $q = i$ be the query index picked by Adv, $\mathbf{y} = $ T[$i$] be the value stored at this index and $v_\ell, v_{\ell-1}, \ldots, v_0$ be the path of $T$ from the node referring to index $i$ to the root of $T$. The adversary Adv outputs an incorrect answer $\alpha \neq \mathbf{y}$ and also a proof $\Pi = (\pi_\ell, \pi_{\ell-1}, \ldots, \pi_1)$ ($\ell = O(\log n)$), where $\pi_j = (\alpha_j, \beta_j)$ (see algorithm query()). We now define the following events, related to the above choice of proof made by the adversary. Our goal will be to express the probability that verify($i, \alpha, \Pi, d_h$, pk) accepts while $\alpha \neq \mathbf{y}$ as a function of the following events. Note that $d_h$ is the correct digest of the authenticated data structure:

1. $E_{\ell,0}$: The value $\alpha_\ell$ picked by Adv is such that $\alpha_\ell$ is not an $n$-admissible $g(\cdot)$ representation of $\mathbf{y}$;

2. $E_j$: For $j = \ell-1, \ldots, 1$, the values $\alpha_j$ and $\alpha_{j+1}, \beta_{j+1} \in \{0, 1, \ldots, n\}^{k\log q}$ picked by Adv
are such that $\alpha_j$ is an $n$-admissible $g(\cdot)$ representation of $M(U\alpha_{j+1} + D\beta_{j+1})$. Assume, without loss of generality, that a convenient index $i = 0$ is used, so that the order of $U$ and $D$ is always the same. This event can be partitioned into two mutually exclusive events, i.e., $E_j = E_{j,0} \cup E_{j,1}$, such that

• $E_{j,0}$: Value $\alpha_j$ is not an $n$-admissible $g(\cdot)$ representation of the digest of node $v_j$, as defined in Relation 4.1, i.e., $\alpha_j \neq g(d(v_j))$;
• $E_{j,1}$: Value $\alpha_j$ is an $n$-admissible $g(\cdot)$ representation of the digest of node $v_j$, as defined in Relation 4.1, i.e., $\alpha_j = g(d(v_j))$.

3. $E_{0,1}$: The values $\alpha_1 \in \{0, 1, \ldots, n\}^{k\log q}$ and $\beta_1 \in \{0, 1, \ldots, n\}^{k\log q}$ picked by Adv are such that $d_h = M(U\alpha_1 + D\beta_1)$.

The probability that verify() accepts, while $\alpha_\ell$ is not an $n$-admissible $g(\cdot)$ representation of $\mathbf{y}$, is the probability
\[
\begin{aligned}
\Pr[E_{\ell,0} \cap E_{\ell-1} \cap E_{\ell-2} \cap \ldots \cap E_{0,1}]
&= \Pr[E_{\ell,0} \cap (E_{\ell-1,0} \cup E_{\ell-1,1}) \cap (E_{\ell-2,0} \cup E_{\ell-2,1}) \cap \ldots \cap E_{0,1}] \\
&\leq \Pr[E_{\ell,0}\,|\,E_{\ell-1,1}] + \Pr[E_{\ell-1,0}\,|\,E_{\ell-2,1}] + \Pr[E_{\ell-2,0}\,|\,E_{\ell-3,1}] + \ldots + \Pr[E_{1,0}\,|\,E_{0,1}] \\
&= \sum_{j=1}^{\ell} \Pr[E_{j,0}\,|\,E_{j-1,1}].
\end{aligned}
\]
Note that the event $E_{j,0}\,|\,E_{j-1,1}$ implies the following:

1. $\alpha_j \neq g(d(v_j))$;
2. $\alpha_{j-1} = g(d(v_{j-1}))$, where $d(v_{j-1}) = M(U\alpha_j + D\beta_j)$.

However, from Relation 4.1 it should be that
\[
d(v_{j-1}) = M(Ug(d(v_j)) + Dg(d(\text{sib}(v_j)))),
\]
where $g(d(v_j))$ and $g(d(\text{sib}(v_j)))$ are the representations of the digests of nodes $v_j$ and sib($v_j$) respectively. Therefore $(\alpha_j, \beta_j)$ is a collision with $(g(d(v_j)), g(d(\text{sib}(v_j))))$, since $\alpha_j \neq g(d(v_j))$. Note now that by Theorem 4.2, which gives $\gamma = O\!\left(nk\sqrt{\log n + \log k}\right) = \text{poly}(k)$ since $q = O(k^{2.5}\delta\log k)$ and $\delta = n$, and by Assumption 4.1, $\Pr[E_{j,0}\,|\,E_{j-1,1}]$ is neg($k$) for all $j = \ell, \ell-1, \ldots, 1$. Therefore the sum $\sum_{j=1}^{\ell} \Pr[E_{j,0}\,|\,E_{j-1,1}]$ is also neg($k$), since $\ell = O(\log n) = O(\log k)$. This concludes the proof. □

We can now present the main result of this section.

Theorem 4.4 Let $k$ be the security parameter. Then there exists a publicly-verifiable authenticated data structure scheme LBT = {genkey, setup, update, refresh, query, verify} for a data structure scheme defined for a dynamic table $D$ of $n$ entries such that:

1. It is correct according to Definition 2.4 and secure according to Definition 2.5, assuming the hardness of GAPSVP$_\gamma$ for $\gamma = O\!\left(nk\sqrt{\log n + \log k}\right)$;
2. The access complexity of setup() is $O(n\log n)$, or $O(n)$ using $O(\log n)$ processors in the CREW model, outputting an authenticated data structure auth($D$) of $O(n)$ group complexity;
3. The access complexity of update() is $O(1)$, outputting update information upd of $O(1)$ group complexity;
4. The access complexity of refresh() is $O(\log n)$, or $O(1)$ using $O(\log n)$ processors in the CREW model;
5. The access complexity of query() is $O(\log n)$, or $O(1)$ using $O(\log n)$ processors in the EREW model, outputting a proof $\Pi(q)$ for a query $q$ of $O(\log n)$ group complexity;
6. The access complexity of verify() is $O(\log n)$, or $O(1)$ using $O(\log n)$ processors in the CRCW model.

Proof: This result follows directly from Lemmata 4.4, 4.5, 4.6, 4.7, 4.8, 4.9 and 4.10. Note that the presented scheme is publicly verifiable, since verify() does not take the secret key as an input. □
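As a sanity check on the bound in Lemma 4.10 and Theorem 4.4, instantiating the collision-resistance parameter of Theorem 4.2 with $\delta = n$ and $m = 2k\log q$ gives
\[
\gamma = 14\pi\,\delta\sqrt{km} = 14\pi\, n\sqrt{2k^{2}\log q} = O\!\left(nk\sqrt{\log q}\right) = O\!\left(nk\sqrt{\log n + \log k}\right),
\]
since $q = O(k^{2.5}\,n\log k)$ implies $\log q = O(\log n + \log k)$. As $n = \text{poly}(k)$, this $\gamma$ is polynomial in $k$, so Assumption 4.1 applies.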
The crucial property we exploit here, which is what we call repeated linearity, is a means of "feeding" the output of the function back in as an input, so that certain homomorphic properties are still satisfied—specifically, the properties of Corollary 4.1. Therefore, it might be the case that other functions could also be used instead, provided they satisfy such a property.

4.3 Authenticated Bloom filters

In this section we show how we can use the lattice-based hash function to verify the Bloom filter functionality, a space-efficient dictionary originally introduced by Bloom [14]. The Bloom filter consists of an array (table) A[0 ... n−1] storing n bits. All the bits are initially set to 0. Suppose one needs to store a set S of r elements. Then K hash functions h_i(.) with range {0, ..., n−1} are used (these are not lattice-based hash functions) and, for each element s ∈ S, we set the bits A[h_i(s)] to 1, for i = 1, ..., K. In this way false positives can occur, i.e., an element that is not present might appear to be represented in A. The probability of a false positive can be shown to be (1 − p)^K, where p = e^{−Kr/n}, which is minimized for K = (n/r) ln 2 [14].

The Bloom filter above supports only insertions, though. A deletion (i.e., setting some bits to 0) can cause the undesired deletion of many elements. To deal with this problem, counting Bloom filters were introduced by Fan et al. [38]. In this solution, by keeping a counter for each index of A (instead of just 0 or 1), we can tolerate deletions by incrementing the counter during insertions and decrementing it during deletions. However, the problem of overflow exists. As observed by Broder and Mitzenmacher [21], an overflow (at least one counter exceeding some value C) occurs with probability n(e ln 2/C)^C, for a certain set of r elements. Setting C = O(1) (e.g., C = 16) is suitable for most applications [21].

By the above description, it is clear that we can use our lattice-based construction to authenticate the Bloom filter functionality: Each update in the Bloom filter corresponds to K updates in table T, and querying one element in the Bloom filter corresponds to K queries to table T. Constant update complexity is very important in this application, given that a Bloom filter is an update-intensive data structure (i.e., an insertion or deletion of an element involves K operations):

Theorem 4.5 Let k be the security parameter. Then there exists a publicly-verifiable authenticated data structure scheme ABF = {genkey, setup, update, refresh, query, verify} for a data structure scheme defined for a Bloom filter D of n entries, storing r elements and using K hash functions, such that:

1. It is correct according to Definition 2.4 and secure according to Definition 2.5, assuming the hardness of GAPSVP_γ for γ = O(nk√(log n + log k));
2. The access complexity of setup() is O(n log n), or O(n) using O(log n) processors in the CREW model, outputting an authenticated data structure auth(D) of O(n) group complexity;
3. The access complexity of update() is O(K), outputting update information upd of O(1) group complexity;
4. The access complexity of refresh() is O(K log n), or O(K) using O(log n) processors in the CREW model;
5. The access complexity of query() is O(K log n), or O(K) using O(log n) processors in the EREW model, outputting a proof Π(q) for a query q of O(K log n) group complexity;
6. The access complexity of verify() is O(K log n), or O(K) using O(log n) processors in the CRCW model.

Proof: The construction for an authenticated Bloom filter is the same as that of Theorem 4.4. The extra multiplicative factor K in the complexities is due to the fact that one operation in the authenticated Bloom filter (insertion, deletion or query of an element) requires O(K) operations on an authenticated table. This follows from the construction and the definition of the Bloom filter data structure. □
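For concreteness, here is a minimal (unauthenticated) Bloom filter sketch; the hash functions are arbitrary choices for illustration. In the authenticated version, each of the K bit operations below becomes an update or query on the authenticated table T of Theorem 4.4.

```python
import hashlib, math

class BloomFilter:
    def __init__(self, n, K):
        self.n, self.K, self.A = n, K, [0] * n  # table A[0 .. n-1] of bits

    def _indices(self, s):
        # K indices h_1(s), ..., h_K(s); ordinary (non-lattice) hash functions
        return [int(hashlib.sha256(f"{i}:{s}".encode()).hexdigest(), 16) % self.n
                for i in range(self.K)]

    def insert(self, s):                # K bit-writes -> K authenticated updates
        for i in self._indices(s):
            self.A[i] = 1

    def contains(self, s):              # K bit-reads -> K authenticated queries
        return all(self.A[i] for i in self._indices(s))

n, r = 1 << 16, 4096
K = round((n / r) * math.log(2))        # K = (n/r) ln 2 minimizes false positives
bf = BloomFilter(n, K)
bf.insert("brown")
assert bf.contains("brown")
```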
4.4 Parallel online memory checking

In this section, we establish our results concerning parallel online memory checking. The online memory checking model [15] can be (informally) described as follows: Suppose M is an unreliable (malicious) memory of n cells. A user U wants to read (through operation read(i)) or write (through operation write(i, x), where x is the new content) a cell i ∈ {1, 2, ..., n}. However, his requests go through a checker C. The checker is supposed to read cells from the unreliable memory M, and also some reliable (and possibly secret) information s of sublinear size, and output either the correct answer (i.e., the latest content of cell i) or BUGGY, if the content of cell i is corrupted. The probability of returning the corrupted content of a cell as correct should be negligible. The checker is called non-adaptive if, given an index i, the set and the order of the cells accessed in order to output the answer are deterministic. In this section we consider such checkers. In the following, we give the formal definition:

Definition 1 (Online memory checking [35]) Let M be an n-cell unreliable memory. An online non-adaptive memory checker C = (Σ, n, q, s) over an alphabet Σ with reliable (and possibly secret) memory s is a probabilistic Turing machine with five tapes:

• A read-only input tape for receiving read/write requests from the user U to the unreliable memory M of n cells, indexed by 1, 2, ..., n;
• A write-only output tape for sending responses back to the user;
• A read-write work tape, i.e., the (secret) reliable memory s;
• A write-only tape for sending read/write requests to the memory M;
• A read-only input tape for receiving M's responses.

A checker is presented with write(i, x) and read(i) requests made by U to M, where i ∈ {1, 2, ..., n}. After each read request, C returns an answer or outputs that M's operation is BUGGY. C's operation should be both correct and secure:

1. Correctness: For any polynomially-large sequence of user requests, as long as M answers all of C's read requests correctly, C also answers all of the user's read requests correctly;

2. Security: For any polynomially-large sequence of user requests, and for any (even incorrect or malicious) answers returned by M, the probability that C answers a user request incorrectly is neg(k), where k is the security parameter. C may either recover the correct answer independently or answer that M is BUGGY, but it may not answer a request incorrectly (beyond negligible probability).

In online memory checking settings, the complexity measure we are interested in minimizing is the query complexity, defined as the sum of the number of requests that the checker makes to the unreliable memory M during a read(i) operation plus the number of requests that the checker makes to the unreliable memory M during a write(i, x) operation [82]. So far in the literature, and in the computational model, checkers with O(log n) [15] or O(log_d n) [35] query complexity have appeared. Specifically for these checkers, we can distinguish two cases:
1. In the secret-key setting, i.e., when both reliable and secret small memory s are required, these checkers have been shown to be parallelizable; see, e.g., the work of Hall and Jutla [54], as well as the construction based on PRFs [15]—although the latter has not been reported as such in the literature. (The construction based on PRFs appearing in [15] is easily parallelizable since the PRF tag computed on each node of the tree is not a function of the PRF tags of its children.)

2. In the non-secret-key setting, i.e., when only reliable memory is required (e.g., the construction using UOWHFs from [15] and Merkle tree constructions), these checkers have appeared to be inherently sequential.

However, in this section we establish the first parallel online memory checker in the non-secret-key setting:

Theorem 4.6 In the non-secret-key setting and in the CREW model of parallel computation, there is a non-adaptive online memory checker for an unreliable memory of n cells with O(1) query complexity, using O(log n) checkers and O(1) reliable memory.

Proof: Let LBT = {genkey, setup, update, refresh, query, verify} be the authenticated data structure scheme derived in Theorem 4.4. We show how to construct a parallel online memory checker in the non-secret-key setting by using this scheme. Let M be the unreliable memory accessed through indices 1, 2, ..., n. Assume we can use u checkers C_1, C_2, ..., C_u, where u = O(log n). The user U sends his requests to all the checkers simultaneously, and all the checkers have access to the unreliable memory M and to some reliable memory s. We work in the CREW model—i.e., all the checkers can simultaneously read the same value, but writing to the same location simultaneously is not feasible. Let {sk, pk} ← genkey(), where sk = ∅. The checkers run the algorithm {auth(M), d_0} ← setup(M, pk) (since sk = ∅ we do not use the secret key as input from now on) in parallel, requiring O(n) access complexity in the CREW model (Theorem 4.4). The authenticated structure auth(M) is stored in the unreliable memory (all its parts can be uniquely referenced) and d_0 is stored in the small reliable memory, i.e., s = d_0. We have two cases:

1. User U sends the request read(i) to all checkers C_1, C_2, ..., C_u. The checkers run the algorithm query(i, M, auth(M), pk) in parallel and output the answer M[i] and the proof Π(i). This requires O(1) requests to the unreliable memory per checker in the EREW model (Theorem 4.4). Then the algorithm verify(i, M[i], Π(i), s, pk) is run by the checkers (note that running query() and verify() can be combined into one algorithm). The algorithm writes either M[i] (in this case verify() accepts) or BUGGY (in this case verify() rejects) to a location of the reliable memory. User U reads that location and gets the result. We note here that the fact that verify() is parallelizable in the CRCW model does not affect our complexity results, since the write part of the algorithm is performed on the reliable memory—and requests to the reliable memory are not counted in the query complexity (only requests to the unreliable memory are). Therefore the query complexity of the parallel checker due to read operations is O(1) in the EREW model;

2. User U sends the request write(i, x) to all checkers C_1, C_2, ..., C_u. First, the current content of cell i is verified through a read(i) operation. If this verification succeeds, the checkers run algorithm {M′, auth(M′), s′} ← refresh(write(i, x), M, auth(M), s, pk) in parallel.
Note that this algorithm has O(1) access complexity using O(log n) processors in the CREW model, by Theorem 4.4. We need concurrent read because all the checkers should be able to read the same value of the old (verified) content of cell i.

Finally, we note that the correctness and the security of the checker come as a direct result of the correctness and the security of the authenticated data structure scheme LBT. Also, since our lattice-based construction does not use any secret key, it follows that the construction we have described is in the non-secret-key setting. This completes the proof. □
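The checker logic of this proof can be summarized in the following conceptual sketch. The names setup, query, verify and refresh stand for the corresponding LBT algorithms (assumed given); in the actual construction each call is executed by the u = O(log n) checkers in parallel in the CREW model, with only the digest s kept in reliable memory.

```python
class OnlineChecker:
    """Conceptual single-threaded stand-in for the u parallel checkers."""
    def __init__(self, memory, pk, setup):
        self.M, self.pk = memory, pk
        # auth(M) lives in the unreliable memory; s = d_0 is the O(1) reliable memory
        self.auth, self.s = setup(memory, pk)

    def read(self, i, query, verify):
        answer, proof = query(i, self.M, self.auth, self.pk)
        return answer if verify(i, answer, proof, self.s, self.pk) else "BUGGY"

    def write(self, i, x, query, verify, refresh):
        if self.read(i, query, verify) == "BUGGY":   # first verify the old content
            return "BUGGY"
        self.M, self.auth, self.s = refresh((i, x), self.M, self.auth,
                                            self.s, self.pk)
```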
4.5 Protocols

Three-party protocol. By using Theorem 2.1, we can easily derive the following corollary, which describes the use of the authenticated data structure scheme LBT of Theorem 4.4 in the three-party model:

Corollary 4.4 Let k be the security parameter and assume the hardness of GAPSVP_γ for γ = poly(k). Then there exists a three-party authenticated data structures protocol (see Protocol 2.1) for verifying queries q on a dynamic table of n entries such that:

1. The setup at the source has O(n log n) access complexity, or O(n) access complexity using O(log n) processors in the CREW model;
2. The update at the source has O(1) access complexity;
3. The space needed at the source has O(n) group complexity;
4. The communication between the source and the server has O(1) group complexity;
5. The update at the server has O(log n) access complexity, or O(1) access complexity using O(log n) processors in the CREW model;
6. The query at the server has O(log n) access complexity, or O(1) access complexity using O(log n) processors in the EREW model;
7. The space needed at the server has O(n) group complexity;
8. The communication between the server and the client has O(log n) group complexity;
9. The verification at the client has O(log n) access complexity, or O(1) access complexity using O(log n) processors in the CRCW model;
10. For a query q sent by the client to the server at any time (even after updates), let α be an answer and let π be a proof returned by the server. With probability Ω(1 − neg(k)), the client accepts the answer α if and only if α is correct.

Two-party protocol. As a corollary of Example 2.1 (the Merkle tree techniques apply to the lattice-based authenticated table as-is), we can state the following for the authenticated data structure scheme LBT:

Corollary 4.5 Assumption 2.1 is true for the authenticated data structure scheme LBT. Moreover, for every update u, |Q_u| has O(1) complexity.

By Theorems 2.2 and 4.4 and Corollary 4.5, we can now state the final result for the two-party model:

Corollary 4.6 Let k be the security parameter and assume the hardness of GAPSVP_γ for γ = poly(k). Then there exists a two-party authenticated data structures protocol (see Protocol 2.2) for verifying queries q on a dynamic table of n entries such that:

1. The protocol is non-interactive;
2. The setup at the client has O(n log n) access complexity, or O(n) access complexity using O(log n) processors in the CREW model;
3. The update at the client has O(log n) access complexity, or O(1) access complexity using O(log n) processors in the CRCW model;
4. The verification at the client has O(log n) access complexity, or O(1) access complexity using O(log n) processors in the CRCW model;
5. The space needed at the client has O(1) group complexity;
6. The communication between the client and the server has O(log n) group complexity;
7. The update at the server has O(log n) access complexity, or O(1) access complexity using O(log n) processors in the CREW model;
8. The query at the server has O(log n) access complexity, or O(1) access complexity using O(log n) processors in the EREW model;
9. The space needed at the server has O(n) group complexity;
10. For a query q sent by the client to the server at any time (even after updates), let α be an answer and let π be a proof returned by the server. With probability Ω(1 − neg(k)), the client accepts the answer α if and only if α is correct.

Finally, we note that similar protocols can be derived for the authenticated data structure scheme ABF (Theorem 4.5), referring to Bloom filters.

Chapter 5

Authenticated sets operations with bilinear maps

In the previous chapters of this thesis, we mainly studied the verification of fundamental data structure queries, such as hash table queries (Chapter 3) and index queries on tables (Chapter 4). The verification of these queries in the authenticated data structures setting allows us to secure outsourced storage efficiently, i.e., to ensure that data has not been tampered with by the untrusted party that stores it. In this chapter, we follow a different direction, where we are interested in verifying outsourced computation. Namely, how can one verify the outcome of a computation that has been performed by an untrusted entity? Of course, the main challenge in this paradigm is that the verification procedure should not involve executing the computation from scratch: This would defeat the purpose of employing a powerful (but untrusted) machine in the cloud to perform the computation for us.

Motivated mainly by computations performed by search engines (e.g., keyword searches using an inverted index) as well as by database applications, in this chapter we examine a very fundamental class of computations: We study the verification of outsourced operations on general sets, where a dynamic collection of m sets S_1, S_2, ..., S_m is remotely stored at an untrusted server and we wish to publicly verify primitive queries on these sets, such as intersection, union and set difference. For example, for the query requesting the intersection of t sets specified by indices i_1, i_2, ..., i_t between 1 and m, we wish to design techniques that allow any client to cryptographically check the correctness of the returned intersection S_{i_1} ∩ S_{i_2} ∩ ... ∩ S_{i_t}. In addition, we wish the verification of any set operation to be operation-sensitive, meaning that the required complexity depends only on the (description and outcome of the) operation, and not on the sizes of the involved sets. For example, if |S_{i_1} ∩ S_{i_2} ∩ ... ∩ S_{i_t}| = δ, then we would like the verification cost to be proportional to t + δ. This achieves optimality, as the query and the answer require O(t + δ) complexity.

Relation to outsourced verifiable computation. Recent works on outsourced verifiable computation by Gennaro et al. [41], Chung et al. [28] and Applebaum et al. [5] achieve operation-sensitive verification of general functionalities. Although such approaches completely cover set operations as a special case, clearly meeting our goal with respect to optimal verifiability, they are inherently inadequate to meet our other goals with respect to public verifiability and dynamic updates, both important properties in the context of data querying.
Indeed, the works on outsourced verifiable computation [5, 28, 41] are primarily designed to provide secrecy of the outsourced computations, and as such, the client makes use of some secret information to outsource the computation as a circuit and in encrypted form. This secret information is also used in verifying the computation, therefore effectively supporting only one verifier; instead, we seek schemes that allow any client to query the sets collection and verify the returned results. Finally, in the outsourced verifiable computation framework [5, 28, 41], the description of the circuit is fixed at the initialization of the scheme, therefore effectively supporting no updates (or very expensive updates, as shown in Table 5.1 for [41]) in the outsourced data; instead, we seek schemes supporting efficient updates. We accordingly study our problem in the model of authenticated data structures, which provides mechanisms for supporting public verifiability and queries on dynamic data.

Achieving operation-sensitive verification. In this chapter, we design a new authenticated data structure scheme (denoted ASC in Table 5.1) for the verification of set operations in an operation-sensitive manner, that is, with proof and verification complexity depending only on the description and outcome of the operation and not on the size of the sets involved. Conceptually, this property is similar to the property of super-efficient verification that has been studied in certifying algorithms [63] and certification data structures [52, 107] (as well as in the context of outsourced verifiable computation [5, 28, 41]), where an answer can be verified in complexity asymptotically less than the complexity required to produce it. Whether the above optimality property is achievable for set operations (with linear storage) was posed as an open problem by Devanbu et al. [33]. We close this problem in the affirmative.

All existing schemes for verifying outsourced set operations fall into one of the following two rather straightforward and highly inefficient solutions (for a detailed comparison, see Table 5.1): Either short proofs for the answer of every possible set operation query are precomputed, allowing for highly imbalanced schemes (exponential storage is required in order to achieve optimal verification, e.g., see the work by Pang and Tan [89]), or integrity proofs for all the elements of the sets participating in the query are given to the client, who locally verifies the set operation (in this case verification complexity can be linear in the problem size, e.g., see the work by Devanbu et al. [33]).

[Table 5.1: entries garbled in extraction. The table compares the asymptotic access and group complexities of the schemes of [33, 112], [79], [89], [41] and ASC, with one column per algorithm setup(), update(), refresh(), query(), verify(), plus columns for the proof Π(q), the update information upd, public verifiability (yes for [33, 112], [79], [89] and ASC; no for [41]) and the underlying assumption (Generic CR, Strong RSA, D. Log, FHE and B. q-DH, respectively).]

Table 5.1: Asymptotic access and group complexities of various authenticated data structure schemes defined by algorithms {genkey, setup, update, refresh, query, verify}, for a sets collection data structure of m sets, where the sum of the sizes of all the sets is M and 0 < ε < 1 is a constant. FHE stands for fully-homomorphic encryption, the security of which is based on lattice assumptions, such as the bounded distance decoding and the SplitKey distinguishing problems—see [43]. We note that the scheme based on FHE is not publicly verifiable. It does, however, provide privacy on top of integrity of computations. We show complexities for an intersection query on t = O(1) sets, outputting an intersection of δ elements. All sizes of the intersected and updated sets are Θ(n).
Intuition of our construction. We achieve optimal verification complexity by departing from the above approaches as follows. We first reduce the problem of verifying set operations to the problem of verifying the validity of some more primitive relations on sets, namely subset containment and set disjointness. Then, for each such primitive relation, we employ a corresponding cryptographic primitive to optimally verify its validity. In particular, we extend the bilinear-map accumulator to optimally verify subset containment, inspired by [90]. We then employ the extended Euclidean algorithm over polynomials, in combination with subset containment proofs, to provide a novel optimal verification test for set disjointness. The intuition behind our technique is that disjoint sets can be represented by mutually indivisible polynomials, so that there exist other polynomials such that the sum of their pairwise products equals one—this is the test used in the proof. However, transmitting (and processing) these polynomials is bandwidth- (and time-) prohibitive and does not lead to operation-sensitivity. Taking advantage of bilinearity properties, we can compress their coefficients in the exponent and still use them in a meaningful way, i.e., compute an inner product. This is why, although a conceptually simpler RSA accumulator [11] would lead to a mathematically sound solution, a bilinear-map accumulator [83] is essential for achieving the desired complexity goal.

Related work for securing sets operations. Despite the fact that privacy-related problems for set operations have been extensively studied in the cryptographic literature (e.g., see the work by Boneh and Waters [20] and the work by Freedman et al. [39]), existing work on the integrity dimension of set operations appears mostly in the database literature. Devanbu et al. [33] identify the importance of coming up with an operation-sensitive scheme. In the work by Morselli et al. [79], possibly the closest in context to ours, set intersection, union and difference are authenticated with linear verification and proof costs. The same linear asymptotic bounds are achieved by Yang et al. [112]. Pang and Tan [89] take a different approach: In order to achieve operation-sensitivity, expensive pre-processing and exponential space are required (i.e., answers to all possible queries are signed). Finally, related to our work are non-membership proofs, both for the RSA [68] and the bilinear-map [8, 32] accumulators. We note here that the first part of the solution presented in this chapter uses a modification of the authenticated data structure scheme BHT presented in Chapter 3.
5.1 Preliminaries

The data structure for which we design an authenticated data structure scheme is called a sets collection, and it is a generalization of the inverted index [9]. We describe it in detail in the following paragraph.

5.1.1 Sets collection data structure scheme

The sets collection data structure consists of a collection of m sets, denoted with S = {S_1, S_2, ..., S_m}, each containing elements from a universe U. Without loss of generality, we assume that our universe U is the set of nonnegative integers in the interval [m + 1, p − 1], where p is a k-bit prime, m is the number of sets in our collection (which has bit size O(log k)), and k is the security parameter. (As we are going to see later in this chapter, we could have easily set our universe to be Z_p by using CRHFs, but we choose not to do so for the sake of a cleaner presentation. However, even with this constraint, our universe contains O(2^k − poly(k)) = O(2^k) elements, since m is polynomially large.) Every set S_i is maintained to be sorted and does not contain duplicate elements; however, an element x can appear in more than one set. The space usage of the sets collection is O(m + M), where M is the sum of the sizes of the sets.

Let now I_t be a collection of t indices, all between 1 and m. The data structure scheme {query(), update(), check()} (Definition 2.2) for a sets collection data structure T(S) supports various set operations over a collection S of dynamic sets and is defined as follows:

1. answer ← query(I_t, T(S), op): Depending on the input parameter op, a query on the sets collection data structure is one of the following standard set operations:
   • Intersection: Given indices I_t = {i_1, i_2, ..., i_t}, return the set I = S_{i_1} ∩ S_{i_2} ∩ ... ∩ S_{i_t} as answer;
   • Union: Given indices I_t = {i_1, i_2, ..., i_t}, return the set U = S_{i_1} ∪ S_{i_2} ∪ ... ∪ S_{i_t} as answer;
   • Subset: Given indices I_t = {i, j}, return true as answer if S_i ⊆ S_j and false otherwise;
   • Set difference: Given indices I_t = {i, j}, return the set D = S_i − S_j as answer.

2. T(S′) ← update(x, i, T(S)): Given an element x ∈ U and 1 ≤ i ≤ m such that x ∉ S_i, insert element x into S_i and output T(S′); given an element x ∈ U such that x ∈ S_i, delete element x from S_i and output T(S′).

3. {accept, reject} ← check(answer, op, T(S)): Output accept if answer is the correct answer to the query on T(S) defined by op.

Complexity. Let N be the sum of the sizes of the sets participating in the queries defined by algorithm query(). By using a generalized merge, all these queries can be answered with O(N) complexity. Moreover, due to the requirement of keeping the sets sorted, all the updates require O(log N) complexity. Also, for the remainder of the chapter, we denote with δ the size of the answer to a query operation, i.e., δ is equal to the size of I, U, or D. For a subset query, δ is O(1) (true/false).

Sets collection as a hash table. We observe here that the sets collection data structure T(S) for the sets collection S = {S_1, S_2, ..., S_m} can be viewed as a special hash table: Every set S_i refers to a bucket L_i of the hash table data structure scheme in Chapter 3. This construction does not have expected O(1) size for the buckets, since the sets can have arbitrary size. However, viewing the sets collection as a hash table—one that uses a different function for distributing elements into the buckets—will allow us to employ scheme BHT from Chapter 3 as a black box for verifying set operation queries.
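Before the cryptographic layer is added, the plain data structure scheme just defined can be summarized as follows (a sketch; Python's built-in set operations stand in for the O(N) generalized merge over sorted lists):

```python
class SetsCollection:
    def __init__(self, sets):
        self.S = [sorted(set(s)) for s in sets]    # sorted, duplicate-free sets

    def query(self, indices, op):
        sets = [set(self.S[i]) for i in indices]
        if op == "intersection":
            return sorted(set.intersection(*sets))
        if op == "union":
            return sorted(set.union(*sets))
        if op == "subset":                # indices = (i, j): is S_i a subset of S_j?
            return sets[0] <= sets[1]
        if op == "difference":            # indices = (i, j): S_i - S_j
            return sorted(sets[0] - sets[1])

    def update(self, x, i):               # insert x if absent, else delete it
        s = set(self.S[i])
        s.symmetric_difference_update({x})
        self.S[i] = sorted(s)
```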
Assume that the bilinear-map parameters are in place, as described in Section 3.1.3. The proof for subset containment of a set S ⊆ X —for |S| = 1, this is a proof of membership—is the witness (WS,X , S) where WS,X = g Q x∈X −S (x+s) . A verifier can test subset containment for S by checking the relation e(WS,X , g (5.1) Q x∈S (x+s) ? )= e (acc(X ), g). We continue with the proof of security, which is a generalization of the membership proof presented by Nguyen [83]: Lemma 5.1 (Proving subsets) Let k be the security parameter and let (p, G, G, e, g) be a uniformly randomly generated tuple of bilinear pairings parameters. Given the elements q g, g s , . . . , g s ∈ G for some s chosen at random from Z∗p and a set of elements X in Zp (q ≥ |X |), suppose there is a polynomial-time algorithm that finds S and W such that S * X and e(W, g Q x∈S (x+s) ) = e(acc(X ), g). Then there is a polynomial-time algorithm for breaking the bilinear q-strong Diffie-Hellman assumption. Proof: Suppose there is a polynomial-time algorithm that computes such a set S = {y1 , y2 , . . . , y` }. Let X = {x1 , x2 , . . . , xn } and yj ∈ / X for some 1 ≤ j ≤ `. That means that e(W, g) Q y∈S (y+s) = e(g, g)(x1 +s)(x2 +s)...(xn +s) . Note that (yj +s) does not divide (x1 +s)(x2 +s) . . . (xn +s). Therefore there exist polynomial Q(s) of degree n − 1 and constant λ, such that (x1 + s)(x2 + s) . . . (xn + s) = Q(s)(yj + s) + λ. 135 Thus we have e(W, g)(yj +s) Q 1≤i6=j≤` (yi +s) Q 1≤i6=j≤` (yi +s) Q 1≤i6=j≤` (yi +s) e(W, g) e(W, g) 1 e(g, g) yj +s = e(g, g)Q(s)(yj +s)+λ ⇒ = e(g, g) λ j +s) Q(s)+ (y ⇒ λ = e(g, g)Q(s) e(g, g) (yj +s) ⇒ h   iλ−1 Q −Q(s) (yi +s) 1≤i6 = j≤` = e W, g e (g, g) . This means that the algorithm can be used to break the bilinear q-strong Diffie-Hellman assumption. 2 5.2 Construction and algorithms In the following, we recall that m denotes the number of the sets of our sets collection data structure and M denotes the sum of the sizes of the sets in our collections, i.e., M= m X |Si | . i=1 We now describe ASC = {genkey, setup, update, refresh, query, verify}, our authenticated data structure scheme for a sets collection data structure S1 , S2 , . . . , Sm . To do that, we are going to employ an extended version of the authenticated data structure scheme BHT = {genkey, setup, update, refresh, query, verify} described in Chapter 3: Instead of employing BHT on top of some m buckets created by using a two-universal hash function H on an elements collection X , we are going to employ it on top of the sets S1 ∪{1}, S2 ∪{2}, . . . , Sm ∪ {m}. By the constraint of our universe U, note that i ∈ / Si . Algorithm {sk, pk} ← genkey(1k ): The algorithm calls {sk, pk} ← BHT .genkey(). It outputs BHT .sk as sk and BHT .pk as pk. The access complexity is of this algorithm is O(1). 136 Algorithm {auth(D0 ), d0 } ← setup(D0 , sk, pk): The authenticated data structure auth(D0 ) is built as follows: First of all, the accumulation values of sets Si acc(Si ) = g Q x∈Si (s+x) for all i = 1, . . . , m , (5.2) are computed (see Section 3.1). Then the algorithm calls {auth(D0 ), d0 } ← BHT .setup(D0 , sk, pk) , without precomputed witnesses, and where D0 is the collection of m sets S1 ∪ {1}, S2 ∪ {2}, . . . , Sm ∪ {m} . Namely, the “bucket” Li in the scheme BHT is defined as the set Si ∪ {i} in this construction. The algorithm outputs both the authenticated data structure BHT .auth(D0 ) and the accumulation values acc(Si ) for all i = 1, . . . , m as auth(D0 ). Also it sets d0 to be BHT .d0 . 
Lemma 5.2 Algorithm setup() of the authenticated data structure scheme ASC has O(m + M) access complexity. Moreover, the authenticated data structure auth(D_0) output by setup() has O(m + M) group complexity.

Proof: When the scheme BHT is used with buckets of size O(1), the complexity of algorithm setup(), as well as the group complexity of the output authenticated data structure, are both O(m), by Lemma 3.13. However, in our case, since the sizes of the "buckets" sum to M ≥ m (and not to O(m), as happens with the authenticated hash table), both these complexities are O(m + M). □

Algorithm {D_{h+1}, auth(D_{h+1}), d_{h+1}, upd} ← update(u, D_h, auth(D_h), d_h, sk, pk): Suppose the update u involves element x ∈ U and set S_i. If the update is an insertion, the algorithm initially sets

acc(S_i) = acc(S_i)^{x+s}.

If the update is a deletion, the algorithm sets

acc(S_i) = acc(S_i)^{(x+s)^{−1}},

i.e., it updates the accumulation value that corresponds to the updated set. Then it calls {D_{h+1}, auth(D_{h+1}), d_{h+1}, upd} ← BHT.update(u, D_h, auth(D_h), d_h, sk, pk). However, no rebuilding policy is applied here, as is done in BHT (therefore the complexity is not amortized). Information upd and auth(D_{h+1}) are set equal to BHT.upd and BHT.auth(D_{h+1}) respectively, both enhanced with the updated accumulation value acc(S_i).

Lemma 5.3 Algorithm update() of the authenticated data structure scheme ASC has O(1) access complexity. Moreover, the update information upd output by update() has O(1) group complexity.

Proof: The complexity bounds follow from Lemma 3.14 and from the fact that no rebuilding of the sets collection data structure is employed in this case. □

Algorithm {D_{h+1}, auth(D_{h+1}), d_{h+1}} ← refresh(u, D_h, auth(D_h), d_h, upd, pk): Suppose the update u is "insert element x ∈ U into set S_i". The algorithm calls the respective procedure from the authenticated data structure scheme BHT, i.e., it calls {D_{h+1}, auth(D_{h+1}), d_{h+1}} ← BHT.refresh(u, D_h, auth(D_h), d_h, upd, pk) (again, no rebuilding policy is applied) and stores the new accumulation value acc(S_i) contained in upd. The updated authenticated data structure auth(D_{h+1}) is set equal to BHT.auth(D_{h+1}), enhanced with the updated accumulation value acc(S_i), contained in upd.

Lemma 5.4 Algorithm refresh() of the authenticated data structure scheme ASC has O(1) access complexity.

Proof: This follows directly from Lemma 3.17, since we are using neither precomputed witnesses nor any rebuilding of the table. □

5.3 Queries and verification

In this section, we show how compact proofs for the answers to set queries (e.g., intersection, union) can be constructed using the authenticated sets collection data structure presented earlier. The proofs have optimal size O(t + δ), where t is the size of the query parameters (e.g., t = 2 for an intersection of two sets) and δ is the answer size (e.g., δ = 1 if the intersection consists of one element). Our solutions use polynomial arithmetic, since the basis of our construction is the bilinear-map accumulator. We begin with a result, used extensively by our methods, related to certifying algorithms [63]. Lemma 5.5 states that if the vector of coefficients a = [a_n, a_{n−1}, ..., a_0] of a polynomial having roots x = [−x_1, −x_2, ..., −x_n] is claimed to be correct, it can be certified, with high probability, with complexity less than O(n log n), i.e., without a fast Fourier transform (FFT) computation from scratch (see Lemma 3.15 from Chapter 3). This can be achieved with the following algorithm (not part of the ASC scheme):

Algorithm {accept, reject} ← certify(a, x, pk): Pick a random κ ∈ Z*_p. If ∑_{i=0}^{n} a_i κ^i = ∏_{i=1}^{n}(κ + x_i), then the algorithm outputs accept, else it outputs reject.

Lemma 5.5 (Verification of polynomial coefficients) Let a = [a_n, a_{n−1}, ..., a_0] and x = [x_1, x_2, ..., x_n]. If accept ← certify(a, x, pk), then a_n, a_{n−1}, ..., a_0 are the coefficients of the polynomial ∏_{i=1}^{n}(s + x_i) with probability Ω(1 − neg(k)). Moreover, algorithm certify(a, x, pk) has O(n) complexity.

Proof: Algorithm certify() has complexity O(n) since it involves O(n) multiplications, additions and exponentiations. The probability that certify() accepts while a_0, a_1, ..., a_n are not the coefficients of the polynomial that has roots −x_1, −x_2, ..., −x_n is equal to the probability that κ is a root of the polynomial R(κ) = ∑_{i=0}^{n} a_i κ^i − ∏_{i=1}^{n}(κ + x_i). This follows from the polynomial equality that should hold for all κ. Now, polynomial R(κ) has degree n = poly(k) and therefore O(n) roots. Since κ is picked at random from Z*_p, it follows that this probability is bounded by O(poly(k)/2^k), which is neg(k); therefore the validity of the coefficients can be verified with probability Ω(1 − neg(k)) with Θ(n) complexity. □
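A direct transcription of certify() follows (a sketch, using a toy prime of our choosing; in the scheme, p is the k-bit prime from pk):

```python
import random

def certify(a, x, p):
    """a = [a_n, ..., a_0]: claimed coefficients of prod_{i=1}^{n}(s + x_i)."""
    kappa = random.randrange(1, p)       # random kappa in Z_p^*
    lhs = 0
    for coef in a:                       # Horner: sum_i a_i kappa^i mod p
        lhs = (lhs * kappa + coef) % p
    rhs = 1
    for xi in x:                         # prod_i (kappa + x_i) mod p
        rhs = rhs * (kappa + xi) % p
    return lhs == rhs                    # wrong coefficients pass w.p. at most n/(p-1)

# example: (s + 1)(s + 2) = s^2 + 3s + 2
assert certify([1, 3, 2], [1, 2], p=2**61 - 1)
```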
This can be achieved with the following algorithm (not part of the ASC scheme): Algorithm {accept, reject} ← certify(a, x, pk): Pick a random κ ∈ Z∗p . If Qm i=1 (κ + xi ), then the algorithm outputs accept, else it outputs reject. Pn i=0 ai κi = Lemma 5.5 (Verification of polynomial coefficients) Let a = [an , an−1 , . . . , a0 ] and x = [x1 , x2 , . . . , xn ]. If accept ← certify(a, x, pk), then an , an−1 , . . . , a0 are the coefficients of the Q polynomial ni=1 (s + xi ) with probability Ω(1 − neg(k)). Moreover, algorithm certify(a, x, pk) has O(n) complexity. Proof: Algorithm certify() has complexity O(n) since it involves O(n) multiplications, additions and exponentiations. The probability that certify() accepts while a0 , a1 , . . . , an are not the coefficients of the polynomial that has roots −x1 , −x2 , . . . , −xn is equal to the probability P Q of κ being the root of the polynomial R(κ) = ni=0 ai κi − m i=1 (κ + xi ). This follows from polynomial equality that should hold for all κ. Now, polynomial R(κ) has degree n = poly(k) and has O(n) roots. Since κ is picked at random from Z∗p , it follows that this probability is bounded by O(poly(k)/2k ), which is neg(k), and therefore the validity of the coefficients can be verified with probability Ω(1 − neg(k)) with Θ(n) complexity. 2 In the following we describe the algorithms for the queries intersection, union, subset and set difference in detail. The parameters of our queries are t ≥ 2 indices (for subset and set 139 difference queries it is t = 2), namely the indices i1 , i2 , . . . , it , with 1 ≤ t ≤ m. To simplify the notation, we assume without loss of generality that these indices are 1, 2, . . . , t. We P denote with ni the size of set Si (i = 1, 2, . . . , t) and we define N = ti=1 ni . I.e., N is the total size of the sets involved in the execution of our queries. We repeat that δ denotes the size of our answer (e.g., size of the output intersection). Note, that in all cases δ = O(N ) and that performing the actual operations has O(N ) complexity, by using a generalized merge. 5.3.1 Intersection query We begin with the intersection query. Let I = S1 ∩ S2 ∩ . . . ∩ St = {y1 , y2 , . . . , yδ } be the intersection of sets S1 , S2 , . . . , St . We express the correctness of the answer I to the intersection query by means of the following two conditions: Subset condition: I ⊆ S1 ∧ I ⊆ S2 ∧ . . . ∧ I ⊆ St ; (5.3) Completeness condition: (S1 − I) ∩ (S2 − I) ∩ . . . ∩ (St − I) = Ø . (5.4) Note the completeness condition in Equation 5.4 is necessary since I should contain all the common elements. Given an intersection I, and for every set Sj , we define polynomial Q Pj (s) = x∈Sj −I (x + s), of degree nj . We can now state the following lemma: Lemma 5.6 Set I that is a subset of sets S1 , S2 , . . . , St is the intersection of sets S1 , S2 , . . . , St if and only if there exist polynomials q1 (s), q2 (s), . . . , qt (s) such that q1 (s)P1 (s)+q2 (s)P2 (s)+ . . . + qt (s)Pt (s) = 1. Moreover, computing polynomials q1 (s), q2 (s), . . . , qt (s) can be achieved with complexity O(N log2 N log log N ). Proof: (⇒) This direction follows by the fact that we can use the extended Euclidean algorithm and find polynomials q1 (s), . . . , qt (s) such that q1 (s)P1 (s) + . . . + qt (s)Pt (s) = GCD(P1 (s), P2 (s), . . . , Pt (s)). Since P1 (s), P2 (s), . . . , Pt (s) share no common factors, it follows that GCD(P1 (s), P2 (s), . . . , Pt (s)) = 1 . 140 (⇐) Suppose there exist polynomials q1 (s), q2 (s), . . . 
, qt (s) that satisfy relation q1 (s)P1 (s) + q2 (s)P2 (s) + . . . + qt (s)Pt (s) = 1 but I is not the intersection. This means that polynomials P1 (s), P2 (s), . . . , Pt (s) share at least one common factor, e.g., (s + r). Therefore there exists some polynomial A(s) such that (s + r)A(s) = 1, i.e., the polynomials (s + r)A(s) and 1 are equal, which is a contradiction (note that we want the polynomials to be equal for every s ∈ Zp ). In order to compute these coefficients, we use the extended Euclidean algorithm recursively, based on the fact that the greatest common divisor GCD(P1 (s), . . . , Pt (s)) equals GCD(P1 (s), GCD(P2 (s), . . . , Pt (s))). To compute the greatest common divisor of two O(n)degree polynomials, we can use the algorithm described in the book by von zur Gathen and Gerhard [40] that has O(n log2 n log log n) complexity. Since we are using this algorithm t times, the time complexity is bounded by O(tn log2 n log log n). Moreover, by the property that x log x + y log y ≤ (x + y) log(x + y) and since the size of the sets participating in the intersection is N , this equals O(N log2 N log log N ). This algorithm also outputs the required coefficients. If we arrange our data (i.e., t polynomials) on a binary tree, after all the coefficients of the internal nodes have been computed, the final coefficients for all elements at the leaves can be computed in O(t) multiplications (we can avoid the O(t log t) cost) of O(ni ) degree polynomials, where ni are the degrees of the polynomials of the leaves. Therefore the result holds. 2 We use Lemmata 3.15 and 5.6 to construct efficient proofs for both conditions in Relations 5.3 and 5.4: Proof of subset condition. For each set Sj , 1 ≤ j ≤ t, the subset witnesses WI,j = g Pj (s) are computed, as defined in Relation 5.1. =g Q x∈Sj −I (x+s) (5.5) 141 Proof of completeness condition. Suppose q1 (s), q2 (s), . . . , qt (s) are polynomials computed in Lemma 5.6 that satisfy q1 (s)P1 (s)+q2 (s)P2 (s)+. . .+qt (s)Pt (s) = 1. For j = 1, . . . , t, the completeness witnesses FI,j = g qj (s) (5.6) are computed. We can now formally define algorithms query() and verify() of the authenticated data structure scheme ASC and for the intersection query: Algorithm {Π(q), α(q)} ← query(q, Dh , auth(Dh ), pk): (intersection) The query q is the set of indices {1, 2, . . . , t}, requiring the intersection of S1 , S2 , . . . , St . Let α(q) = {y1 , y2 , . . . , yδ } be the intersection I. The proof Π(q) consisting of the following pieces: 1. Coefficients bδ , bδ−1 , . . . , b0 of the polynomial (s + y1 )(s + y2 ) . . . (s + yδ ); 2. Accumulation values proofs Πj = {(αji , βji ) : i = 0, . . . , l}, as defined in Relation 3.27, output by calling algorithm BHT .query(j, Dh , auth(Dh ), pk), for all j = 1, . . . , t; 3. Subset witnesses WI,j , as defined in Relation 5.5, for all j = 1, . . . , t; 4. Completeness witnesses FI,j = g qj (s) , as defined in Relation 5.6, for all j = 1, . . . , t. Algorithm {accept, reject} ← verify(q, α, Π, dh , pk): (intersection) Given a proof Π and an answer α = {y1 , y2 , . . . , yδ }, the verification algorithm for the intersection query S1 ∩ S2 ∩ . . . ∩ St outputs accept if all of the following tests are successful, else it outputs reject: 1. Coefficients test: It is accept ← certify([bδ , bδ−1 , . . . , b0 ], [−y1 , −y2 , . . . , −yδ ], pk);2 2. Accumulation values tests: For all j = 1, . . . , t, it is accept ← BHT .verify(j, true, Πj , dh , pk) ; 2 Algorithm certify() is used to achieve optimal verification complexity. 
142 3. Subset tests: For all j = 1, . . . , t, it is e δ  Y g si b i ! , WI,j = e (βj0 , g) , (5.7) i=0 where βj0 is taken from Πj ; 4. Completeness test: It is t Y e (WI,j , FI,j ) = e(g, g) . (5.8) j=1 5.3.2 Union query The answer to a union query is the set U = S1 ∪ S2 ∪ . . . ∪ St = {y1 , y2 , . . . , yδ }. We express the correctness of the answer U to the union query by means of the following two conditions: Membership condition: ∀yi ∈ U ∃j ∈ {1, 2, . . . , t} : yi ∈ Sj ; Superset condition: (U ⊇ S1 ) ∧ (U ⊇ S2 ) ∧ . . . ∧ (U ⊇ St ) . (5.9) (5.10) Note that the superset condition is needed to make sure that no element has been excluded from the returned answer U. We now formally describe algorithms query() and verify() for the union query. Algorithm {Π(q), α(q)} ← query(q, Dh , auth(Dh ), pk): (union) The query q is the set of indices {1, 2, . . . , t}, requiring the union of S1 , S2 , . . . , St . Let α(q) = {y1 , y2 , . . . , yδ } be the union U. The proof Π(q) consisting of the following pieces: 1. Coefficients bδ , bδ−1 , . . . , b0 of the polynomial (s + y1 )(s + y2 ) . . . (s + yδ ); 2. Accumulation values proofs Πj = {(αji , βji ) : i = 0, . . . , l}, as defined in Relation 3.27, output by calling algorithm BHT .query(j, Dh , auth(Dh ), pk), for all j = 1, . . . , t; 3. Membership witnesses Wyi ,Sk (for some 1 ≤ k ≤ t), as defined in Relation 3.4, for all i = 1, . . . , δ; 4. Subset witnesses WSj ,U , as defined in Relation 5.1, for all j = 1, . . . , t. 143 Algorithm {accept, reject} ← verify(q, α, Π, dh , pk): (union) Given a proof Π and an answer α = {y1 , y2 , . . . , yδ }, the verification algorithm for the union query S1 ∪ S2 ∪ . . . ∪ St outputs accept if all of the following tests are successful, else it outputs reject: 1. Coefficients test: It is accept ← certify([bδ , bδ−1 , . . . , b0 ], [−y1 , −y2 , . . . , −yδ ], pk); 2. Accumulation values tests: For all j = 1, . . . , t, it is accept ← BHT .verify(j, true, Πj , dh , pk) ; 3. Membership tests: For all i = 1, . . . , δ, it is e (Wyi ,Sk , g yi g s ) = e (βk0 , g) , where βk0 is taken from Πk ; 4. Subset tests: For all j = 1, . . . , t, it is e WSj ,U , βj0  δ   Y i bi =e gs ,g ! , i=0 where βj0 is taken from Πj . 5.3.3 Subset query The correctness properties we need for the subset query are expressed with the relations S1 ⊆ S2 ⇔ ∀y ∈ S1 : y ∈ S2 . Algorithm {Π(q), α(q)} ← query(q, Dh , auth(Dh ), pk): (subset) The query q is the set of indices {1, 2} (wlog). Let α(q) = true if S1 ⊆ S2 or α(q) = false otherwise. The proof Π(q) consisting of the following pieces: 1. Accumulation values proofs Πj = {(αji , βji ) : i = 0, . . . , l}, as defined in Relation 3.27, output by calling algorithm BHT .query(j, Dh , auth(Dh ), pk), for j = 1, 2; 144 2. We distinguish two cases: (a) α(q) = true: The proof contains the subset witness WS1 ,S2 as defined in Relation 5.1; (b) α(q) = false: The proof contains a membership witness Wy,S1 (for some y) as defined in Relation 3.4 and a non-membership witness (Ay , By )—that proves that y∈ / S2 — as defined in Relation 3.5. Algorithm {accept, reject} ← verify(q, α, Π, dh , pk): (subset) Given a proof Π and an answer α ∈ {true, false}, the verification algorithm for the subset query S1 ⊆ S2 outputs accept if all of the following tests are successful, else it outputs reject: 1. Accumulation values tests: For j = 1, 2, it is accept ← BHT .verify(j, true, Πj , dh , pk); 2. 
(Non)-membership tests: When α = true, it is e(WS1 ,S2 , β10 ) = e(β20 , g), otherwise (α = false) it is e(Wy,S1 , g y g s ) = e(β10 , g) (verification of membership of y in S1 ) and e(g y g s , Ay )e(β20 , By ) = e(g, g) (verification of non-membership of y in S2 ), where β10 and β20 are taken from Π1 and Π2 . 5.3.4 Set difference query The correctness properties for a set difference query are expressed with the following relations. It is D = S1 − S2 ⇔ D ⊆ S1 ∧ S1 − D = S1 ∩ S2 . Algorithm {Π(q), α(q)} ← query(q, Dh , auth(Dh ), pk): (set difference) The query q is the set of indices {1, 2} (wlog), requiring the difference S1 − S2 . Let α(q) = {y1 , y2 , . . . , yδ } be the difference D. The proof Π(q) consisting of the following pieces: 145 1. Coefficients bδ , bδ−1 , . . . , b0 of the polynomial (s + y1 )(s + y2 ) . . . (s + yδ ); 2. Accumulation values proofs Πj = {(αji , βji ) : i = 0, . . . , l}, as defined in Relation 3.27, output by calling algorithm BHT .query(j, Dh , auth(Dh ), pk), for j = 1, 2; 3. Subset witness WD,S1 , as defined in Relation 5.1; 4. Subset witnesses WS1 −D,1 and WS1 −D,2 as defined in Relation 5.5; 5. Completeness witnesses FS1 −D,1 and FS1 −D,2 as defined in Relation 5.6. Algorithm {accept, reject} ← verify(q, α, Π, dh , pk): (set difference) Given a proof Π and an answer α = {y1 , y2 , . . . , yδ }, the verification algorithm for the difference query S1 − S2 outputs accept if all of the following tests are successful, else it outputs reject: 1. Coefficients test: It is accept ← certify([bδ , bδ−1 , . . . , b0 ], [−y1 , −y2 , . . . , −yδ ], pk); 2. Accumulation values tests: For all j = 1, 2, it is accept ← BHT .verify(j, true, Πj , dh , pk);   Q δ  si b i 3. Subset tests: It is e WD,S1 , i=0 g = e (β10 , g) and (a) e (WS1 −D,1 , WD,S1 ) = e(β10 , g), (b) e (WS1 −D,2 , WD,S1 ) = e(β20 , g), where β10 and β20 are taken from Π1 and Π2 ; 4. Completeness test: It is e (WS1 −D,1 , FS1 −D,1 ) e (WS1 −D,2 , FS1 −D,2 ) = e (g, g). This concludes the description of the verification and the query algorithms for all four set operations supported by the authenticated data structure scheme ASC. 146 5.4 Complexity Let now n1 , n2 , . . . , nt be the sizes of the involved sets in our queries and N = Pt i=1 ni . We have the following result: Lemma 5.7 For all queries q, algorithm query() of the authenticated data structure scheme ASC has O(N log2 N log log N + tm log m) access complexity. Moreover, it outputs a proof Π(q) of O(t + δ) group complexity. Proof: For all queries involving t sets S1 , S2 , . . . , St , accumulation proofs Π1 , Π2 , . . . , Πt have to be constructed, by using the authenticated data structure scheme algorithm BHT .query(). By Lemma 3.18 (no precomputed witnesses), this requires O(tm log m) complexity and the output proofs Π1 , Π2 , . . . , Πt have O(t) group complexity. Moreover: • Queries intersection, union and set difference require the computation of the coefficients bδ , bδ−1 , . . . , b0 of the polynomial that has roots −y1 , −y2 , . . . , −yδ . This task, by Lemma 3.15 has O(δ log δ) = O(N log N ) complexity, since δ ≤ N . Since bδ , bδ−1 , . . . , b0 ∈ Zp , their total group complexity is O(δ); • All queries require computing t subset witnesses (note that for the subset and set difference queries it is t = O(1)). By Lemma 3.15 and by the definition of subset witnesses in Relation 5.1, computing the subset witnesses has ! t X O (ni − δ) log(ni − δ) = O(N log N ) i=1 complexity. 
5.4 Complexity

Let now n_1, n_2, ..., n_t be the sizes of the sets involved in our queries and let N = ∑_{i=1}^{t} n_i. We have the following result:

Lemma 5.7 For all queries q, algorithm query() of the authenticated data structure scheme ASC has O(N log² N log log N + tm^ε log m) access complexity. Moreover, it outputs a proof Π(q) of O(t + δ) group complexity.

Proof: For all queries involving t sets S_1, S_2, ..., S_t, the accumulation proofs Π_1, Π_2, ..., Π_t have to be constructed by using algorithm BHT.query() of the authenticated data structure scheme. By Lemma 3.18 (no precomputed witnesses), this requires O(tm^ε log m) complexity, and the output proofs Π_1, Π_2, ..., Π_t have O(t) group complexity. Moreover:

• The intersection, union and set difference queries require the computation of the coefficients b_δ, b_{δ−1}, ..., b_0 of the polynomial that has roots −y_1, −y_2, ..., −y_δ. This task, by Lemma 3.15, has O(δ log δ) = O(N log N) complexity, since δ ≤ N. Since b_δ, b_{δ−1}, ..., b_0 ∈ Z_p, their total group complexity is O(δ);

• All queries require computing t subset witnesses (note that for the subset and set difference queries it is t = O(1)). By Lemma 3.15 and by the definition of subset witnesses in Relation 5.1, computing the subset witnesses has O(∑_{i=1}^{t} (n_i − δ) log(n_i − δ)) = O(N log N) complexity. Since all subset witnesses are elements in G, their total group complexity is O(t). Moreover, for the union query, δ membership witnesses have to be computed. By Lemma 3.15, this has complexity bounded above by O(N log N). Also, these membership witnesses are elements in G; therefore their total group complexity is O(δ);

• The intersection, subset and set difference queries require computing t completeness (or non-membership) witnesses (note that for the subset and set difference queries it is t = O(1)), which involves running the extended Euclidean algorithm. By Lemma 5.6, this task has O(N log² N log log N) complexity. The group complexity of these witnesses is O(t), since they are elements in G.

Summing up, we conclude that the proof always has group complexity O(t + δ) (hence it is operation-sensitive) and that the complexity to compute it is O(N log² N log log N + tm^ε log m) for all queries, except for the union proof, which requires slightly less complexity, i.e., O(N log N + tm^ε log m). □

Lemma 5.8 Algorithm verify() of the authenticated data structure scheme ASC has O(t + δ) access complexity.

Proof: Algorithm certify() has O(δ) complexity, by Lemma 5.5. Also, the verification algorithm for all queries performs a number of constant-complexity operations—such as verification of the proofs Π_i with BHT.verify() (see Lemma 3.19) and bilinear-map computations—that is proportional to t + δ. Therefore the access complexity of verify() is O(t + δ). □

5.5 Proof of correctness

Lemma 5.9 The authenticated data structure scheme ASC = {genkey, setup, update, refresh, query, verify}, using the correct authenticated data structure scheme BHT from Chapter 3, is correct according to Definition 2.4.

Proof: Let D_0 be any sets collection data structure containing m sets. Fix the security parameter k and output pk = {h(.), (p, G, G, e, g), g^s, g^{s^2}, ..., g^{s^q}} and sk = s by calling algorithm genkey(). Then output an authenticated data structure auth(D_0) and the respective digest d_0 by calling algorithm setup(). Pick a polynomial number of updates—namely, pick a polynomial number of elements for insertion into (or deletion from) a set S_r—and update auth(D_0) and d_0 by calling algorithm refresh(). Let D_h be the final sets collection data structure, auth(D_h) be the produced authenticated data structure and d_h be the final digest. By the way refresh() operates, at every point in time the digest d(v_j) of a leaf node v_j (corresponding to set S_j) of the tree T is

d(v_j) = acc(S_j)^{(s+j)}.   (5.11)

We prove correctness for all four query operations, i.e., for intersection, union, subset and difference.

Intersection. Let our query q be {1, 2, ..., t} (wlog), i.e., a set of indices that refers to the intersection of sets S_1, S_2, ..., S_t. Algorithm query() outputs the proof Π(q) and the correct answer I = {y_1, y_2, ..., y_δ} = S_1 ∩ S_2 ∩ ... ∩ S_t. The proof Π(q) for the intersection contains the following parts:

1. The coefficients b_δ, b_{δ−1}, ..., b_0 of the polynomial (s + y_1)(s + y_2)...(s + y_δ) associated with the intersection I = {y_1, y_2, ..., y_δ}. Since for every κ ∈ Z_p it is ∑_{i=0}^{δ} b_i κ^i = ∏_{i=1}^{δ}(κ + y_i), algorithm certify() accepts;

2. The proofs Π_j, output by BHT.query(j, D_h, auth(D_h), pk), for j = 1, ..., t. We recall that each proof Π_j is the ordered sequence (α_{ji}, β_{ji}) for i = 0, ..., l, as defined in Relation 3.27. Specifically, by Relation 3.27 it should be β_{j0} = d(v_j)^{(s+j)^{−1}}, which by Relation 5.11 gives

β_{j0} = acc(S_j).   (5.12)
Now, by the correctness of the scheme BHT, algorithm BHT.verify(j, true, Π_j, d_h, pk), on input Π_j output by BHT.query(j, D_h, auth(D_h), pk), always accepts (see Lemma 3.20);

3. The subset witnesses W_{I,j} = g^{P_j(s)} = g^{∏_{x∈S_j−I}(x+s)} for j = 1, ..., t. The equality

e(∏_{i=0}^{δ} (g^{s^i})^{b_i}, W_{I,j}) = e(β_{j0}, g)

is always true, by the properties of the bilinear map, by Relation 5.12 and by the fact that ∑_{i=0}^{δ} b_i s^i = ∏_{i=1}^{δ}(s + y_i) (Item 1);

4. The completeness witnesses F_{I,j} = g^{q_j(s)} for j = 1, ..., t. The equality

∏_{j=1}^{t} e(W_{I,j}, F_{I,j}) = e(g, g)^{∑_{j=1}^{t} q_j(s)P_j(s)} = e(g, g)

is always true, since by construction of the completeness witnesses it is ∑_{j=1}^{t} q_j(s)P_j(s) = 1.

This completes the proof of correctness for the case of intersection, since we proved that for every intersection query q and for every correct answer and proof output by query(), verify() always accepts.

Union. Let our query q be {1, 2, ..., t} (wlog), i.e., a set of indices that refers to the union of sets S_1, S_2, ..., S_t. Algorithm query() outputs the proof Π(q) and the correct answer U = {y_1, y_2, ..., y_δ} = S_1 ∪ S_2 ∪ ... ∪ S_t. The proof Π(q) for a union contains the following parts:

1. The coefficients b_δ, b_{δ−1}, ..., b_0 and the proofs Π_j. These are always verified, as in the case of intersection (see Items 1 and 2 above);

2. The membership witnesses W_{y_i,S_k}, for some k ∈ {1, ..., t}, for each element y_i (i = 1, ..., δ). For i = 1, ..., δ it is e(W_{y_i,S_k}, g^{y_i} g^s) = e(β_{k0}, g), since W_{y_i,S_k} is the witness defined in Relation 5.1 and β_{k0} = acc(S_k), by Relation 5.12;

3. The subset witnesses W_{S_j,U}, for all j = 1, ..., t. For all j = 1, ..., t it is

e(W_{S_j,U}, β_{j0}) = e(∏_{i=0}^{δ} (g^{s^i})^{b_i}, g),

where W_{S_j,U} is the subset witness of S_j with respect to U (whose coefficients are b_0, b_1, ..., b_δ), as defined in Relation 5.1, and U ⊇ S_j for all j = 1, ..., t. Therefore this relation also verifies, since β_{j0} = acc(S_j) by Relation 5.12.

This completes the proof of correctness for the case of union, since we proved that for every union query q and for every correct answer and proof output by query(), verify() always accepts.

Subset. Let the query be "is S_1 ⊆ S_2?" (wlog). Algorithm query() outputs the proof Π(q) and the correct answer, i.e., either true or false. The proof Π(q) for a subset query contains the following parts:

1. The accumulation value proofs Π_1 and Π_2. These are always verified, as in the case of intersection (see Item 2 in the proof of correctness of the intersection operation);

2. Depending on whether we have a positive or a negative answer, we distinguish the following cases:

• Positive answer, i.e., S_1 is a subset of S_2. The proof contains the subset witness W_{S_1,S_2}. Then it is e(W_{S_1,S_2}, β_{10}) = e(β_{20}, g), by the definition of W_{S_1,S_2} (see Relation 5.1), since S_1 ⊆ S_2 and since β_{10} = acc(S_1) and β_{20} = acc(S_2), by Relation 5.12;

• Negative answer, i.e., S_1 is not a subset of S_2. The proof contains an element y such that y ∈ S_1 but y ∉ S_2, the respective membership witness W_{y,S_1} and a non-membership proof (A_y, B_y). It is e(W_{y,S_1}, g^y g^s) = e(β_{10}, g), by the definition of W_{y,S_1} in Relation 5.1, since y ∈ S_1 and since β_{10} = acc(S_1), by Relation 5.12. Also, it holds that e(g^y g^s, A_y) e(β_{20}, B_y) = e(g, g), since A_y = g^{q(s)} and B_y = g^{p(s)} are such that (y + s)q(s) + p(s)∏_{x∈S_2}(x + s) = 1, y ∉ S_2, and β_{20} = acc(S_2), by Relation 5.12.
This completes the proof of correctness for the case of the subset query, since we proved that for every subset query q and for every correct answer and proof output by query(), verify() always accepts.

Set difference. Let our query q be S_1 − S_2 (wlog). Algorithm query() outputs the proof Π(q) and the correct answer D = S_1 − S_2 = {y_1, y_2, ..., y_δ}. The proof Π(q) for a difference query contains the following parts:

1. The coefficients b_δ, b_{δ−1}, ..., b_0 (which relate to the difference {y_1, y_2, ..., y_δ}) and the proofs Π_1 and Π_2. These are always verified, as in the case of intersection (see Items 1 and 2 in the proof of correctness of the intersection query);

2. The subset witness W_{D,S_1}. Then it is e(W_{D,S_1}, ∏_{i=0}^{δ} (g^{s^i})^{b_i}) = e(β_{10}, g), by the definition of W_{D,S_1} (see Relation 5.1), since D ⊆ S_1 and since β_{10} = acc(S_1), by Relation 5.12;

3. Note now that W_{D,S_1} = g^{∏_{x∈S_1−D}(x+s)}. The remaining relations, involving the subset witnesses W_{S_1−D,1}, W_{S_1−D,2} and the completeness witnesses F_{S_1−D,1}, F_{S_1−D,2}, always verify, since they comprise an intersection proof, i.e., the proof that S_1 − D = S_1 ∩ S_2, and we have already shown the correctness of the intersection operation.

This completes the proof of correctness for all the queries supported by the sets collection, since we proved that for every query q (intersection/union/subset/difference) and for every correct answer and proof output by query(), verify() always accepts. □

5.6 Proof of security

Lemma 5.10 The authenticated data structure scheme ASC = {genkey, setup, update, refresh, query, verify}, using the secure authenticated data structure scheme BHT from Chapter 3, is secure according to Definition 2.5 under the bilinear q-strong Diffie-Hellman assumption.

Proof: Let Adv be a computationally-bounded adversary, let D_0 be a sets collection data structure consisting of m sets S_1, S_2, ..., S_m, let ASC = {genkey, setup, update, refresh, query, verify} be our authenticated data structure scheme, let k be the security parameter and let {sk, pk} ← genkey(1^k). The adversary Adv is given the public key pk, namely the values {h(.), (p, G, G, e, g), g^s, g^{s^2}, ..., g^{s^q}}, and unlimited access to all the algorithms of ASC, except for setup() and update(), to which he only has oracle access. The adversary initially outputs the authenticated data structure auth(D_0) and the digest d_0, through an oracle call to algorithm setup(). Then the adversary picks a polynomial number of updates (e.g., insert an element x into a set S_r) and eventually outputs the data structure D_h, the authenticated data structure auth(D_h) and the digest d_h, through oracle access to update(). Note that since d_h, the digest of the authenticated data structure, is produced through oracle access to setup() and update(), it follows that it is the correct one. We now prove the security of each operation separately. For each operation, we will express the probability of Definition 2.5 as the intersection of several events that we define precisely below. Then, by using well-accepted assumptions already introduced, we are going to prove that this probability is negligible.

Intersection. Let the intersection query be a set of indices {1, 2, ..., t} (wlog). The adversary Adv outputs an incorrect answer I = {e_1, e_2, ..., e_δ} ≠ S_1 ∩ S_2 ∩ ... ∩ S_t and also a proof that consists of the following elements:

1. Coefficients γ_δ, γ_{δ−1}, ..., γ_0;
2. Proofs Π_1, Π_2, ..., Π_t;
3. Subset witnesses W_1, W_2, ..., W_t;
4. Completeness witnesses F_1, F_2, ..., F_t.
We define now the following events, related to the adversary's choice of proof above. Our goal is to express the probability of the security definition (Definition 2.5) as a function of the following events.

• $E_1$: The values $\gamma = [\gamma_\delta, \gamma_{\delta-1}, \ldots, \gamma_0]$ and the answer $e = \{e_1, e_2, \ldots, e_\delta\}$ picked by Adv are such that accept ← certify($\gamma$, $e$, $pk$). Event $E_1$ can be partitioned into two mutually exclusive events $E_{1,0}$ and $E_{1,1}$, i.e., $E_1 = E_{1,0} \cup E_{1,1}$:
 – $E_{1,0}$: The coefficients $\gamma_\delta, \ldots, \gamma_0$ are not the coefficients of the polynomial $(s+e_1)(s+e_2)\cdots(s+e_\delta)$;
 – $E_{1,1}$: The coefficients $\gamma_\delta, \ldots, \gamma_0$ are the coefficients of the polynomial $(s+e_1)(s+e_2)\cdots(s+e_\delta)$.

• $E_2$: The proofs $\Pi_1, \Pi_2, \ldots, \Pi_t$ picked by Adv are accepted by algorithm BHT.verify(), i.e., it is accept ← BHT.verify($j$, true, $\Pi_j$, $d_h$, $pk$) for all $j = 1, \ldots, t$. Let $(\beta_j', \alpha_j')$ be the first element of the proof $\Pi_j$ (Relation 5.13). Event $E_2$ can be partitioned into two mutually exclusive events $E_{2,0}$ and $E_{2,1}$, i.e., $E_2 = E_{2,0} \cup E_{2,1}$:
 – $E_{2,0}$: There exists $j \in \{1, 2, \ldots, t\}$ such that $\beta_j' \neq \mathrm{acc}(S_j)$;
 – $E_{2,1}$: For all $j = 1, \ldots, t$ it is $\beta_j' = \mathrm{acc}(S_j)$.

• $E_3$: The values $\gamma_\delta, \ldots, \gamma_0$, $W_1, W_2, \ldots, W_t$ and $\beta_1', \beta_2', \ldots, \beta_t'$ (which are contained in $\Pi_1, \Pi_2, \ldots, \Pi_t$) picked by Adv satisfy
\[
e\left(\prod_{i=0}^{\delta}\left(g^{s^i}\right)^{\gamma_i}, W_j\right) = e(\beta_j', g) \quad \text{for } j = 1, \ldots, t.
\]
Event $E_3$ can be partitioned into two mutually exclusive events $E_{3,0}$ and $E_{3,1}$, i.e., $E_3 = E_{3,0} \cup E_{3,1}$:
 – $E_{3,0}$: There exists $j \in \{1, 2, \ldots, t\}$ such that the opposites of the roots of the polynomial $\sum_{i=0}^{\delta} \gamma_i s^i$ are not a subset of $S_j$;
 – $E_{3,1}$: The opposites of the roots of the polynomial $\sum_{i=0}^{\delta} \gamma_i s^i$ are a subset of $S_j$ for all $j = 1, \ldots, t$.

• $E_4$: The values $W_1, \ldots, W_t$ and $F_1, \ldots, F_t$ picked by Adv satisfy $\prod_{j=1}^{t} e(W_j, F_j) = e(g,g)$;

• $F$: The answer (intersection) $I$ picked by Adv is not correct, i.e., $I = \{e_1, e_2, \ldots, e_\delta\} \neq S_1 \cap S_2 \cap \ldots \cap S_t$.

Let now $P$ be the probability of Definition 2.5, i.e.,
\[
P = \Pr\left[\{Q, \Pi, \alpha, h\} \leftarrow \mathrm{Adv}(1^k, pk);\ \mathsf{accept} \leftarrow \mathsf{verify}(Q, \alpha, \Pi, d_h, pk);\ \mathsf{reject} \leftarrow \mathsf{check}(Q, \alpha, D_h)\right].
\]
We recall that the authenticated data structure scheme ASC is secure if $P \leq \nu(k)$, where $\nu(k)$ is neg($k$). We observe that for the case of the intersection query, $P$ can be expressed as the probability of the intersection of the events $E_1, E_2, E_3, E_4, F$. By simple probability calculus, this can be written as
\[
\begin{aligned}
P &= \Pr[E_1 \cap E_2 \cap E_3 \cap E_4 \cap F] \\
  &= \Pr[(E_{1,0} \cup E_{1,1}) \cap (E_{2,0} \cup E_{2,1}) \cap (E_{3,0} \cup E_{3,1}) \cap E_4 \cap F] \\
  &\leq \Pr[E_{1,0}] + \Pr[E_{1,1} \cap (E_{2,0} \cup E_{2,1}) \cap (E_{3,0} \cup E_{3,1}) \cap E_4 \cap F] \\
  &\leq \Pr[E_{1,0}] + \Pr[E_{2,0}] + \Pr[E_{1,1} \cap E_{2,1} \cap (E_{3,0} \cup E_{3,1}) \cap E_4 \cap F] \\
  &\leq \Pr[E_{1,0}] + \Pr[E_{2,0}] + \Pr[E_{3,0} \cap E_{2,1} \cap E_{1,1}] + \Pr[E_{1,1} \cap E_{2,1} \cap E_{3,1} \cap E_4 \cap F] \\
  &\leq \Pr[E_{1,0}] + \Pr[E_{2,0}] + \Pr[E_{3,0} \mid E_{2,1} \cap E_{1,1}] + \Pr[E_4 \mid E_{3,1} \cap E_{2,1} \cap E_{1,1} \cap F].
\end{aligned}
\]
We compute each such probability separately:

1. $\Pr[E_{1,0}]$ is neg($k$) by Lemma 5.5;

2. $\Pr[E_{2,0}]$ is neg($k$) by Corollary 3.4 (security of scheme BHT). (Note that, in order to apply this corollary in the sets collection data structure, we have to consider that the respective "bucket" of the hash table representing the sets collection is $L_j = S_j \cup \{j\}$, and therefore $L_j - \{j\} = S_j$.)

3. $\Pr[E_{3,0} \mid E_{2,1} \cap E_{1,1}]$: Here the event $E_{3,0}$ is conditioned on the event $E_{2,1} \cap E_{1,1}$. This condition allows us to replace $\beta_j'$ with $\mathrm{acc}(S_j)$ (due to $E_{2,1}$) and $\sum_{i=0}^{\delta} \gamma_i s^i$ with $\prod_{x \in I}(x+s)$ (due to $E_{1,1}$) in the event $E_{3,0}$. Therefore the event $E_{3,0} \mid E_{2,1} \cap E_{1,1}$ is the event
\[
e\left(g^{\prod_{x \in I}(x+s)}, W_j\right) = e(\mathrm{acc}(S_j), g) \ \wedge\ I \nsubseteq S_j \ \text{for some } j \in \{1, 2, \ldots, t\}.
\]
This event implies breaking the bilinear q-strong Diffie-Hellman assumption (Assumption 3.2), by Lemma 5.1. Therefore the probability $\Pr[E_{3,0} \mid E_{2,1} \cap E_{1,1}]$ is neg($k$);

4. $\Pr[E_4 \mid E_{3,1} \cap E_{2,1} \cap E_{1,1} \cap F]$: Here the event $E_4$ is conditioned on the event $E_{3,1} \cap E_{2,1} \cap E_{1,1} \cap F$. This condition allows us to replace $\beta_j'$ with $\mathrm{acc}(S_j)$ (due to $E_{2,1}$) and $\sum_{i=0}^{\delta} \gamma_i s^i$ with $\prod_{x\in I}(x+s)$ (due to $E_{1,1}$) in the event $E_{3,1}$. Therefore the event $E_{3,1} \mid E_{2,1} \cap E_{1,1}$ is the event
\[
e\left(g^{\prod_{x\in I}(x+s)}, W_j\right) = e(\mathrm{acc}(S_j), g) \ \wedge\ I \subseteq S_j \ \text{for all } j = 1, 2, \ldots, t.
\]
This is equivalent to writing $W_j$ as the subset witness $W_{I,S_j}$, i.e.,
\[
W_j = g^{\prod_{x \in S_j - I}(x+s)} = g^{P_j(s)}. \tag{5.14}
\]
Note now that $E_4$ is also conditioned on $F$. Therefore $I$ has to be incorrect. Specifically, since $I \subseteq S_j$ for all $j = 1, \ldots, t$ (due to the condition on $E_{3,1}$), it follows that $I$ does not contain all the elements of the intersection, i.e., it is incomplete. Thus the polynomials $P_1(s), P_2(s), \ldots, P_t(s)$ (Relation 5.14) have at least one common factor, say $(s+r)$, and it holds that $P_j(s) = (s+r)Q_j(s)$ for some polynomials $Q_j(s)$—computable in polynomial time—for all $j = 1, \ldots, t$. Therefore the event $E_4 \mid E_{3,1} \cap E_{2,1} \cap E_{1,1} \cap F$ implies that
\[
e(g,g) = \prod_{j=1}^{t} e(W_j, F_j) = \prod_{j=1}^{t} e\left(g^{P_j(s)}, F_j\right) = \prod_{j=1}^{t} e\left(g^{(s+r)Q_j(s)}, F_j\right) = \left(\prod_{j=1}^{t} e\left(g^{Q_j(s)}, F_j\right)\right)^{(s+r)}.
\]
Therefore we can derive an $(s+r)$-th root of $e(g,g)$ as
\[
e(g,g)^{\frac{1}{s+r}} = \prod_{j=1}^{t} e\left(g^{Q_j(s)}, F_j\right).
\]
This implies breaking the bilinear q-strong Diffie-Hellman assumption for $(p, G, \mathcal{G}, e, g)$ (Assumption 3.2). By Assumption 3.2, this probability is neg($k$), and therefore $\Pr[E_4 \mid E_{3,1} \cap E_{2,1} \cap E_{1,1} \cap F]$ is neg($k$).

Thus the total probability $P$ is neg($k$). This concludes the proof for the security of an intersection query.
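The common-factor step above is elementary to check numerically: if some element $r$ lies in every $S_j$ but is missing from $I$, then every $P_j(s)$ vanishes at $s = -r$, i.e., $(s+r)$ divides every $P_j$. A small Python sketch with toy values:

```python
# Why an incomplete intersection is caught: if r lies in every S_j but not in
# the claimed answer I, then each P_j(s) = prod_{x in S_j - I} (x + s) has the
# root s = -r, i.e., (s + r) divides every P_j -- exactly the common factor the
# reduction uses to extract an (s+r)-th root of e(g,g). Toy parameters only.
p = 2**31 - 1

S = [{2, 5, 7, 11}, {2, 5, 13}, {2, 5, 19}]
I = {5}                      # incomplete: the true intersection is {2, 5}
r = 2                        # the omitted common element

for Sj in S:
    Pj_at_minus_r = 1
    for x in Sj - I:
        Pj_at_minus_r = Pj_at_minus_r * ((x - r) % p) % p
    assert Pj_at_minus_r == 0  # (s + r) divides P_j(s)
print("(s + r) is a common factor of every P_j(s)")
```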
Union. Let the union query be a set of indices $\{1, 2, \ldots, t\}$ (wlog). The adversary Adv outputs an incorrect answer $U = \{e_1, e_2, \ldots, e_\delta\} \neq S_1 \cup S_2 \cup \ldots \cup S_t$ and also a proof that consists of the following elements:

1. Coefficients $\gamma_\delta, \gamma_{\delta-1}, \ldots, \gamma_0$;
2. Proofs $\Pi_1, \Pi_2, \ldots, \Pi_t$;
3. For each element $e_i \in U$, a membership witness $W_{i,j}$ with reference to some set $S_j$, where $1 \leq j \leq t$;
4. Subset witnesses $W_1, W_2, \ldots, W_t$ that prove that $U$ is a superset of $S_j$, for all $j = 1, 2, \ldots, t$.

We define now the following events, related to the adversary's choice of proof above. Our goal is to express the probability of the security definition as a function of the following events.

• $E_1, E_{1,0}, E_{1,1}$: Same as in intersection;

• $E_2, E_{2,0}, E_{2,1}$: Same as in intersection;

• $E_3$: The values $\{e_1, e_2, \ldots, e_\delta\}$, $W_{1,j_1}, W_{2,j_2}, \ldots, W_{\delta,j_\delta}$ picked by Adv satisfy
\[
e\left(W_{i,j_i}, g^s g^{e_i}\right) = e(\beta_{j_i}', g) \quad \text{for all } i = 1, \ldots, \delta \text{ and } j_i \in \{1, 2, \ldots, t\},
\]
where $\beta_{j_i}'$ is the first element of proof $\Pi_{j_i}$, as defined in Relation 5.13. Event $E_3$ can be partitioned into two mutually exclusive events $E_{3,0}$ and $E_{3,1}$, i.e., $E_3 = E_{3,0} \cup E_{3,1}$:
 – $E_{3,0}$: There exists $i \in \{1, 2, \ldots, \delta\}$ such that $e_i \notin S_{j_i}$;
 – $E_{3,1}$: For all $i = 1, 2, \ldots, \delta$ it is $e_i \in S_{j_i}$.

• $E_4$: The values $W_1, W_2, \ldots, W_t$, $\beta_1', \beta_2', \ldots, \beta_t'$ (contained in $\Pi_1, \Pi_2, \ldots, \Pi_t$), as well as the values $\gamma_\delta, \gamma_{\delta-1}, \ldots, \gamma_0$ picked by Adv, satisfy
\[
e(W_j, \beta_j') = e\left(\prod_{i=0}^{\delta}\left(g^{s^i}\right)^{\gamma_i}, g\right).
\]

• $F$: The answer (union) $U$ picked by Adv is not correct, i.e., $U = \{e_1, e_2, \ldots, e_\delta\} \neq S_1 \cup S_2 \cup \ldots \cup S_t$.

Similarly with the intersection security proof, let $P$ be the probability of Definition 2.5. We observe that for the case of the union query, $P$ can be expressed as the probability of the intersection of the events $E_1, E_2, E_3, E_4, F$. By simple probability calculus (and similarly with the intersection security proof), this can be written as
\[
\begin{aligned}
P &= \Pr[E_1 \cap E_2 \cap E_3 \cap E_4 \cap F] \\
  &= \Pr[(E_{1,0} \cup E_{1,1}) \cap (E_{2,0} \cup E_{2,1}) \cap (E_{3,0} \cup E_{3,1}) \cap E_4 \cap F] \\
  &\leq \Pr[E_{1,0}] + \Pr[E_{2,0}] + \Pr[E_{3,0} \mid E_{2,1}] + \Pr[E_4 \mid E_{3,1} \cap E_{2,1} \cap E_{1,1} \cap F].
\end{aligned}
\]
We compute each such probability separately:

1. $\Pr[E_{1,0}]$ is neg($k$) by Lemma 5.5;

2. $\Pr[E_{2,0}]$ is neg($k$) by Corollary 3.4;

3. $\Pr[E_{3,0} \mid E_{2,1}]$: Here the event $E_{3,0}$ is conditioned on the event $E_{2,1}$. This condition allows us to replace $\beta_{j_i}'$ with $\mathrm{acc}(S_{j_i})$ in the event $E_{3,0}$. Therefore the event $E_{3,0} \mid E_{2,1}$ is the event
\[
e\left(W_{i,j_i}, g^s g^{e_i}\right) = e(\mathrm{acc}(S_{j_i}), g) \ \wedge\ \exists\, i \in \{1, \ldots, \delta\},\ j_i \in \{1, \ldots, t\} : e_i \notin S_{j_i}.
\]
This event implies breaking the bilinear q-strong Diffie-Hellman assumption (Assumption 3.2), by Lemma 5.1. Therefore the probability $\Pr[E_{3,0} \mid E_{2,1}]$ is neg($k$);

4. $\Pr[E_4 \mid E_{3,1} \cap E_{2,1} \cap E_{1,1} \cap F]$: Here the event $E_4$ is conditioned on the event $E_{3,1} \cap E_{2,1} \cap E_{1,1} \cap F$. This condition allows us to replace $\beta_j'$ with $\mathrm{acc}(S_j)$ (due to $E_{2,1}$) and $\sum_{i=0}^{\delta}\gamma_i s^i$ with $\prod_{x\in U}(x+s)$ (due to $E_{1,1}$) in the event $E_4$. Therefore the event $E_4 \mid E_{3,1} \cap E_{2,1} \cap E_{1,1}$ is the event
\[
e(W_j, \mathrm{acc}(S_j)) = e\left(g^{\prod_{x\in U}(x+s)}, g\right). \tag{5.15}
\]
Note now that $E_4$ is also conditioned on $E_{3,1}$. Thus all elements $e_i \in U$ belong to some $S_{j_i}$, and therefore the reported union cannot contain extra elements. Also, $E_4$ is conditioned on $F$ (incorrect union). Therefore the reported union must contain fewer elements, so there is a set $S_j$ ($1 \leq j \leq t$) that contains an element $r$ such that $r \notin U$. Therefore, since Relation 5.15 holds, the adversary Adv can find $P(s)$, $Q(s)$ and $\alpha \neq 0$ such that
\[
e(W_j, \mathrm{acc}(S_j)) = e(W_j, g)^{(s+r)P(s)} = e\left(g^{\prod_{x\in U}(x+s)}, g\right) = e(g,g)^{(s+r)Q(s) + \alpha}.
\]
Therefore we can derive an $(s+r)$-th root of $e(g,g)$ as
\[
e(g,g)^{\frac{1}{s+r}} = e(g, W_j)^{P(s)/\alpha}\, e(g,g)^{-Q(s)/\alpha}.
\]
This implies breaking the bilinear q-strong Diffie-Hellman assumption for the setting $(p, G, \mathcal{G}, e, g)$ (Assumption 3.2). By Assumption 3.2, this probability is neg($k$), and therefore $\Pr[E_4 \mid E_{3,1} \cap E_{2,1} \cap E_{1,1} \cap F]$ is neg($k$).

Thus the total probability $P$ is neg($k$). This concludes the proof for the security of a union query.

Subset. Let the subset query be "is $S_1 \subseteq S_2$?". For a positive answer, the adversary Adv outputs an incorrect answer true and also a proof that consists of the following elements:

1. Proofs $\Pi_1$ and $\Pi_2$;
2. A subset witness $W_{S_1, S_2}$ with reference to set $S_2$.

We define now the following events, related to the adversary's choice of proof above. Our goal is to express the probability of the security definition as a function of the following events.

• $E_2, E_{2,0}, E_{2,1}$: Same as in intersection, with the difference that we only refer to two sets, i.e., sets $S_1$ and $S_2$;

• $E_3$: The values $\beta_1'$ (contained in $\Pi_1$), $\beta_2'$ (contained in $\Pi_2$) and $W_{S_1,S_2}$ picked by Adv satisfy $e(W_{S_1,S_2}, \beta_1') = e(\beta_2', g)$;

• $F$: $S_1 \nsubseteq S_2$.

Similarly with the intersection security proof, let $P$ be the probability of Definition 2.5.
We observe that for the case of the positive subset query, $P$ can be expressed as the probability of the intersection of the events $E_2, E_3, F$. By simple probability calculus (and similarly with the intersection security proof), this can be written as
\[
P = \Pr[E_2 \cap E_3 \cap F] = \Pr[(E_{2,0} \cup E_{2,1}) \cap E_3 \cap F] \leq \Pr[E_{2,0}] + \Pr[E_3 \mid E_{2,1} \cap F].
\]
We compute each such probability separately:

1. $\Pr[E_{2,0}]$ is neg($k$) by Corollary 3.4;

2. $\Pr[E_3 \mid E_{2,1} \cap F]$: Here the event $E_3$ is conditioned on the event $E_{2,1} \cap F$. This condition allows us to replace $\beta_1'$ with $\mathrm{acc}(S_1)$ and $\beta_2'$ with $\mathrm{acc}(S_2)$ in the event $E_3$. Therefore the event $E_3 \mid E_{2,1} \cap F$ is the event
\[
e(W_{S_1,S_2}, \mathrm{acc}(S_1)) = e(\mathrm{acc}(S_2), g) \ \wedge\ S_1 \nsubseteq S_2.
\]
This event implies breaking the bilinear q-strong Diffie-Hellman assumption (Assumption 3.2), by Lemma 5.1. Therefore the probability $\Pr[E_3 \mid E_{2,1} \cap F]$ is neg($k$).

This concludes the security proof for the case of the positive subset query.

For a negative answer, the adversary Adv outputs an incorrect answer false and also a proof that consists of the following elements:

1. Proofs $\Pi_1$ and $\Pi_2$;
2. An element $y$;
3. A membership witness $W_y$ for element $y$;
4. A non-membership witness $(A_y, B_y)$.

We define now the following events, related to the adversary's choice of proof above. Our goal is to express the probability of the security definition (Definition 2.5) as a function of the following events.

• $E_2$: Same as in the positive answer;

• $E_3$: The values $\beta_1'$, $W_y$ and $y$ picked by Adv are such that $e(W_y, g^s g^y) = e(\beta_1', g)$. Event $E_3$ can be partitioned into two mutually exclusive events $E_{3,0}$ and $E_{3,1}$, i.e., $E_3 = E_{3,0} \cup E_{3,1}$:
 – $E_{3,0}$: $y \notin S_1$;
 – $E_{3,1}$: $y \in S_1$.

• $E_4$: The values $y$, $A_y$, $B_y$ and $\beta_2'$ picked by Adv are such that $e(g^y g^s, A_y)\, e(\beta_2', B_y) = e(g,g)$;

• $F$: $S_1 \subseteq S_2$.

Similarly with the intersection security proof, let $P$ be the probability of Definition 2.5. We observe that for the case of the negative subset query, $P$ can be expressed as the probability of the intersection of the events $E_2, E_3, E_4, F$. By simple probability calculus (and similarly with the intersection security proof), this can be written as
\[
P = \Pr[E_4 \cap E_3 \cap E_2 \cap F] = \Pr[E_4 \cap (E_{3,0} \cup E_{3,1}) \cap (E_{2,0} \cup E_{2,1}) \cap F] \leq \Pr[E_{2,0}] + \Pr[E_{3,0} \mid E_{2,1}] + \Pr[E_4 \mid E_{3,1} \cap E_{2,1} \cap F].
\]
We compute each such probability separately:

1. $\Pr[E_{2,0}]$ is neg($k$) by Corollary 3.4;

2. $\Pr[E_{3,0} \mid E_{2,1}]$: Here the event $E_{3,0}$ is conditioned on the event $E_{2,1}$. This condition allows us to replace $\beta_1'$ with $\mathrm{acc}(S_1)$ in the event $E_{3,0}$. Therefore the event $E_{3,0} \mid E_{2,1}$ is the event
\[
e(W_y, g^s g^y) = e(\mathrm{acc}(S_1), g) \ \wedge\ y \notin S_1.
\]
This event implies breaking the bilinear q-strong Diffie-Hellman assumption (Assumption 3.2), by Lemma 5.1. Therefore the probability $\Pr[E_{3,0} \mid E_{2,1}]$ is neg($k$);

3. $\Pr[E_4 \mid E_{3,1} \cap E_{2,1} \cap F]$: Due to the condition on $E_{3,1} \cap E_{2,1} \cap F$, this is the event
\[
e(g^y g^s, A_y)\, e(\mathrm{acc}(S_2), B_y) = e(g,g) \ \wedge\ S_1 \subseteq S_2.
\]
Since we have the condition on $E_{3,1} \cap F$ ($y \in S_1$ and $S_1 \subseteq S_2$), it must be that $y \in S_2$. By the security of the non-membership witness (Lemma 3.3), this implies breaking the q-strong Diffie-Hellman assumption (Assumption 3.2), which happens with probability neg($k$).

Thus the total probability $P$ is neg($k$). This concludes the proof for the security of a negative subset query.

Set difference. Let the difference query be $D = S_1 - S_2$. The adversary Adv outputs an incorrect answer $D = \{e_1, e_2, \ldots, e_\delta\} \neq S_1 - S_2$ and also a proof that consists of the following elements:
1. Coefficients $\gamma_\delta, \gamma_{\delta-1}, \ldots, \gamma_0$;
2. Proofs $\Pi_1$ and $\Pi_2$;
3. A subset witness $W_{D,S_1}$;
4. A proof $(W_{S_1-D,1}, W_{S_1-D,2}, F_{S_1-D,1}, F_{S_1-D,2})$ for the intersection $S_1 \cap S_2$.

We define now the following events, related to the adversary's choice of proof above. Our goal is to express the probability of the security definition (Definition 2.5) as a function of the following events.

• $E_1$: Same as in intersection;

• $E_2$: Same as in subset;

• $E_3$: The values $\gamma_\delta, \gamma_{\delta-1}, \ldots, \gamma_0$, $W_{D,S_1}$ and $\beta_1'$ (contained in $\Pi_1$) picked by Adv satisfy
\[
e\left(W_{D,S_1}, \prod_{i=0}^{\delta}\left(g^{s^i}\right)^{\gamma_i}\right) = e(\beta_1', g).
\]
Event $E_3$ can be partitioned into two mutually exclusive events $E_{3,0}$ and $E_{3,1}$, i.e., $E_3 = E_{3,0} \cup E_{3,1}$:
 – $E_{3,0}$: $D \nsubseteq S_1$;
 – $E_{3,1}$: $D \subseteq S_1$;

• $E_4$: The values $W_{D,S_1}$, $\beta_1'$, $\beta_2'$, $W_{S_1-D,1}$, $W_{S_1-D,2}$, $F_{S_1-D,1}$, $F_{S_1-D,2}$ picked by Adv are such that the respective tests for the intersection of $S_1$ and $S_2$ are satisfied, i.e.,
 1. $e(W_{S_1-D,1}, W_{D,S_1}) = e(\beta_1', g)$;
 2. $e(W_{S_1-D,2}, W_{D,S_1}) = e(\beta_2', g)$;
 3. $e(W_{S_1-D,1}, F_{S_1-D,1})\, e(W_{S_1-D,2}, F_{S_1-D,2}) = e(g,g)$.

• $F$: The difference $D$ is incorrect, i.e., $D \neq S_1 - S_2$.

Similarly with the intersection security proof, let $P$ be the probability of Definition 2.5. We observe that for the case of the difference query, $P$ can be expressed as the probability of the intersection of the events $E_1, E_2, E_3, E_4, F$. By simple probability calculus (and similarly with the intersection security proof), this can be written as
\[
\begin{aligned}
P &= \Pr[E_4 \cap E_3 \cap E_2 \cap E_1 \cap F] \\
  &= \Pr[E_4 \cap (E_{3,0} \cup E_{3,1}) \cap (E_{2,0} \cup E_{2,1}) \cap (E_{1,0} \cup E_{1,1}) \cap F] \\
  &\leq \Pr[E_{1,0}] + \Pr[E_{2,0}] + \Pr[E_{3,0} \mid E_{2,1} \cap E_{1,1}] + \Pr[E_4 \mid E_{3,1} \cap E_{2,1} \cap F].
\end{aligned}
\]
We compute each such probability separately:

1. $\Pr[E_{1,0}]$ is neg($k$) by Lemma 5.5;

2. $\Pr[E_{2,0}]$ is neg($k$) by Corollary 3.4;

3. $\Pr[E_{3,0} \mid E_{2,1} \cap E_{1,1}]$: For the event $E_{3,0} \mid E_{2,1} \cap E_{1,1}$, by replacing the values of the conditions, we get
\[
e\left(W_{D,S_1}, g^{\prod_{x\in D}(x+s)}\right) = e(\mathrm{acc}(S_1), g) \ \wedge\ D \nsubseteq S_1.
\]
This event implies breaking the bilinear q-strong Diffie-Hellman assumption (Assumption 3.2), by Lemma 5.1. Therefore the probability $\Pr[E_{3,0} \mid E_{2,1} \cap E_{1,1}]$ is neg($k$);

4. $\Pr[E_4 \mid E_{3,1} \cap E_{2,1} \cap F]$: By the conditions, since $D \subseteq S_1$, we can write $W_{D,S_1} = g^{\prod_{x \in S_1 - D}(x+s)}$. Therefore the event is equivalent to the conjunction of the following events:
 • $e\left(W_{S_1-D,1}, g^{\prod_{x\in S_1-D}(x+s)}\right) = e(\mathrm{acc}(S_1), g)$;
 • $e\left(W_{S_1-D,2}, g^{\prod_{x\in S_1-D}(x+s)}\right) = e(\mathrm{acc}(S_2), g)$;
 • $e(W_{S_1-D,1}, F_{S_1-D,1})\, e(W_{S_1-D,2}, F_{S_1-D,2}) = e(g,g)$.
We have already proved (intersection proof) that the probability that the above event holds while $S_1 - D \neq S_1 \cap S_2$ is neg($k$). However, the event $S_1 - D \neq S_1 \cap S_2$ is equivalent to the event $D \neq S_1 - S_2$, which is our event $F$. Therefore the probability $\Pr[E_4 \mid E_{3,1} \cap E_{2,1} \cap F]$ is neg($k$).

This completes the proof of security for all the queries of the sets collection data structure. □
Theorem 5.1 Consider a collection of $m$ sets $S_1, \ldots, S_m$ and let $M = \sum_{i=1}^{m}|S_i|$ and $0 < \epsilon < 1$. For a query operation involving $t$ sets (intersection/union/subset/difference), let $N$ be the sum of the sizes of the involved sets and $\delta$ be the answer size. Let now $k$ be the security parameter. Then there exists a publicly-verifiable authenticated data structure scheme ASC = {genkey, setup, update, refresh, query, verify} for a data structure scheme defined for a dynamic sets collection data structure $D$ such that:

1. It is correct and secure according to Definitions 2.4 and 2.5 and based on the bilinear q-strong Diffie-Hellman assumption;
2. The access complexity of setup() is $O(m + M)$, outputting an authenticated data structure auth($D$) of $O(m + M)$ group complexity;
3. The access complexity of update() is $O(1)$, outputting update information upd of $O(1)$ group complexity;
4. The access complexity of refresh() is $O(1)$;
5. For all queries $q$ (intersection/union/subset/difference), the access complexity of query() is $O(N \log^2 N \log\log N + t m^{\epsilon} \log m)$, outputting a proof $\Pi(q)$ of $O(t + \delta)$ group complexity;
6. For all queries (intersection/union/subset/difference), the access complexity of verify() is $O(t + \delta)$.

Proof: The result follows from Lemmata 5.2, 5.3, 5.4, 5.7, 5.8, 5.9 and 5.10. □

5.6.1 Protocols

Three-party protocol. By using Theorem 2.1 we can easily derive the following corollary, which describes the use of the authenticated data structure scheme ASC of Theorem 5.1 in the three-party model:

Corollary 5.1 Consider a collection of $m$ sets $S_1, \ldots, S_m$ and let $M = \sum_{i=1}^{m}|S_i|$ and $0 < \epsilon < 1$. For a query operation involving $t$ sets (intersection/union/subset/difference), let $N$ be the sum of the sizes of the involved sets and $\delta$ be the answer size. Let now $k$ be the security parameter and assume that the bilinear q-strong Diffie-Hellman assumption holds. Then there exists a three-party authenticated data structures protocol (see Protocol 2.1) for verifying intersection, union, subset and difference queries $q$ on a dynamic sets collection data structure such that:

1. The setup at the source has $O(m + M)$ access complexity;
2. The update at the source has $O(1)$ access complexity;
3. The space needed at the source has $O(m + M)$ group complexity;
4. The communication between the source and the server has $O(1)$ group complexity;
5. The update at the server has $O(1)$ access complexity;
6. For all queries (intersection/union/subset/difference), the query at the server has $O(N \log^2 N \log\log N + t m^{\epsilon} \log m)$ access complexity;
7. The space needed at the server has $O(m + M)$ group complexity;
8. For all queries (intersection/union/subset/difference), the communication between the server and the client has $O(t + \delta)$ group complexity;
9. For all queries (intersection/union/subset/difference), the verification at the client has $O(t + \delta)$ access complexity;
10. For a query $q$ (intersection/union/subset/difference) sent by the client to the server at any time (even after updates), let $\alpha$ be an answer and let $\pi$ be a proof returned by the server. With probability $\Omega(1 - \mathrm{neg}(k))$, the client accepts the answer $\alpha$ if and only if $\alpha$ is correct.

Two-party protocol. Since the authenticated data structure scheme ASC uses the authenticated data structure scheme BHT, for which we have proved that Assumption 2.1 holds (see Corollary 3.6), by Theorems 2.2 and 5.1 we can state the final result for the two-party model:
Corollary 5.2 Consider a collection of $m$ sets $S_1, \ldots, S_m$ and let $M = \sum_{i=1}^{m}|S_i|$ and $0 < \epsilon < 1$. For a query operation involving $t$ sets (intersection/union/subset/difference), let $N$ be the sum of the sizes of the involved sets and $\delta$ be the answer size. Let now $k$ be the security parameter and assume that the bilinear q-strong Diffie-Hellman assumption holds. Then there exists a two-party authenticated data structures protocol (see Protocol 2.2) for verifying intersection, union, subset and difference queries $q$ on a dynamic sets collection data structure such that:

1. The protocol requires one round of interaction during updates;
2. The setup at the client has $O(m + M)$ access complexity;
3. The update at the client has $O(1)$ access complexity;
4. For all queries (intersection/union/subset/difference), the verification at the client has $O(t + \delta)$ access complexity;
5. The space needed at the client has $O(1)$ group complexity;
6. The communication between the client and the server has $O(1)$ group complexity during updates and $O(t + \delta)$ group complexity during queries;
7. The update at the server has $O(m^{\epsilon} \log m)$ access complexity;
8. For all queries (intersection/union/subset/difference), the query at the server has $O(N \log^2 N \log\log N + t m^{\epsilon}\log m)$ access complexity;
9. The space needed at the server has $O(m + M)$ group complexity;
10. For a query $q$ (intersection/union/subset/difference) sent by the client to the server at any time (even after updates), let $\alpha$ be an answer and let $\pi$ be a proof returned by the server. With probability $\Omega(1 - \mathrm{neg}(k))$, the client accepts the answer $\alpha$ if and only if $\alpha$ is correct.

5.7 Applications

In this section we discuss some applications of the presented authenticated sets collection data structure.

5.7.1 Keyword-search

First of all, we notice that our scheme can easily be used to authenticate keyword-search queries implemented by the inverted index data structure [9]: each term in the dictionary corresponds to a set of our sets collection data structure, which contains as elements all the documents that include this term. A usual text query for terms $m_1$ and $m_2$ returns those documents that are included in both the sets represented by $m_1$ and $m_2$, i.e., their intersection (a minimal sketch of this view is given below). By using our scheme, we can authenticate any such keyword-search query with costs that are proportional to the size of the answer of the query, and not proportional to the amount of data that the algorithm reads in order to process the query. Moreover, the derived authenticated inverted index can be efficiently updated as well.
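The following minimal (and unauthenticated) Python sketch of the inverted-index view, with made-up terms and document identifiers, may help fix ideas; the sets collection scheme then authenticates exactly this intersection:

```python
# Each dictionary term maps to the set of documents containing it; a multi-term
# query is the intersection of the corresponding sets. Contents are illustrative.
inverted_index = {
    "cloud":     {1, 2, 5, 8},
    "integrity": {2, 3, 5},
    "crypto":    {5, 8, 9},
}

def keyword_search(index, *terms):
    """Documents containing every query term: the t-set intersection."""
    return set.intersection(*(index[t] for t in terms))

print(keyword_search(inverted_index, "cloud", "integrity"))  # {2, 5}
```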
We continue with an extension of the authenticated inverted index, the timestamped keyword-search.

5.7.2 Timestamped keyword-search

Apart from applications in web search engines, the inverted index is used in other applications that employ keyword-search, such as email-search. In email-search, a word dictionary is again maintained, the terms of which are mapped into sets of email messages that contain the specific term. Therefore, when we are searching our inbox for emails containing terms $m_1$ and $m_2$, an inverted index query is executed. However, it is always desirable in email search to be able to introduce a "second" dimension in searching. For example, a query could be "give me the emails that contain terms $m_1$ and $m_2$ and which were received between time $t_1$ and $t_2$", where $t_1 < t_2$. We call this procedure timestamped keyword-search.

One solution for the verification of timestamped keyword-search would be to embed a timestamp in the documents (e.g., each email message) and have the client do the filtering locally, after he has verified—using our scheme—the intersection of the sets that correspond to terms $m_1$ and $m_2$. However, this is not operation-sensitive at all: the intersection can be a lot bigger than the set resulting after the application of the local filtering, making this straightforward solution inefficient.

We now describe an algorithmic construction to solve this problem. Let $t_1, t_2, \ldots, t_r$ be the discrete timestamps that we are interested in ($t_i$ can be viewed as a certain day of the month). We define a new sets collection data structure as follows. Imagine $t_1, t_2, \ldots, t_r$ are the leaves of a binary tree. We build a segment tree [97] on top of these timestamps as follows: each leaf storing timestamp $t_i$ contains the documents (e.g., email messages) that were received at time $t_i$. Moreover, the internal nodes of the binary tree contain the documents that correspond to the union (note that this union does not have any common elements) of the documents contained in the children's nodes, recursively defining in this way sets of documents for all the nodes of the tree. Therefore we end up with a new sets collection data structure that is built on top of these $2r - 1$ sets (one set per node of the tree), namely the sets $T_1, T_2, \ldots, T_{2r-1}$. The timestamped keyword-search is therefore verified by two sets collection data structures, one built on the text terms, namely the sets $S_1, S_2, \ldots, S_m$, and one built on top of the sets of the timestamps, namely the sets $T_1, T_2, \ldots, T_{2r-1}$.

Define now the extension of two timestamps, $\mathrm{ext}(t_1, t_2)$, to be the set of sets $T_i$ that "cover" the interval $[t_1, t_2]$, namely the set that contains sets the union of which equals the set of all timestamps in $[t_1, t_2]$. One can easily see that for every $1 \leq t_1 \leq t_2 \leq r$, it is $|\mathrm{ext}(t_1, t_2)| = O(\log r)$ (a sketch of this decomposition follows below).

Suppose now we want to verify the documents that contain terms $m_1$ and $m_2$ and which were received between $t_1$ and $t_2$. Namely, our query is described by the parameters $m_1, m_2, t_1, t_2$ (in the general case our query is described by $t$ terms $m_1, m_2, \ldots, m_t$ and two timestamps $t_1$ and $t_2$—see Corollary 5.3). All we have to do is to verify the intersection of the following sets: (a) the union of the sets in $\mathrm{ext}(t_1, t_2)$, (b) $S_1$ (the set that refers to term $m_1$) and (c) $S_2$ (the set that refers to term $m_2$). Let $T_1, T_2, \ldots, T_\ell$ be the disjoint sets contained in $\mathrm{ext}(t_1, t_2)$, where $\ell = O(\log r)$. The answer to the query is the set $(S_1 \cap S_2) \cap (T_1 \cup T_2 \cup \ldots \cup T_\ell)$, which can be written as $(S_1 \cap S_2 \cap T_1) \cup (S_1 \cap S_2 \cap T_2) \cup \ldots \cup (S_1 \cap S_2 \cap T_\ell)$. Since the $T_i$ are disjoint, each term of the union contributes at least one new element to the answer, and therefore we can verify this query in a nearly operation-sensitive way by authenticating the $\log r$ intersections separately (note there is an extra $O(\log r)$ multiplicative factor in the complexities of Corollary 5.3).
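The canonical-cover computation $\mathrm{ext}(t_1, t_2)$ is the standard segment-tree decomposition. The following Python sketch (illustrative, over timestamps $1, \ldots, r$) returns the $O(\log r)$ disjoint canonical ranges:

```python
# Sketch of ext(t1, t2): the O(log r) canonical segment-tree nodes whose
# disjoint union covers the interval [t1, t2]. Standard recursive decomposition.
def ext(node_lo, node_hi, t1, t2):
    if t2 < node_lo or node_hi < t1:
        return []                          # node disjoint from the query
    if t1 <= node_lo and node_hi <= t2:
        return [(node_lo, node_hi)]        # node fully covered: one canonical set
    mid = (node_lo + node_hi) // 2
    return ext(node_lo, mid, t1, t2) + ext(mid + 1, node_hi, t1, t2)

r = 16
cover = ext(1, r, 4, 13)
print(cover)  # [(4, 4), (5, 8), (9, 12), (13, 13)] -- O(log r) disjoint pieces
assert sum(hi - lo + 1 for lo, hi in cover) == 13 - 4 + 1
# The answer (S1 ∩ S2) ∩ (T1 ∪ ... ∪ T_l) is then verified as the union of the
# l = O(log r) disjoint intersections S1 ∩ S2 ∩ T_i, one per canonical node.
```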
Corollary 5.3 Consider a collection of $m$ sets $S_1, \ldots, S_m$, let $M = \sum_{i=1}^m |S_i|$, $0 < \epsilon < 1$, and let $t_1, t_2, \ldots, t_r$ be discrete timestamps. For a query operation involving a time interval $[t_1, t_2]$, let $t$ be the number of involved sets, $N$ be the sum of the sizes of the involved sets, and $\delta$ be the answer size. There exists an authenticated data structure scheme TKS = {genkey, setup, update, refresh, query, verify} for a data structure scheme defined for a timestamped keyword-search data structure $D$ with the following properties:

1. It is correct and secure according to Definitions 2.4 and 2.5 and based on the bilinear q-strong Diffie-Hellman assumption;
2. The access complexity of setup() is $O(m + r + M)$, outputting an authenticated data structure auth($D$) of $O(m + r + M)$ group complexity;
3. The access complexity of update() is $O(\log r)$, outputting information upd of $O(1)$ group complexity;
4. The access complexity of refresh() is $O(\log r)$;
5. For a timestamped keyword-search query $q$, algorithm query() has $O(N \log^2 N \log\log N + t(m + r)^{\epsilon}\log(m + r))$ access complexity, outputting a proof $\Pi(q)$ of $O(t \log r + \delta)$ group complexity;
6. For a timestamped keyword-search query, the access complexity of verify() is $O(t\log r + \delta)$.

Note that the above corollary does not include a result concerning the verification of union queries with timestamps. This is due to the following: using the same notation as for the intersection, the answer to the union query would be the set $(S_1 \cup S_2) \cap (T_1 \cup T_2 \cup \ldots \cup T_\ell)$. The nature of this answer does not allow for any further algebraic processing, and therefore, in order to authenticate the whole expression, one needs to verify the two unions separately. This leads to a solution that is not operation-sensitive (we recall that the size of our query is $O(t)$); therefore the operation-sensitive verification of this type of queries cannot be achieved with our method—at least in a way similar to the techniques we have used so far. The same applies to difference queries.

5.8 Analysis

In this section we analyze the costs of our solution and compare with experimental results from other works. For bilinear maps and generic-group operations in the bilinear-map accumulator, we used the PBC library [1], a library for pairing-based cryptography, interfaced with C.

5.8.1 System setup

We choose our system parameters as follows. First of all, type A pairings are used, as described in [70]. These pairings are constructed on the curve $y^2 = x^3 + x$ over the base field $\mathbb{F}_q$, where $q$ is a prime number. The multiplicative cyclic group $G$ we are using is a subgroup of $E(\mathbb{F}_q)$, namely a subgroup of the points with coordinates in $\mathbb{F}_q$ that belong to the elliptic curve $E$. Therefore this pairing is symmetric. The order of $E(\mathbb{F}_q)$ is $q + 1$ and the order of the group $G$ is some prime factor $p$ of $q + 1$. The target group $\mathcal{G}$ of the bilinear map is a subgroup of $\mathbb{F}_{q^2}^{*}$. In order to instantiate type A pairings in the PBC library, we have to choose the sizes of the primes $q$ and $p$. The main constraint in choosing the bit-sizes of $q$ and $p$ is that we want to make sure that the discrete logarithm problem is difficult in $G$ (which has order $p$) and in $\mathbb{F}_{q^2}$. Typical values are 160 bits for $p$ and 512 bits for $q$, and we use these. Note that with this choice of parameters the size of the elements of $G$ (which have the form $(x, y)$, i.e., points on the elliptic curve) is 1024 bits. Finally, we assume that the accumulation tree built on top of the set digests has two levels, i.e., $\epsilon = 0.5$.

5.8.2 Communication cost

Here we analyze the communication cost of our scheme for an intersection of two sets. Assume the size of the reported intersection is $\delta$. According to the described query() algorithm for the intersection, the proof (apart from the answer itself) consists of the following values: (a) two subset witnesses, two completeness witnesses and two proofs (each proof consists of two proof elements of two group elements each); the size of all these elements, which are all elements of the group $G$, does not depend on the size of the intersection and equals $2 \times (1024 + 1024 + 4 \times 1024)/8 = 1536$ bytes; (b) the coefficients $b_i \in \mathbb{Z}_p$ (we recall $p$ is 160 bits long) of the intersection, for $i = 1, \ldots, \delta$, which have size $160\delta/8 = 20\delta$ bytes. Therefore the total communication cost is a linear function of $\delta$, namely $1536 + 20\delta$ (in bytes).
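The cost model above in executable form (a direct transcription of the byte counts just derived):

```python
# 2-intersection proof size: a fixed 1536 bytes of group elements plus 20 bytes
# (one 160-bit Z_p coefficient) per element of the reported intersection.
def proof_size_bytes(delta):
    fixed = 2 * (1024 + 1024 + 4 * 1024) // 8  # witnesses + proofs, in bytes
    return fixed + (160 // 8) * delta          # plus the delta coefficients

for delta in (0, 1, 10, 100, 1000):
    print(delta, proof_size_bytes(delta) / 1000, "KB")
# delta = 0, 1, 10, 100, 1000 -> 1.536, 1.556, 1.736, 3.536, 21.536 KB,
# matching the 1.53 / 1.55 / 1.73 / 3.53 / 21.53 KB entries of Table 5.2.
```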
We now compare the communication cost of our scheme with the analysis made in [79]. In Table 5.2 we compare with the results presented in Table IV of [79], where various set sizes $n_1$ and $n_2$ are used and the size of the intersection $\delta$ is always $0.01 n_2$. Note that in most cases our communication cost is much lower than the one reported in [79]. More importantly, it does not depend on the sizes of the sets participating in the intersection. In the cases where our cost is worse, this is due to the big constants introduced by the use of bilinear pairings and accumulators.

Table 5.2: Comparison of the 2-intersection communication overhead (proof size) of the scheme presented by Morselli et al. [79] with our scheme. Here $n_1$ and $n_2$ are the sizes of the sets that are intersected and $\delta$ is the size of the intersection.

  n1     | n2     | δ    | KB [79] | KB (this work)
  1000   | 1000   | 10   | 3.34    | 1.73
  1000   | 100    | 1    | 1.68    | 1.55
  1000   | 10     | 0    | 1.01    | 1.53
  1000   | 1      | 0    | 0.46    | 1.53
  10000  | 10000  | 100  | 26.88   | 3.53
  10000  | 1000   | 10   | 12.15   | 1.73
  10000  | 100    | 1    | 6.86    | 1.55
  10000  | 10     | 0    | 3.08    | 1.53
  100000 | 100000 | 1000 | 263.25  | 21.53
  100000 | 10000  | 100  | 116.13  | 3.53
  100000 | 1000   | 10   | 63.18   | 1.73
  100000 | 100    | 1    | 26.69   | 1.55

5.8.3 Verification cost

Let exp, mult, add be the times needed to perform an exponentiation, a multiplication and an addition, respectively, all modulo $p$. Let also EXP, MULT be the times required for an exponentiation and a multiplication in the group $G$, and let EXP', MULT' be the respective times in the target group $\mathcal{G}$ of the bilinear map. Finally, let MAP be the time needed to perform the pairing operation $e(\cdot,\cdot)$. We benchmarked all these operations using the PBC library [1] (version pbc-0.5.7) on a 64-bit, 2.8 GHz Intel-based, dual-core, dual-processor machine with 4 GB main memory, running Debian Linux, and derived the following times: MAP = 5 ms, MULT' = 0.005 ms, exp = 0.02 ms, add = 0.002 ms and mult = 0.002 ms.

We now analyze the verification cost of a 2-intersection in our scheme. Let $S_i$ and $S_j$ be the sets of the intersection. On input the proof, the verification algorithm has to perform the following tasks: (a) first it verifies the proofs $\Pi_i$ and $\Pi_j$, which requires two bilinear-map computations for each value, therefore taking time 4 MAP; (b) then algorithm certify() is executed, and the time needed for this part is $\delta(2\,\mathrm{mult} + 2\,\mathrm{add} + \mathrm{exp})$; (c) then the algorithm checks the subset condition, which takes time 4 MAP; (d) finally it checks the completeness condition, which takes time 2 MAP + MULT'. Therefore the total cost for the verification of a 2-intersection of size $\delta$ is
\[
10\,\mathrm{MAP} + \delta(2\,\mathrm{mult} + 2\,\mathrm{add} + \mathrm{exp}) + \mathrm{MULT}',
\]
which is a linear function of $\delta$, namely $50 + 0.028\delta$ (in ms).
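Again in executable form, using the benchmarked times above (all values in milliseconds):

```python
# Verification-time estimate for a 2-intersection of size delta, from the
# benchmarked PBC operation costs above (the pairings dominate).
MAP, MULT_T = 5.0, 0.005             # pairing; multiplication in target group
exp, add, mult = 0.02, 0.002, 0.002  # operations modulo p

def verify_time_ms(delta):
    """10 pairings + delta * (2 mult + 2 add + 1 exp) + one target-group mult."""
    return 10 * MAP + delta * (2 * mult + 2 * add + exp) + MULT_T

print(verify_time_ms(1000))  # ~78 ms for an intersection of size 1000
```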
Chapter 6

Optimality with multilinear forms

In the previous chapters of this thesis, we introduced authenticated data structure schemes based on several well-accepted cryptographic primitives such as accumulators, bilinear maps and lattices. Some of these schemes have desirable efficiency characteristics, such as operation sensitivity and parallel algorithms, which would not be achievable with traditional hash-based techniques. However, none of the authenticated data structure schemes presented so far is optimal (i.e., adding no extra asymptotic overhead to the respective plain data structure scheme), according to our natural definition of optimality given in Chapter 2 (see Definition 2.8).

One authenticated data structure scheme that is almost optimal is the scheme ASC, used for verifying set operations and presented in Chapter 5: although the verification and communication costs were shown to be optimal ($O(t + \delta)$), the query costs were increased by a polylogarithmic factor (see Theorem 5.1). As such, in this chapter we pose the following natural question: Can we construct an optimal authenticated data structure scheme? The answer is yes—but assuming the existence of a cryptographic primitive that does not exist yet! Moreover, and less importantly, the derived authenticated data structure scheme is not publicly verifiable, thus not allowing its use in a three-party protocol (Protocol 2.1). This shows the difficulty of the problem of achieving optimality in authenticated data structures.

To show the realization of an optimal authenticated data structure, we present an authenticated dictionary data structure that is based on a new cryptographic primitive proposed by Silverberg and Boneh, namely multilinear forms [19], the construction of which, however, remains an open problem to date. The use of such a primitive gives an authenticated dictionary with constant communication and constant verification complexity, while maintaining all other complexities logarithmic. To the best of our knowledge (see Table 6.1), this is the first optimal authenticated dictionary to appear in the literature, as it exactly matches the respective complexities (update, query, and communication complexity) of the optimal plain dictionary data structure (e.g., implemented as a red-black tree).

The multilinear form cryptographic primitive used in our construction can be described as the "multi" version of the well-known bilinear map. Although initially used to attack elliptic curve systems [76], bilinear maps (also extensively used in the previous chapters of this thesis), being in effect an efficient tool for solving the decisional Diffie-Hellman problem, eventually proved to be very useful in cryptography (e.g., [16, 17, 18]) after their first appearance in the literature for a "good purpose" [60]. However, the main limitation of bilinear maps is the fact that they cannot be applied twice, i.e., the output element cannot be fed back into the map $e(\cdot,\cdot)$ in an efficient way. Finding such maps, i.e., self-bilinear maps, which could be used recursively to construct multilinear forms, was recently proved to be infeasible for groups that are of interest in cryptography, i.e., groups where the computational Diffie-Hellman problem is hard [27].

However, since cryptographically interesting multilinear form generators—i.e., multilinear form generators for groups where the discrete logarithm problem is hard, e.g., elliptic curve groups; we call these generators admissible later in the chapter—are not known to exist to date, one can view our work from a different (and more theoretical) angle: a proof, through a complexity lower bound, of the nonexistence of optimal authenticated dictionaries would imply the nonexistence of cryptographically interesting multilinear form generators (see Theorem 6.2). This reveals yet another important relation between two fields—combinatorics and cryptography—and becomes more promising (towards proving the nonexistence of cryptographically interesting multilinear form generators) given recent advances in the derivation of general complexity lower bounds for memory checking [35] and authenticated data structures [106].

About multilinear forms.
Multilinear forms were proposed as a potentially useful tool in cryptography in 2003 by Silverberg and Boneh [19]. Since then, no efficient construction of cryptographic interest has appeared. A work similar in nature to ours, in which an efficient construction for a cryptographic application based on multilinear forms is presented, is that of Lee et al. [64]. The impossibility of deriving multilinear forms through self-bilinear maps is investigated by Cheon and Lee [27].

Table 6.1: Asymptotic access and group complexities of various authenticated data structure schemes for a dynamic dictionary storing $n$ elements, compared with the optimal authenticated dictionary MFD based on multilinear forms and derived in this chapter. Parameter $0 < \epsilon < 1$ is a constant and "M. q-DH" stands for "Multilinear q-strong Diffie-Hellman". The various acronyms used for variables and assumptions have all been defined in Table 3.1. Note that our construction requires two assumptions, namely M. q-DH and Generic CR.

  scheme           | setup() | update() | refresh() | query() | verify() | proof Π(q) | info. upd | public verif. | optimal | assumption
  [15, 48, 75, 81] | n       | log n    | log n     | log n   | log n    | log n      | 1         | yes           | no      | Generic CR
  [11]             | n       | 1        | 1         | n       | n        | n          | 1         | yes           | no      | D. Log
  [83]             | n       | 1        | n^ε       | 1       | 1        | 1          | 1         | yes           | no      | B. q-DH
  [23, 101]        | n       | 1        | n^ε log n | 1       | 1        | 1          | 1         | yes           | no      | Strong RSA
  [51]             | n       | n^ε      | n^ε       | n^ε     | 1        | 1          | n^ε       | yes           | no      | Strong RSA
  [90]             | n       | 1        | 1         | n       | 1        | 1          | 1         | yes           | no      | Strong RSA
  MFD              | n       | log n    | log n     | log n   | 1        | 1          | log n     | no            | yes     | M. q-DH, Generic CR

6.1 Dictionary data structure

In this chapter, the underlying data structure we are using (and for which we are designing an authenticated data structure scheme) is a dictionary. Let $X$ be a collection of $n$ elements from a totally-ordered universe $U$. Note that the total order is a requirement of the dictionary data structure, a property that distinguishes it from a hash table (and thus the difference in their complexities). The data structure scheme {update, query, check}, as defined in Definition 2.2, for a dictionary $D(X)$ is as follows:

1. $y \leftarrow$ query($a, b, D(X)$): Given two elements $a, b \in U$ with $a \leq b$, return the sorted list of successive elements $y = [y_1\ y_2 \ldots y_{w-1}\ y_w] \subseteq X$ such that $y_1 \leq a \leq y_2 \leq \ldots \leq y_{w-1} \leq b < y_w$. This is a general range search query; note that for $a = b$ this query reduces to a membership (or non-membership) query for $a$, outputting the interval of $X$ containing (or not containing) $a$. Answering a range search query can be implemented in $O(\log n + w)$ worst-case complexity with a red-black tree data structure [29], outputting an answer of size $O(w)$ (a small sketch of these semantics follows this list);

2. $D(X') \leftarrow$ update($x, D(X)$): Given an element $x \in U$ such that $x \notin X$, insert $x$ into $X$ and output $D(X')$; given an element $x \in U$ such that $x \in X$, delete $x$ from $X$ and output $D(X')$. Both insertions and deletions can be implemented in $O(\log n)$ worst-case complexity [29];

3. $\{$accept, reject$\} \leftarrow$ check($a, b, y, D(X)$): If $y = [y_1\ y_2 \ldots y_{w-1}\ y_w] \subseteq X$ is a sorted list of $w$ successive elements such that $y_1 \leq a \leq y_2 \leq \ldots \leq y_{w-1} \leq b < y_w$, return accept; else return reject.
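The range search semantics of item 1 can be sketched in a few lines of Python, with a sorted list standing in for the red-black tree (boundary cases where $a$ precedes the minimum or $b$ exceeds the maximum are handled by the $\pm\infty$ sentinels in the actual construction of Section 6.3):

```python
# Range search as defined above: for a <= b, return the successive stored
# elements y_1 <= a <= y_2 <= ... <= y_{w-1} <= b < y_w, i.e., the elements in
# [a, b] together with their two boundary neighbors.
from bisect import bisect_left, bisect_right

def range_query(sorted_elems, a, b):
    lo = bisect_left(sorted_elems, a) - 1   # predecessor of a (y_1)
    hi = bisect_right(sorted_elems, b)      # successor of b (y_w)
    return sorted_elems[max(lo, 0):hi + 1]

X = [3, 8, 15, 21, 34, 55]
print(range_query(X, 10, 40))  # [8, 15, 21, 34, 55]
```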
6.1.1 Non-optimal authenticated dictionaries

To verify range search queries on a dictionary of $n$ elements, we can use many authenticated data structure schemes extensively described in the literature, e.g., various hierarchical hashing constructions [15, 48, 75, 81, 92], the security of which is based on generic collision-resistant hashing. Let HBD = {genkey, setup, update, refresh, query, verify} be such a scheme, described below in Corollary 6.1. All these schemes are, however, non-optimal: when the output range is of size $w = o(\log n)$, the output proof complexity, as well as the verification complexity, are both $\Omega(\log n)$, thus not satisfying the definition of optimality (see Definition 2.8). Nevertheless, for reasons that will become clear later, we use such an authenticated dictionary in our construction:

Corollary 6.1 Let $k$ be the security parameter. Then there exists a non-optimal, publicly-verifiable authenticated data structure scheme HBD = {genkey, setup, update, refresh, query, verify} for a data structure scheme defined for a dynamic dictionary $D$ storing $n$ elements such that:

1. It is correct according to Definition 2.4 and secure according to Definition 2.5, assuming the existence of generic collision-resistant hash functions;
2. The access complexity of setup() is $O(n)$, outputting an authenticated data structure auth($D$) of $O(n)$ group complexity;
3. The access complexity of update() is $O(\log n)$, outputting update information upd of $O(1)$ group complexity;
4. The access complexity of refresh() is $O(\log n)$;
5. For a range search query $q$ outputting an answer of size $w$ we have: (a) the access complexity of query() is $O(\log n + w)$; (b) the access complexity of verify() is $O(\log n + w)$; (c) the group complexity of the proof $\Pi(q)$ is $O(\log n + w)$.

We continue with the description of the cryptographic primitive to be used in our construction, the multilinear form.

6.2 Multilinear forms

Let $G$, $\mathcal{G}$ be two cyclic groups of prime order $p$ and let $g$ be a generator of $G$. We let the bit-size of $p$ (the order of both $G$ and $\mathcal{G}$) be polynomial in the security parameter $k$. We are now ready to define an admissible $t$-multilinear form. The definition is similar to the one presented in the original paper by Silverberg and Boneh [19]:

Definition 6.1 We say that a map $e : G^t \rightarrow \mathcal{G}$ is an admissible $t$-multilinear form if it satisfies the following properties:

1. $G$ and $\mathcal{G}$ are cyclic groups of the same prime order $p$;
2. The discrete logarithm problem is hard both in $G$ and $\mathcal{G}$;
3. For all $a_1, a_2, \ldots, a_t \in \mathbb{Z}_p^*$ and $x_1, x_2, \ldots, x_t \in G$ it is
\[
e(x_1^{a_1}, x_2^{a_2}, \ldots, x_t^{a_t}) = e(x_1, x_2, \ldots, x_t)^{a_1 a_2 \cdots a_t} \in \mathcal{G};
\]
4. The map is non-degenerate: if $g \in G$ generates $G$, then $e(g, g, \ldots, g) \in \mathcal{G}$ generates $\mathcal{G}$.

We call groups $G$ and $\mathcal{G}$ for which there exists an admissible $t$-multilinear form admissible $t$-multilinear groups.

Definition 6.2 An admissible $t$-multilinear form generator is a probabilistic polynomial-time algorithm that takes as input a natural number $t$ and the security parameter $1^k$ and outputs a uniformly random tuple of multilinear pairing parameters $(p, G, \mathcal{G}, e, g)$, where $G$ and $\mathcal{G}$ are admissible $t$-multilinear groups for which there exists an admissible $t$-multilinear form $e(\cdot, \ldots, \cdot) : G^t \rightarrow \mathcal{G}$ of $t$ inputs.

To prove the security of our construction, we are going to use the following assumption, which can be described as the "multi" version of the bilinear q-strong Diffie-Hellman assumption (see Assumption 3.2):

Assumption 6.1 (Multilinear q-strong Diffie-Hellman assumption) Let $k$ be the security parameter and let $(p, G, \mathcal{G}, e, g)$ be a uniformly randomly generated tuple of multilinear pairing parameters, output by an admissible $t$-multilinear form generator. Given the elements $g, g^s, \ldots, g^{s^q} \in G$, for some $s$ chosen at random from $\mathbb{Z}_p^*$ and $q = \mathrm{poly}(k)$, there is no polynomial-time algorithm that can output a pair $(a, e(g, g, \ldots, g)^{1/(s+a)}) \in \mathbb{Z}_p \times \mathcal{G}$, except with negligible probability neg($k$).
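Since no admissible multilinear form generator is known, one cannot give real code for Definition 6.1. The toy Python model below represents group elements by their discrete logarithms, which makes the multilinear identity of Property 3 easy to check—and, of course, makes the discrete logarithm problem trivially easy, so it has no cryptographic value whatsoever:

```python
# *Toy* model of a t-multilinear form, for experimenting with the algebra only:
# group elements are stored as their discrete logs mod p, so the identity
# e(x1^{a1}, ..., xt^{at}) = e(x1, ..., xt)^{a1*...*at} holds by construction.
p = 1_000_003  # illustrative prime group order

class Elem:
    """An element g^x of G (or e(g,...,g)^x of the target group), stored as x."""
    def __init__(self, x): self.x = x % p
    def __pow__(self, a): return Elem(self.x * a)
    def __eq__(self, other): return self.x == other.x

g = Elem(1)

def e(*elems):
    """Toy t-multilinear form: e(g^{x1}, ..., g^{xt}) = e(g,...,g)^{x1*...*xt}."""
    out = 1
    for el in elems:
        out = out * el.x % p
    return Elem(out)

# The defining Property 3, for t = 3:
x1, x2, x3 = Elem(17), Elem(23), Elem(31)
assert e(x1 ** 5, x2 ** 7, x3 ** 2) == e(x1, x2, x3) ** (5 * 7 * 2)
```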
6.3 An optimal authenticated dictionary

In this section we describe an authenticated dictionary data structure scheme, based on admissible multilinear form generators, that achieves communication and verification complexity proportional to the size $w$ of the reported output range. More importantly, these complexities are combined with logarithmic update and query costs.

Let $X = \{x_1, x_2, \ldots, x_n\}$ be the set of elements from a totally-ordered universe $U$ contained in the dictionary, where $x_1 < x_2 < \ldots < x_n$. Each element is represented with $k/2$ bits. The actual set we are going to store, in order to also support efficient range search and non-membership queries, is the set of $k$-bit intervals $A = \{a_0, a_1, a_2, \ldots, a_n\}$, where $a_i = x_i \| x_{i+1}$ for $i = 1, \ldots, n-1$, $a_0 = -\infty \| x_1$, and $a_n = x_n \| {+\infty}$. Note that the total order on $X$ imposes a natural total order on $A$. In our construction we use a red-black tree with data at the leaves (i.e., the data of the internal nodes navigates the searches and does not correspond to actual data) [29].

We now describe MFD = {genkey, setup, update, refresh, query, verify}, our authenticated data structure scheme for a dictionary data structure on the totally-ordered set $X = \{x_1, x_2, \ldots, x_n\}$. We note that the construction uses several features of the accumulator constructions of Chapter 3, as well as the authenticated data structure scheme HBD for a dictionary from Corollary 6.1. Again, the actual set on which we build our data structure is the set of intervals $A = \{a_0, a_1, a_2, \ldots, a_n\}$.

Algorithm $\{sk, pk\} \leftarrow$ genkey($1^k$): Using an admissible $k$-multilinear form generator (Definition 6.2, with $t = k$), the algorithm outputs a $k$-bit prime $p$, admissible $k$-multilinear groups $G$ and $\mathcal{G}$, a generator $g \in G$ of $G$, and an admissible $k$-multilinear form $e(\cdot, \ldots, \cdot) : G^k \rightarrow \mathcal{G}$. (Note that the number of inputs of the multilinear form is equal to the security parameter $k$: our construction requires $O(\log n)$ inputs for the multilinear form and, since we are in the computational model, it is always $\log n < k$.) Then it randomly picks a number $s \in \mathbb{Z}_p^*$ ($s$ is the trapdoor). An upper bound $q$ on the total number of elements to be stored in the data structure is decided, and the algorithm also computes the elements $g^s, g^{s^2}, \ldots, g^{s^q}$ of $G$. It also calls $\{sk, pk\} \leftarrow$ HBD.genkey($1^k$). The algorithm outputs $s \in \mathbb{Z}_p^*$ and HBD.$sk$ as $sk$, and everything else as $pk$.

Algorithm $\{\mathrm{auth}(D_0), d_0\} \leftarrow$ setup($D_0, sk, pk$): Let $D_0 = A = \{a_0, a_1, a_2, \ldots, a_n\}$ be the set of sorted intervals that corresponds to the underlying set of elements $X = \{x_1, x_2, \ldots, x_n\}$. Initially, the algorithm computes
\[
\mathrm{acc}(A) = g^{(a_0+s)(a_1+s)\cdots(a_n+s)} \in G. \tag{6.1}
\]
Let now $T$ be the red-black tree built on top of the intervals $a_i$, $i = 0, \ldots, n$. Note that there is a natural notion of order imposed on $A$, based on the order imposed on $X$. Let $v_0, v_1, \ldots, v_n$ be the leaves of the tree, storing the intervals $a_0, a_1, \ldots, a_n$, respectively. We define the label of $v_i$ as
\[
\mathrm{label}(v_i) = g^{a_i + s} \in G \quad \text{for all } i = 0, \ldots, n. \tag{6.2}
\]
Also, let $v_B$ be the internal node of $T$ that is the root of the subtree $T_B$ of $T$ containing the elements of some subset $B \subseteq A$. For every internal node $v_B$ (and for the root of the tree as well), the algorithm sets
\[
\mathrm{label}(v_B) = g^{\prod_{a \in B}(a+s)} \in G. \tag{6.3}
\]
All the labels label($\cdot$) are stored with the tree $T$.
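A small Python sketch of this label computation may be useful (toy field arithmetic in the exponent, with the trapdoor $s$ in the clear, as is available to setup(); the interval encoding and sentinel values are made-up stand-ins for $x_i \| x_{i+1}$ and $\pm\infty$):

```python
# Sketch of setup(): build the interval set A from X and compute the exponent
# prod_{a in B}(a + s) bottom-up, as in the labels label(v_B) = g^{prod (a+s)}.
p = 2**31 - 1
s = 123456789            # the trapdoor; known only to genkey()/update()/verify()

X = [5, 9, 42]           # sorted element collection
INF = p - 1              # illustrative stand-in for the +infinity encoding
A = [(0, X[0])] + [(X[i], X[i + 1]) for i in range(len(X) - 1)] + [(X[-1], INF)]

def encode(interval):
    """Map an interval x_i || x_{i+1} to a field element (illustrative)."""
    lo, hi = interval
    return (lo * 2**16 + hi) % p

def label_exponent(intervals):
    """Exponent of label(v_B): product over the subtree's intervals of (a + s)."""
    if len(intervals) == 1:
        return (encode(intervals[0]) + s) % p
    mid = len(intervals) // 2
    return label_exponent(intervals[:mid]) * label_exponent(intervals[mid:]) % p

# The root label's exponent is that of the accumulation value acc(A):
print(hex(label_exponent(A)))
```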
Subsequently, the algorithm calls HBD.setup($A, sk, pk$), building in this way a new hashing-based authenticated dictionary on top of $A$. It sets $d_0 = \{\mathrm{acc}(A), \mathrm{hash}(A)\}$, where hash($A$) = HBD.$d_0$. The structure auth($D_0$) contains the tree $T$, as well as the authenticated structure HBD.auth($A$). We now make the following important remark:

Remark 6.1 The hashing scheme employed by HBD includes in the hashing computation all the labels label($v$) (defined in Relation 6.3) of the internal nodes $v$. Namely, the hash value $h_v$ at some internal node $v$ that has left child $u$ and right child $w$ is computed as
\[
h_v = h(h_u \,\|\, v \,\|\, \mathrm{label}(v) \,\|\, h_w),
\]
where $h(\cdot)$ is the generic collision-resistant hash function used (e.g., SHA-2).

Lemma 6.1 Algorithm setup() of the authenticated data structure scheme MFD has $O(n)$ access complexity. Moreover, the authenticated data structure auth($D_0$) output by setup() has $O(n)$ group complexity.

Proof: First of all, HBD.setup() has $O(n)$ complexity, outputting an authenticated data structure of $O(n)$ group complexity, by Corollary 6.1. Denote now with $v_B$ an internal node of the tree $T$ that is the root of a subtree $T_B$ storing the subset $B \subseteq A$. For each leaf $v_i$ of the tree, the algorithm computes $P_i = a_i + s$ and then sets label($v_i$) = $g^{P_i}$. Note now that for every internal node $v_B$ with left child $v_{B_1}$ and right child $v_{B_2}$, it is $P_B = P_{B_1} P_{B_2}$ and label($v_B$) = $g^{P_B}$, since $B = B_1 \cup B_2$. The described recursive computation has $O(n)$ complexity (a postorder traversal of $T$). Moreover, since we store one label for each node of $T$, the total group complexity of the labels is $O(n)$. This completes the proof. □

Algorithm $\{D_{h+1}, \mathrm{auth}(D_{h+1}), d_{h+1}, \mathrm{upd}\} \leftarrow$ update($u, D_h, \mathrm{auth}(D_h), d_h, sk, pk$): We distinguish two cases (a small sketch of this reduction follows below):

(1) Insertion of an element $x \in X$: Let $a_1 = u\|z \in A$ be the interval stored in the dictionary such that $u < x < z$. Then the insertion of $x$ is equivalent to deleting the interval $a_1$ and inserting the intervals $b_1 = u\|x$ and $b_2 = x\|z$;

(2) Deletion of an element $x \in X$: Let $a_1 = u\|x \in A$ and $a_2 = x\|z \in A$ be the successive intervals stored in the dictionary. Then the deletion of $x$ is equivalent to deleting the intervals $a_1$ and $a_2$ and inserting the interval $b = u\|z$.

We have therefore reduced an update of an element of $X$ to a constant number of updates of intervals of $A$; we thus continue the description of the update algorithms with reference to intervals. Let $a$ be the interval of the update $u$. Interval $a$ defines a logarithmic number of nodes of $T$ that need to be accessed and modified in order for the update to be performed (this follows from the red-black tree properties). Let $p(a)$ be the set of those nodes. For every node $v \in p(a)$, the algorithm updates the label label($v$) and outputs the updated labels as the information upd. The algorithm also stores the new (updated) labels on the tree $T$, which is updated to $T'$. Finally, the algorithm calls $\{A', \mathrm{auth}(A'), d'\} \leftarrow$ HBD.update($u, A, \mathrm{auth}(A), d_h, sk, pk$) and outputs the following structures:

1. The new digest $d_{h+1}$, which contains $\mathrm{acc}(A') = \mathrm{acc}(A)^{(a+s)}$ (in the case of a deletion, $\mathrm{acc}(A') = \mathrm{acc}(A)^{(a+s)^{-1}}$) and the new digest hash($A'$) = HBD.$d'$, as output by calling HBD.update();
2. The new authenticated data structure auth($D_{h+1}$), which contains $T'$ and auth($A'$);
3. The information upd, which contains the updated labels.
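The interval reduction of update() in a few lines of Python (each resulting interval insertion or deletion then contributes a factor $(a+s)$ or $(a+s)^{-1}$ to the exponent of acc($A$)); intervals are modeled as pairs with illustrative sentinel endpoints:

```python
# Reduction of element updates to O(1) interval updates, as described above.
def insert_element(A, x):
    """Insert x: replace the interval (u, z) with u < x < z by (u, x), (x, z)."""
    (u, z) = next(iv for iv in A if iv[0] < x < iv[1])
    A.remove((u, z))
    A += [(u, x), (x, z)]

def delete_element(A, x):
    """Delete x: replace (u, x) and (x, z) by the single interval (u, z)."""
    (u, _) = next(iv for iv in A if iv[1] == x)
    (_, z) = next(iv for iv in A if iv[0] == x)
    A.remove((u, x)); A.remove((x, z))
    A.append((u, z))

A = [(0, 5), (5, 9), (9, 99)]    # X = {5, 9} with sentinel endpoints 0 and 99
insert_element(A, 7)
assert sorted(A) == [(0, 5), (5, 7), (7, 9), (9, 99)]
delete_element(A, 7)
assert sorted(A) == [(0, 5), (5, 9), (9, 99)]
```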
Lemma 6.2 Algorithm update() of the authenticated data structure scheme MFD has $O(\log n)$ access complexity. Moreover, the update information upd output by update() has $O(\log n)$ group complexity.

Proof: This result follows from the properties of the red-black tree [29]: only a logarithmic number of nodes change during a red-black tree update. Moreover, the label label($v$) of each such node $v$ can be updated with $O(1)$ complexity, since update() has access to the secret key $s$. Finally, by Corollary 6.1, HBD.update() has $O(\log n)$ access complexity. This makes the total update complexity $O(\log n)$. □

Algorithm $\{D_{h+1}, \mathrm{auth}(D_{h+1}), d_{h+1}\} \leftarrow$ refresh($u, D_h, \mathrm{auth}(D_h), d_h, \mathrm{upd}, pk$): The algorithm updates $T$ to $T'$ by using the information contained in upd. (Algorithm refresh() could perform this task without access to the information upd; however, since the algorithm does not have access to the secret key $sk$, this would require linear complexity.) Finally, the algorithm calls $\{A', \mathrm{auth}(A'), d'\} \leftarrow$ HBD.refresh($u, A, \mathrm{auth}(A), d_h, \mathrm{upd}, pk$) and outputs the following structures:

1. The new digest $d_{h+1}$, which contains acc($A'$) (contained in upd) and the new digest hash($A'$) = HBD.$d'$, as output by calling HBD.refresh();
2. The new authenticated data structure auth($D_{h+1}$), which contains $T'$ and auth($A'$).

Lemma 6.3 Algorithm refresh() of the authenticated data structure scheme MFD has $O(\log n)$ access complexity.

Proof: The algorithm performs a computation proportional to the size of the information upd; therefore, by Lemma 6.2, this part has $O(\log n)$ access complexity. Moreover, by Corollary 6.1, HBD.refresh() has $O(\log n)$ access complexity. Summing up, the total access complexity of the algorithm is $O(\log n)$. □

6.3.1 Dictionary queries and verification

In this section we show the construction of proofs for dictionary queries using the authenticated data structure scheme MFD. As defined in Section 6.1, a dictionary range search query is described by two arguments $a, b \in U$ with $a \leq b$. Moreover, the answer to the query is the sorted list of $w$ successive elements $y = [y_1\ y_2 \ldots y_{w-1}\ y_w] \subseteq X$ such that $y_1 \leq a \leq y_2 \leq \ldots \leq y_{w-1} \leq b < y_w$. For reasons to be made clear later, we distinguish two cases:

• If $w = \Omega(\log n)$, the proof is constructed by using the authenticated data structure scheme HBD. In this case, the size of the answer is $\Omega(\log n)$, and therefore the logarithmic-sized proofs of HBD (see Corollary 6.1) achieve optimality, according to Definition 2.8;

• If $w = o(\log n)$, the proof needs to be constructed in a different way, so that optimality can be achieved. Specifically, as we will see later, optimality will be achieved only for $w = O(1)$. This is where the multilinear forms need to be employed.

We continue with the formal description of algorithm query().

Algorithm $\{\Pi(q), \alpha(q)\} \leftarrow$ query($q, D_h, \mathrm{auth}(D_h), pk$): Let the query $q$ be two elements $a, b \in U$ with $a \leq b$, and suppose the answer to the query is the sorted list of $w$ successive elements $y = [y_1\ y_2 \ldots y_{w-1}\ y_w] \subseteq X$ such that $y_1 \leq a \leq y_2 \leq \ldots \leq y_{w-1} \leq b < y_w$. If $w = \Omega(\log n)$, let HBD.$\Pi(q)$ and HBD.$\alpha(q)$ be the proof and the answer, respectively, output by calling HBD.query($q, D_h, \mathrm{auth}(D_h), pk$), where $q$ contains the intervals $y_1\|y_2$ and $y_{w-1}\|y_w$ (note that $y_1\|y_2 \leq y_{w-1}\|y_w$). Then we set $\Pi(q)$ = HBD.$\Pi(q)$; namely, in this case the proof is constructed by using the authenticated data structure scheme HBD.

Let us now examine the most interesting case, where $w = o(\log n)$. In this case the proof $\Pi(q)$ consists of $w - 1$ group elements of $\mathcal{G}$, namely the witnesses
\[
W_{a_i} = e(g, g, \ldots, g)^{\prod_{a \in A - \{a_i\}}(a+s)} \in \mathcal{G}, \tag{6.4}
\]
where $a_i = y_i \| y_{i+1}$, for $i = 1, \ldots, w - 1$.
Note that $W_{a_i}$ is similar to the membership witness for accumulators, described in Chapter 3.

Lemma 6.4 Algorithm query() of the authenticated data structure scheme MFD has $O(\log n + w)$ access complexity when $w = \Omega(\log n)$ and $O(w \log n)$ access complexity when $w = o(\log n)$. Moreover, in both cases it outputs a proof $\Pi(q)$ of $O(w)$ group complexity.

Proof: If $w = \Omega(\log n)$, then the authenticated data structure scheme HBD, by Corollary 6.1, outputs proofs of $O(\log n + w)$ group complexity with $O(\log n + w)$ access complexity; since $w = \Omega(\log n)$, it is $O(\log n + w) = O(w)$. For the case $w = o(\log n)$, note that each witness $W_{a_i}$ in Relation 6.4 can be constructed with $O(\log n)$ access complexity: let $v_{i0}, v_{i1}, \ldots, v_{il}$ be the path in the tree $T$ from the leaf node storing the interval $a_i$ to the root $v_{il}$ of $T$, where $l = O(\log n)$. Let also $w_{i0}, w_{i1}, \ldots, w_{i(l-1)}$ be the sibling nodes of $v_{i0}, v_{i1}, \ldots, v_{i(l-1)}$, respectively (note that $w_{i0}$ might not exist). By the construction of the tree $T$ (it can be viewed as a segment tree [97]), for each $j = 0, 1, \ldots, l-1$ it is
\[
\mathrm{label}(w_{ij}) = g^{P_{ij}}, \tag{6.5}
\]
where
\[
\prod_{j=0}^{l-1} P_{ij} = \prod_{a \in A - \{a_i\}}(a+s). \tag{6.6}
\]
This means that information about the whole set can be retrieved by accessing $O(\log n)$ memory locations. Therefore, the algorithm constructs the witness $W_{a_i}$ by computing
\[
e\left(\mathrm{label}(w_{i0}), \ldots, \mathrm{label}(w_{i(l-1)}), g, \ldots, g\right) = e\left(g^{P_{i0}}, \ldots, g^{P_{i(l-1)}}, g, \ldots, g\right) = e(g, \ldots, g)^{\prod_{j=0}^{l-1} P_{ij}} = e(g, \ldots, g)^{\prod_{a \in A - \{a_i\}}(a+s)} = W_{a_i}.
\]
The above four equalities follow from Relation 6.5, the properties of the multilinear form $e(\cdot, \ldots, \cdot)$, and Relations 6.6 and 6.4, respectively. Since computing one such witness $W_{a_i}$ requires $O(\log n)$ inputs from the authenticated data structure, and since the proof construction requires the computation of $w - 1$ such witnesses, we conclude that for the case $w = o(\log n)$, computing the proof has $O(w \log n)$ access complexity. Finally, since the proof contains $w - 1$ such witnesses (which are elements of $\mathcal{G}$), we conclude that the group complexity of the proof in the case $w = o(\log n)$ is also $O(w)$. □
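The exponent bookkeeping of Lemma 6.4 can be checked numerically: the sibling subtrees along the leaf-to-root path partition $A - \{a_i\}$, so the product of their exponents $P_{ij}$ equals $\prod_{a \in A - \{a_i\}}(a+s)$. A Python sketch over a complete binary tree, with toy values standing in for the encoded intervals:

```python
# Key step of Lemma 6.4 in exponent arithmetic: along the leaf-to-root path of
# a_i, the sibling subtrees partition A - {a_i}. Toy parameters only.
p, s = 2**31 - 1, 987654321
A = [11, 22, 33, 44, 55, 66, 77, 88]   # encoded intervals at the leaves
i = 5                                   # query the leaf storing A[5]

def subtree_exponent(lo, hi):           # prod_{j in [lo, hi)} (A[j] + s)
    out = 1
    for a in A[lo:hi]:
        out = out * ((a + s) % p) % p
    return out

# Walk from leaf i to the root of the complete binary tree, multiplying in the
# exponent of each sibling subtree's label:
product, lo, hi = 1, i, i + 1
while (lo, hi) != (0, len(A)):
    size = hi - lo
    if lo % (2 * size) == 0:            # current node is a left child
        product = product * subtree_exponent(hi, hi + size) % p
        hi += size
    else:                               # current node is a right child
        product = product * subtree_exponent(lo - size, lo) % p
        lo -= size

expected = subtree_exponent(0, i) * subtree_exponent(i + 1, len(A)) % p
assert product == expected              # = prod_{a != a_i} (a + s)
```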
Lemma 6.6 The authenticated data structure scheme MFD = {genkey, setup, update, refresh, query, verify} is correct according to Definition 2.4.

Proof: Let D_0 be any dictionary storing the collection of intervals A, corresponding to an elements collection X (of n elements) from a totally-ordered universe U. Fix the security parameter k and output pk = {G, 𝔾, e(·, ..., ·), p} and sk = s ∈ Z_p^* by calling algorithm genkey(). Then output an authenticated data structure auth(D_0) and the respective digest d_0, by calling algorithm setup(). Pick a polynomial number of updates (namely, pick a polynomial number of elements from U for insertion or deletion) and update auth(D_0) and d_0 by calling algorithm refresh(). Let D_h be the final dictionary, auth(D_h) be the produced authenticated data structure and d_h be the final digest. Now let q be a range search query corresponding to elements a and b from U with a ≤ b. Algorithm query() outputs an answer α(q) which is the sorted list of w successive elements y = [y_1 y_2 ... y_{w−1} y_w] ⊆ X (note that [y_1||y_2, y_2||y_3, ..., y_{w−1}||y_w] ⊆ A) such that y_1 ≤ a ≤ y_2 ≤ ... ≤ y_{w−1} ≤ b < y_w. Let also Π(q) be the proof output by algorithm query(). We distinguish two cases:

1. If w = Ω(log n), the proof Π(q) is computed by algorithm HBD.query(). By the correctness of the authenticated data structure scheme HBD, verify() does not reject in this case;

2. If w = o(log n), the proof Π(q) consists of the witnesses W_{a_i} for i = 1, ..., w − 1, where a_i = y_i||y_{i+1}. Algorithm verify() does not reject since

$$W_{a_i}^{(a_i + s)} = \left(e(g, \ldots, g)^{\prod_{a \in A - \{a_i\}} (a+s)}\right)^{(a_i + s)} = e(g, \ldots, g)^{\prod_{a \in A} (a+s)} = e(\mathsf{acc}(A), g, \ldots, g)\,,$$

by the definition of W_{a_i} in Relation 6.4 and since acc(A) is always maintained to be the accumulation of all the intervals in A through algorithm refresh() (see Relation 6.1).

This completes the proof. □
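The proof relies on acc(A) being kept up to date across updates (Relation 6.1). At the exponent level, this maintenance follows the standard accumulator pattern: an insertion multiplies the accumulated product by (x + s), and a deletion divides it out, which requires the secret s. The following is a minimal sketch under illustrative parameters; the actual update()/refresh() algorithms additionally maintain the tree T and the HBD structure.

```python
# Exponent-level maintenance of acc(A) under updates (illustrative p, s).
p, s = 1009, 123

def acc_exponent(A):
    """Exponent of acc(A) = g^{prod_{a in A} (a + s)}."""
    e = 1
    for a in A:
        e = e * (a + s) % p
    return e

def insert(acc, x):
    """Insertion of interval x: one multiplication by (x + s)."""
    return acc * (x + s) % p

def delete(acc, x):
    """Deletion of interval x: divide (x + s) out, using the secret s
    to invert it modulo the (toy) group order p."""
    return acc * pow(x + s, -1, p) % p

acc = acc_exponent([17, 42, 77])
acc = insert(acc, 95)
assert acc == acc_exponent([17, 42, 77, 95])
acc = delete(acc, 42)
assert acc == acc_exponent([17, 77, 95])
```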
Lemma 6.7 The authenticated data structure scheme MFD = {genkey, setup, update, refresh, query, verify} is secure according to Definition 2.5.

Proof: Fix the security parameter k and output pk = {G, 𝔾, e(·, ..., ·), p} and sk = s ∈ Z_p^* by calling algorithm genkey(). Let Adv be a polynomially-bounded adversary. Adv picks an initial collection of n elements X, all belonging to a totally-ordered universe U. Let A be the respective collection of intervals, stored in a dictionary D_0. Adv outputs an authenticated data structure auth(D_0), by calling algorithm setup() through oracle access. Then Adv picks a polynomial number of updates (namely, he picks a polynomial number of elements from U for insertion or deletion). Let D_h be the final dictionary after the updates, let the updated final collection of intervals and elements be A and X respectively, and let d_h be the final digest as produced by the adversary through oracle access to algorithm update(). Let q be a dictionary query picked by the adversary, consisting of two elements a, b ∈ U, with a ≤ b. Suppose the adversary outputs an incorrect answer α which is however the sorted list of w successive elements [z_1 z_2 ... z_{w−1} z_w] such that z_1 ≤ a ≤ z_2 ≤ ... ≤ z_{w−1} ≤ b < z_w, and a respective proof Π. We will compute the probability that check(q, α, D_h) rejects, while verify(q, α, Π, d_h, sk, pk) accepts, as required by Definition 2.5.

If w = Ω(log n), by the security of the scheme HBD the event in question happens with probability neg(k). If w = o(log n), the proof Π consists of w − 1 witnesses W_1, W_2, ..., W_{w−1}, each one referring to the intervals b_1, b_2, ..., b_{w−1}, where b_i = z_i||z_{i+1}, respectively. Since the answer α is not correct, it should be the case that there exists b_i ∉ A (note that b_i ∉ A is equivalent to either adding extra elements to the reported range or omitting certain elements from the reported range) such that

$$W_i^{(b_i + s)} = e(g, \ldots, g)^{(a_0 + s)(a_1 + s)(a_2 + s)\cdots(a_n + s)}\,,$$

where A = {a_0, a_1, ..., a_n}. Since b_i ∉ {a_0, a_1, a_2, ..., a_n}, we can write

$$(a_0 + s)(a_1 + s)\cdots(a_n + s) = P(s)(b_i + s) + \lambda\,,$$

where the coefficients of polynomial P and the quantity λ ≠ 0 (nonzero precisely because b_i ∉ A) are computable in time polynomial in n, by polynomial division. Therefore the adversary Adv can compute

$$e(g, \ldots, g)^{\frac{1}{b_i + s}} = \left[W_i \cdot e(g, \ldots, g)^{-P(s)}\right]^{\lambda^{-1}}\,,$$

since e(g, ..., g)^{s^i} ∈ 𝔾 can efficiently be computed from g^{s^i} ∈ G by using the admissible multilinear form e : G^k → 𝔾, for all i = 0, ..., q. However, by Assumption 6.1, this happens with probability neg(k). □

We note here that the second part of the proof of security above (the case w = o(log n)) follows the same logic as the security proof of the bilinear-map accumulator in Lemma 5.1. However, in a multilinear setting, where e(·, ..., ·) is not used for verification, if we are to use only Assumption 6.1 we cannot prove security for subsets of elements, but only for one element. Proving security for subsets of elements in the multilinear setting would require a stronger assumption. Moreover, due to this limitation, we will only be able to prove optimality of the presented authenticated data structure scheme MFD for specific values of w, i.e., for w = O(1) or w = Ω(log n).
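The polynomial division invoked above is elementary: dividing ∏_j (x + a_j) by (x + b_i) leaves a nonzero remainder λ exactly when b_i is not a root, i.e., when b_i ∉ A. The following self-contained sketch, with a toy prime p and illustrative values for A and b, computes P and λ by synthetic division and checks the decomposition at a sample point.

```python
# Synthetic division over Z_p: write prod_j (x + a_j) = P(x)(x + b) + lam.
# Toy prime p and illustrative values; lam != 0 precisely because b is
# not in A (the remainder equals the product evaluated at x = -b).
p = 1009
A = [17, 42, 77, 95]          # accumulated intervals a_0, ..., a_n
b = 33                        # forged interval b_i, with b not in A

# Coefficients of f(x) = prod (x + a), highest degree first.
coeffs = [1]
for a in A:
    shifted = coeffs + [0]                        # coeffs * x
    scaled = [0] + [c * a % p for c in coeffs]    # coeffs * a
    coeffs = [(u + v) % p for u, v in zip(shifted, scaled)]

# Divide f(x) by (x + b), i.e., by (x - r) with r = -b mod p.
r, quotient, rem = (-b) % p, [], 0
for c in coeffs:
    rem = (rem * r + c) % p
    quotient.append(rem)
lam = quotient.pop()          # remainder lambda = f(-b) mod p

def eval_poly(cs, x):
    v = 0
    for c in cs:
        v = (v * x + c) % p
    return v

x0 = 5                        # spot-check the decomposition at one point
assert lam != 0
assert eval_poly(coeffs, x0) == (eval_poly(quotient, x0) * (x0 + b) + lam) % p
```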
Lemma 6.8 For range search queries outputting an answer of size w such that w = O(1) or w = Ω(log n), the authenticated data structure scheme MFD = {genkey, setup, update, refresh, query, verify} is optimal according to Definition 2.8.

Proof: According to Definition 2.8, the authenticated data structure scheme MFD for a dictionary D of n elements is optimal as long as w = O(1) or w = Ω(log n). This is because all of the following are true:

1. The authenticated data structure scheme MFD is correct and secure, by Lemmata 6.6 and 6.7 respectively;

2. For the group complexity of the authenticated data structure we have |auth(D)| = |D| = O(n), by Lemma 6.1;

3. By Lemmata 6.2 and 6.3, for the update access complexity we have |update()| + |upd| + |refresh()| = O(log n), which is proportional to the O(log n) update complexity of the underlying plain dictionary;

4. For the query access complexity, we distinguish two cases:

• When w = O(1) or w = Ω(log n), by Lemma 6.4, we have |query()| = O(log n + w), which is asymptotically equal to the O(log n + w) complexity of a range search query in the plain dictionary. So in this case optimality is achieved;

• When w = ω(1) and w = o(log n), by Lemma 6.4, we have |query()| = O(w log n), whereas the plain dictionary answers the query with O(log n + w) complexity. Since O(w log n) is not O(log n + w) in this regime, the query complexity constraint is not satisfied in this case.

5. For the group complexity of the proof we have |Π(q)| = O(|q| + |α(q)|) = O(w), by Lemma 6.4;

6. For the access complexity of the verification algorithm we have |verify()| = O(|q| + |α(q)|) = O(w).

Therefore the authenticated data structure scheme MFD is optimal for range search queries returning an answer of size w such that w = O(1) or w = Ω(log n). □

6.3.2 Main results

Theorem 6.1 Let k be the security parameter and assume the existence of an admissible Θ(k)-multilinear form generator, as defined in Definition 6.2. Then there exists an authenticated data structure scheme MFD = {genkey, setup, update, refresh, query, verify} for a data structure scheme defined for a dynamic dictionary D storing n elements such that:

1. It is correct according to Definition 2.4 and secure according to Definition 2.5, (i) under the multilinear q-strong Diffie-Hellman assumption and (ii) assuming the existence of generic collision-resistant hash functions;

2. It is optimal only for range search queries outputting an answer of size w such that w = O(1) or w = Ω(log n), according to Definition 2.8;

3. It is not publicly verifiable according to Definition 2.9;

4. The access complexity of setup() is O(n), outputting an authenticated data structure auth(D) of O(n) group complexity;

5. The access complexity of update() is O(log n), outputting update information upd of O(log n) group complexity;

6. The access complexity of refresh() is O(log n);

7. For a range search query q outputting an answer of size w we have:

(a) The access complexity of query() is O(log n + w) when w = Ω(log n) and O(w log n) when w = o(log n);

(b) The access complexity of verify() is O(w);

(c) The group complexity of the proof Π(q) is O(w).

Proof: This result follows directly from Lemmata 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7 and 6.8. Note that the scheme is not publicly verifiable since algorithm verify() requires the secret key as an input. Finally, we have to assume the existence of generic collision-resistant hash functions since we are using the authenticated data structure scheme HBD. □

We now present the final result of this chapter, which relates the optimality of an authenticated data structure scheme for a dictionary with the existence of admissible multilinear form generators.

Theorem 6.2 Let k be the security parameter and let D be a data structure scheme for a dictionary of n elements that supports range search queries q outputting an answer of size w such that w = O(1) or w = Ω(log n). If no optimal authenticated data structure scheme for D exists, then no admissible Θ(k)-multilinear form generator exists either.

Proof: Assume this is not the case, i.e., that an admissible Θ(k)-multilinear form generator does exist in the absence of an optimal authenticated data structure scheme for D. This is a contradiction, since we can use the construction of Theorem 6.1, which uses an admissible Θ(k)-multilinear form generator, to derive an optimal authenticated data structure scheme for D. □

Finally, we need to make the following important observation. Theorem 6.2 does not exclude the existence of some instance of a multilinear form, even in the absence of optimal authenticated dictionaries (say, for example, an instance of a multilinear form for three inputs). The result holds for all admissible Θ(k)-multilinear forms, where k is the security parameter.

6.3.3 Application in the two-party protocol

Due to the fact that the authenticated data structure scheme MFD is not publicly verifiable, it can only be used in a two-party protocol, since a three-party protocol always requires a publicly-verifiable authenticated data structure scheme (see Protocol 2.1).
However, in order to be able to use the authenticated data structure scheme MFD of Theorem 6.1 in a black-box way with Theorem 2.2, and so derive a two-party authenticated data structures protocol, we have to ensure that Assumption 2.1 holds for the authenticated data structure scheme MFD:

Lemma 6.9 Assumption 2.1 is true for the authenticated data structure scheme MFD. Moreover, for every update u, |Q_u| has O(1) complexity.

Proof: Let an update u refer to element e, i.e., either insert element e into the dictionary or delete element e from the dictionary. The respective set of queries Q_u required for Assumption 2.1 simply contains one query q for the range [e, e′] such that there are w = Ω(log n) elements between e and e′. Let {Π(q), α(q)} ← query(q, D_h, auth(D_h), pk). Since w = Ω(log n), Π(q) and α(q) are output by algorithm HBD.query(). We now describe the function z(·) from Assumption 2.1. Function z(·) extracts δ_u(D_h) and δ_u(auth(D_h)) from Π(q): Due to the hashing scheme employed in Remark 6.1, Π(q) contains all the structure of the red-black tree δ_u(D_h) that is accessed during update u, along with the labels label(·) (i.e., the ones that need to be updated by update()) that belong to the accessed authenticated data structure δ_u(auth(D_h)). Extracting that information has O(log n) complexity, equal to the verification complexity, as required by Assumption 2.1. This completes the proof. □

By Theorems 2.2 and 6.1 and Lemma 6.9, we can now state the final result for the two-party model:

Corollary 6.2 Let k be the security parameter and assume (i) the existence of an admissible Θ(k)-multilinear form generator; (ii) that the multilinear q-strong Diffie-Hellman assumption holds; and (iii) the existence of generic collision-resistant hash functions. Then there exists a two-party authenticated data structures protocol (see Protocol 2.2) for verifying range search queries q on a dynamic dictionary storing n elements, where w is the size of the answer to a range search query q, such that:

1. The protocol is interactive;

2. The setup at the client has O(n) access complexity;

3. The update at the client has O(log n) access complexity;

4. The verification at the client has O(w) access complexity;

5. The space needed at the client has O(1) group complexity;

6. The communication between the client and the server has O(log n) group complexity during updates and O(w) group complexity during queries;

7. The update at the server has O(log n) access complexity;

8. The query at the server has O(log n + w) access complexity when w = Ω(log n) and O(w log n) access complexity when w = o(log n);

9. The space needed at the server has O(n) group complexity;

10. For a query q sent by the client to the server at any time (even after updates), let α be an answer and let π be a proof returned by the server. With probability Ω(1 − neg(k)), the client accepts the answer α if and only if α is correct.
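To make the client/server roles of Protocol 2.2 concrete, the following toy puts the earlier exponent-level sketches together. It illustrates only the interaction pattern and is not the scheme itself: a real server would compute each witness from the public labels of Relation 6.5 in O(log n) time without knowing s (here the toy server is given s only because the group is not modeled), and the range-query semantics are simplified.

```python
# Toy two-party interaction: the client keeps O(1) state, the server
# stores the set and returns answers with witnesses.  All parameters
# are illustrative; no multilinear form (hence no group) is modeled.
p, s = 1009, 123

class Server:
    """Untrusted party: stores the outsourced set and serves queries."""
    def __init__(self, elements):
        self.A = sorted(elements)

    def range_query(self, lo, hi):
        answer = [a for a in self.A if lo <= a <= hi]
        # Witness exponents; the real server aggregates O(log n) public
        # tree labels instead (see the sketch after Lemma 6.5).
        witnesses = []
        for ai in answer:
            w = 1
            for a in self.A:
                if a != ai:
                    w = w * (a + s) % p
            witnesses.append(w)
        return answer, witnesses

class Client:
    """Keeps only the accumulation value of the outsourced set."""
    def __init__(self, elements):
        self.acc = 1
        for a in elements:
            self.acc = self.acc * (a + s) % p

    def verify(self, answer, witnesses):
        # Exponent-level analogue of Relations 6.7; needs the secret s,
        # mirroring why MFD is not publicly verifiable.
        return all(w * (a + s) % p == self.acc
                   for a, w in zip(answer, witnesses))

data = [17, 42, 77, 95]
server, client = Server(data), Client(data)
answer, wits = server.range_query(20, 80)
assert client.verify(answer, wits)          # honest answer accepted
assert not client.verify([18], wits[:1])    # forged element rejected
```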
6.4 Summary

In this chapter, we have presented the first optimal authenticated dictionary (Theorem 6.1) supporting range search queries that output answers of size w such that w = O(1) or w = Ω(log n), having verification and proof complexity equal to O(w) (as opposed to O(log n + w); see Table 6.1). Its design is based on multilinear forms, a recently-proposed cryptographic primitive [19] whose construction remains an open problem to date.

However, since multilinear forms are not known to exist yet, this work can be viewed from a different angle (Theorem 6.2): if one could prove that such optimal authenticated dictionaries cannot exist in the computational model, irrespective of cryptographic primitives, then our result would imply that certain admissible multilinear form generators cannot exist either (i.e., it can be viewed as a reduction). Thus, we provide an alternative avenue towards proving the nonexistence of multilinear form generators, in the context of general lower bounds for authenticated data structures [106] and for memory checking [35].

Chapter 7

Conclusions

This thesis studies the problem of efficiently verifying data and computations that are stored and performed, respectively, by untrusted parties. This research direction, lying within the framework of cloud cryptography, has become very relevant nowadays, given the great amount of information and computation that is outsourced to remote untrusted repositories, due to the increasing adoption of cloud computing in our everyday digital interactions. We therefore explore in depth the field of authenticated data structures, constructing a firm theoretical foundation in Chapter 2, and then continue, in the subsequent chapters, with the development of five different authenticated data structure schemes, each one constructed for a different problem. All our solutions are fully dynamic.

A common feature shared by all the authenticated data structures designed in this thesis is the use of advanced cryptography. A common goal of all the solutions has been to exploit the offered cryptographic tools in order to derive highly-desirable efficiency features, such as constant communication complexity (Chapter 3), parallel algorithms (Chapter 4), operation sensitivity (Chapter 5) and optimality (Chapter 6), which could not be achieved otherwise, e.g., with the use of traditional hash-based techniques. We prove the security of our constructions only under computational assumptions that are well-accepted by the cryptography community (e.g., the strong Diffie-Hellman and RSA problems and polynomial approximation of lattice problems).

                     HBD          RHT          BHT             LBT         MFD
                     [81]         Chapter 3    Chapter 3       Chapter 4   Chapter 6
setup()              n            n            n               n log n     n
update()             log n        1            1               1           log n
refresh()            log n        1            1               log n       log n
query()              log n        n            n               log n       log n
verify()             log n        1            1               log n       1
proof Π(q)           log n        1            1               log n       1
info. upd            1            1            1               1           log n
publicly verifiable  yes          yes          yes             yes         no
optimal              no           no           no              no          no
assumption           Generic CR   Strong RSA   Bilinear q-DH   GAPSVP      Multilinear q-DH
                                                                           and Generic CR

Table 7.1: Asymptotic access and group complexities of the authenticated data structure schemes presented in this thesis, applied to the fundamental problem of verifying read/write operations on an array of n entries, and compared with the first result on dynamic authenticated data structures by Naor and Nissim [81]. We note that, since all complexities for the plain table data structure are constant, no authenticated data structure scheme presented here is optimal. Moreover, based on the recent lower bound for memory checking by Dwork et al. [35], it seems unlikely that such a scheme could be derived.

The findings of this thesis indicate that understanding and employing advanced cryptographic tools can lead to significant complexity gains in authenticated data structures and, more generally, in verifiable computations.
Perhaps the most persuasive justification for the validity of this statement is Chapter 5 itself, where non-trivial computations over outsourced data (set operations) are verified with optimal costs, due to the use of bilinear maps. Moreover, this result provides evidence that more complicated functionalities (other than traditional set-membership computations) could possibly be verified efficiently in a public-key setting using authenticated data structures techniques, initiating in this fashion the quest for schemes that apply to other interesting problems (e.g., geometric computations).

7.1 Overview of thesis results and discussion

In this thesis, we observed that using different cryptography allows for various complexity trade-offs in authenticated data structures. In Table 7.1, we apply all our schemes (except for the scheme of Chapter 5) to the fundamental problem of verifying read/write operations on an array of n entries. This is a data structure where all our authenticated data structure schemes can be easily employed. Table 7.1 also includes a column referring to the seminal result by Naor and Nissim [81], where an authenticated dictionary based on a 2-3 tree implementation, and with logarithmic complexities, was presented.

From the results in Table 7.1, we draw the following conclusion: As of now, there is no optimal authenticated data structure (as defined in Definition 2.8) for the simplest functionality of reading and writing entries of a table (similar to the memory checking model). We note, however, that this does not come as a surprise: It would seem that deriving an optimal authenticated data structure scheme for a table (a RAM array) would violate the existing Ω(log n/ log log n) lower bounds that have appeared in the memory checking model [35]. This observation naturally raises an open problem: Can we design an authenticated table with Θ(log n/ log log n) complexities? Such a construction would potentially yield an optimal online memory checker and could be derived from the realm of more advanced cryptography.

7.2 Future work

It is our belief that providing security in the cloud is going to play a major role in adopting cloud computing as a new computing discipline. Concerning cloud integrity, future work includes a further investigation of the field of authenticated data structures. More specifically, one can focus on the verification of outsourced computations in an operation-sensitive way. Operation sensitivity, a crucial efficiency property, has so far been achieved in a practical and publicly-verifiable fashion only for specific computations, such as range search [52] and set operations (see Chapter 5). On the other hand, it has been shown that, in a privately-verifiable way and under certain assumptions, it is feasible for general computations, i.e., any boolean circuit, e.g., by using the model of outsourced verifiable computation [41]. However, these constructions are currently not very efficient. Aiming at publicly-verifiable solutions that could be used by cloud applications without changing the user experience, the question that arises is evident: Which outsourced computations (e.g., shortest paths) can be practically and publicly verified in an operation-sensitive way?

Another aspect of cloud security that can be investigated is cloud privacy, i.e., protecting the confidentiality of data that is stored remotely.
Resorting to a solution that merely encrypts our data before uploading it online defeats one of the main purposes of investing in cloud infrastructures: no advanced, meaningful outsourced computations can be performed on encrypted data. Achieving both goals, namely storing encrypted data and at the same time being able to do significant processing with it, was recently made possible with the proposal of a fully-homomorphic encryption scheme [43]. Implementing such a primitive, however, has not led to efficient solutions yet; it has, on the other hand, ignited a lot of enthusiasm for cloud privacy research. Our belief is that we have to settle for simpler and more efficient constructions that refer to specific functionalities, e.g., see the work on searchable symmetric encryption by Curtmola et al. [31]. Therefore, future directions could explore the computation of such specific functionalities (e.g., geometric queries, polynomial evaluation) on private data in an efficient way that will allow easy implementation and fast deployment.

Finally, another very interesting privacy topic that has emerged lately and lies at the intersection of algorithms and cryptography is the notion of data-oblivious algorithms [49, 109]. Data-oblivious algorithms do not perform any data-dependent operations, and therefore an adversary observing the flow of the circuit computation cannot distinguish between two different inputs. Applying oblivious algorithms in secure two-party computations can lead to considerable efficiency gains and practical protocols. This is because garbled circuits [113] can then be used only for primitive black boxes performing data-dependent operations (e.g., min, max), and not for the whole circuit. Recent results include highly efficient protocols for secure two-party sorting, selection, and permuting [49], as well as for various geometric problems [36]. Since the need for efficient secure two-party computation is now greater than ever, our belief is that there is a lot of research potential in transforming algorithms into oblivious algorithms so that they can be securely used by cloud applications.

More theoretically-oriented future research, as mentioned in Section 7.1, can involve improving the asymptotic bounds of memory checking [35] by using advanced cryptographic primitives, exploring the existence and limitations of optimal authenticated data structures, and studying the dynamization overhead of cloud cryptography. (So far, most cloud cryptography constructions work for static data only, and updates can be handled in a secure way only through total recomputation, which is highly inefficient.)

From a practical perspective, designing more efficient authenticated data structures to be used in practice is definitely a big challenge: Most practical applications nowadays extensively use fast authenticated data structures such as Merkle trees, the security of which is, however, based on totally empirical assumptions (e.g., the collision resistance of SHA-2). This provides great efficiency at the cost of risking the security of the application; e.g., SHA-2 replaced MD-5 due to an attack [103], after MD-5 had been used extensively over the years in many systems, such as authenticated file systems and authenticated storage systems. It would be great to come up with authenticated data structures whose security is based on a widely-accepted computational assumption (e.g., discrete log) and that at the same time can favorably compete in practice with the widely-used Merkle trees.
Bibliography

[1] PBC: The pairing-based cryptography library. http://crypto.stanford.edu/pbc/.
[2] T-mobile sidekick disaster: Danger's servers crashed, and they don't have a backup. http://techcrunch.com/2009/10/10/.
[3] Miklós Ajtai. Generating hard instances of lattice problems (extended abstract). In Proc. Symposium on Theory of Computing (STOC), pages 99–108, 1996.
[4] Aris Anagnostopoulos, Michael T. Goodrich, and Roberto Tamassia. Persistent authenticated dictionaries and their applications. In Proc. Information Security Conference (ISC), pages 379–393, 2001.
[5] Benny Applebaum, Yuval Ishai, and Eyal Kushilevitz. From secrecy to soundness: Efficient verification via secure computation. In Proc. International Colloquium on Automata, Languages and Programming (ICALP), pages 152–163, 2010.
[6] Mikhail J. Atallah, YounSun Cho, and Ashish Kundu. Efficient data authentication in an environment of untrusted third-party distributors. In Proc. International Conference on Data Engineering (ICDE), pages 696–704, 2008.
[7] Giuseppe Ateniese, Randal Burns, Reza Curtmola, Joseph Herring, Lea Kissner, Zachary Peterson, and Dawn Song. Provable data possession at untrusted stores. In Proc. International Conference on Computer and Communications Security (CCS), pages 598–609, 2007.
[8] Man Ho Au, Patrick P. Tsang, Willy Susilo, and Yi Mu. Dynamic universal accumulators for DDH groups and their application to attribute-based anonymous credential systems. In Proc. Cryptographers' Track at the RSA Conference (CT-RSA), pages 295–308, 2009.
[9] Ricardo A. Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval: The Concepts and Technology behind Search. Addison-Wesley, 2nd edition, 2010.
[10] Niko Baric and Birgit Pfitzmann. Collision-free accumulators and fail-stop signature schemes without trees. In Proc. Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 480–494, 1997.
[11] Mihir Bellare and Daniele Micciancio. A new paradigm for collision-free hashing: Incrementality at reduced cost. In Proc. Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 163–192, 1997.
[12] Mihir Bellare and Phillip Rogaway. Random oracles are practical: A paradigm for designing efficient protocols. In Proc. International Conference on Computer and Communications Security (CCS), pages 62–73, 1993.
[13] Josh Benaloh and Michael de Mare. One-way accumulators: A decentralized alternative to digital signatures. In Proc. Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 274–285, 1993.
[14] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13:422–426, 1970.
[15] Manuel Blum, William S. Evans, Peter Gemmell, Sampath Kannan, and Moni Naor. Checking the correctness of memories. Algorithmica, 12(2/3):225–244, 1994.
[16] Dan Boneh and Xavier Boyen. Short signatures without random oracles and the SDH assumption in bilinear groups. Journal of Cryptology, 21(2):149–177, 2008.
[17] Dan Boneh and Matthew K. Franklin. Identity-based encryption from the Weil pairing. In Proc. International Cryptology Conference (CRYPTO), pages 213–229, 2001.
[18] Dan Boneh, Ilya Mironov, and Victor Shoup. A secure signature scheme from bilinear maps. In Proc. Cryptographers' Track at the RSA Conference (CT-RSA), pages 98–110, 2003.
[19] Dan Boneh and Alice Silverberg. Applications of multilinear forms to cryptography. Contemporary Mathematics, 324(1):71–90, 2003.
[20] Dan Boneh and Brent Waters. Conjunctive, subset, and range queries on encrypted data. In Proc. Theoretical Cryptography Conference (TCC), pages 535–554, 2007.
[21] Andrei Z. Broder and Michael Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485–509, 2005.
[22] Jan Camenisch, Markulf Kohlweiss, and Claudio Soriente. An accumulator based on bilinear maps and efficient revocation for anonymous credentials. In Proc. Public Key Cryptography (PKC), pages 481–500, 2009.
[23] Jan Camenisch and Anna Lysyanskaya. Dynamic accumulators and application to efficient revocation of anonymous credentials. In Proc. International Cryptology Conference (CRYPTO), pages 61–76, 2002.
[24] Jan Camenisch and Anna Lysyanskaya. A signature scheme with efficient protocols. In Proc. Security and Cryptography for Networks (SCN), pages 268–289, 2002.
[25] Sébastien Canard and Aline Gouget. Multiple denominations in e-cash with compact transaction data. In Proc. Financial Cryptography (FC), pages 82–97, 2010.
[26] Larry Carter and Mark N. Wegman. Universal classes of hash functions. In Proc. Symposium on Theory of Computing (STOC), pages 106–112, 1977.
[27] Jung Hee Cheon and Dong Hoon Lee. A note on self-bilinear maps. Bulletin of the Korean Mathematical Society, 46(2):303–309, 2009.
[28] Kai-Min Chung, Yael Kalai, and Salil Vadhan. Improved delegation of computation using fully homomorphic encryption. In Proc. International Cryptology Conference (CRYPTO), pages 483–501, 2010.
[29] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 3rd edition, 2009.
[30] Scott A. Crosby. Efficient Tamper-Evident Data Structures for Untrusted Servers. PhD thesis, Rice University, May 2010.
[31] Reza Curtmola, Juan A. Garay, Seny Kamara, and Rafail Ostrovsky. Searchable symmetric encryption: improved definitions and efficient constructions. In Proc. International Conference on Computer and Communications Security (CCS), pages 79–88, 2006.
[32] Ivan Damgård and Nikos Triandopoulos. Supporting non-membership proofs with bilinear-map accumulators. Cryptology ePrint Archive, Report 2008/538, 2008. http://eprint.iacr.org/.
[33] Premkumar Devanbu, Michael Gertz, Chip Martel, and Stuart G. Stubblebine. Authentic third-party data publication. In Proc. Conference on Database Security (DBSEC), pages 101–112, 2000.
[34] Martin Dietzfelbinger, Anna Karlin, Kurt Mehlhorn, Friedhelm Meyer auf der Heide, Hans Rohnert, and Robert E. Tarjan. Dynamic perfect hashing: Upper and lower bounds. SIAM Journal on Computing, 23(4):738–761, 1994.
[35] Cynthia Dwork, Moni Naor, Guy Rothblum, and Vinod Vaikuntanathan. How efficient can memory checking be? In Proc. Theoretical Cryptography Conference (TCC), pages 503–520, 2009.
[36] David Eppstein, Michael T. Goodrich, and Roberto Tamassia. Privacy-preserving data-oblivious geometric algorithms for geographic data. In Proc. International Symposium on Advances in Geographic Information Systems (GIS), pages 13–22, 2010.
[37] C. Chris Erway, Alptekin Küpçü, Charalampos Papamanthou, and Roberto Tamassia. Dynamic provable data possession. In Proc. International Conference on Computer and Communications Security (CCS), pages 213–222, 2009.
[38] Li Fan, Pei Cao, Jussara Almeida, and Andrei Z. Broder. Summary cache: A scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Networking, 8(3):281–293, 2000.
[39] Michael J. Freedman, Kobbi Nissim, and Benny Pinkas. Efficient private matching and set intersection. In Proc. Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 1–19, 2004.
[40] Joachim von zur Gathen and Jürgen Gerhard. Modern Computer Algebra. Cambridge University Press, 2nd edition, 2003.
[41] Rosario Gennaro, Craig Gentry, and Bryan Parno. Non-interactive verifiable computing: Outsourcing computation to untrusted workers. In Proc. International Cryptology Conference (CRYPTO), pages 465–482, 2010.
[42] Rosario Gennaro, Shai Halevi, and Tal Rabin. Secure hash-and-sign signatures without the random oracle. In Proc. Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 123–139, 1999.
[43] Craig Gentry. Fully homomorphic encryption using ideal lattices. In Proc. Symposium on Theory of Computing (STOC), pages 169–178, 2009.
[44] Oded Goldreich, Shafi Goldwasser, and Shai Halevi. Collision-free hashing from lattice problems. In Electronic Colloquium on Computational Complexity (ECCC), 3(56), 1996.
[45] Oded Goldreich, Silvio Micali, and Avi Wigderson. Proofs that yield nothing but their validity for all languages in NP have zero-knowledge proof systems. Journal of the ACM, 38(3):691–729, 1991.
[46] Michael T. Goodrich, Charalampos Papamanthou, Roberto Tamassia, and Nikos Triandopoulos. Athos: Efficient authentication of outsourced file systems. In Proc. Information Security Conference (ISC), pages 80–96, 2008.
[47] Michael T. Goodrich and Roberto Tamassia. Algorithm Design: Foundations, Analysis, and Internet Examples. John Wiley & Sons, 2002.
[48] Michael T. Goodrich, Roberto Tamassia, and Andrew Schwerin. Implementation of an authenticated dictionary with skip lists and commutative hashing. In Proc. DARPA Information Survivability Conference and Exposition II (DISCEX II), pages 68–82, 2001.
[49] Michael T. Goodrich. Randomized Shellsort: A simple oblivious sorting algorithm. In Proc. Symposium on Discrete Algorithms (SODA), pages 1–16, 2010.
[50] Michael T. Goodrich, Charalampos Papamanthou, and Roberto Tamassia. On the cost of persistence and authentication in skip lists. In Proc. Workshop on Experimental Algorithms (WEA), pages 94–107, 2007.
[51] Michael T. Goodrich, Roberto Tamassia, and Jasminka Hasic. An efficient dynamic and distributed cryptographic accumulator. In Proc. Information Security Conference (ISC), pages 372–388, 2002.
[52] Michael T. Goodrich, Roberto Tamassia, and Nikos Triandopoulos. Super-efficient verification of dynamic outsourced databases. In Proc. Cryptographers' Track at the RSA Conference (CT-RSA), pages 407–424, 2008.
[53] Michael T. Goodrich, Roberto Tamassia, and Nikos Triandopoulos. Efficient authenticated data structures for graph connectivity and geometric search problems. Algorithmica, 60(3):505–552, 2011.
[54] Eric Hall and Charanjit S. Jutla. Parallelizable authentication trees. In Proc. Selected Areas in Cryptography (SAC), pages 95–109, 2005.
[55] Brian Hayes. Cloud computing. Communications of the ACM, 51(7):9–11, 2008.
[56] Alexander Heitzmann, Bernardo Palazzi, Charalampos Papamanthou, and Roberto Tamassia. Efficient integrity checking of untrusted network storage. In Proc. International Workshop on Storage Security and Survivability (STORAGESS), pages 43–54, 2008.
[57] Jeffrey Hoffstein, Nick Howgrave-Graham, Jill Pipher, Joseph H. Silverman, and William Whyte. NTRUSIGN: Digital signatures using the NTRU lattice. In Proc. Cryptographers' Track at the RSA Conference (CT-RSA), pages 122–140, 2003.
[58] Andreas Hutflesz, Hans-Werner Six, and Peter Widmayer. Globally order preserving multidimensional linear hashing. In Proc. International Conference on Data Engineering (ICDE), pages 572–579, 1988.
[59] Joseph F. JaJa. An Introduction to Parallel Algorithms. Addison-Wesley, 1992.
[60] Antoine Joux. A one-round protocol for tripartite Diffie-Hellman. Journal of Cryptology, 17(4):263–276, 2004.
[61] Jonathan Katz and Yehuda Lindell. Introduction to Modern Cryptography. Chapman & Hall/CRC, 2007.
[62] Claire Kenyon and Jeffrey S. Vitter. Maximum queue size and hashing with lazy deletion. Algorithmica, 6:597–619, 1991.
[63] Dieter Kratsch, Ross M. McConnell, Kurt Mehlhorn, and Jeremy P. Spinrad. Certifying algorithms for recognizing interval graphs and permutation graphs. In Proc. Symposium on Discrete Algorithms (SODA), pages 158–167, 2003.
[64] Hyung-Mok Lee, Kyung Ju Ha, and Kyo-Min Ku. ID-based multi-party authenticated key agreement protocols from multilinear forms. In Proc. Information Security Conference (ISC), pages 104–117, 2005.
[65] Arjen K. Lenstra, Hendrik W. Lenstra Jr, and László Lovász. Factoring polynomials with rational coefficients. Mathematische Annalen, (261):515–534, 1982.
[66] Feifei Li, Marios Hadjieleftheriou, George Kollios, and Leonid Reyzin. Dynamic authenticated index structures for outsourced databases. In Proc. International Conference on Management of Data (SIGMOD), pages 121–132, 2006.
[67] Feifei Li, Ke Yi, Marios Hadjieleftheriou, and George Kollios. Proof-infused streams: Enabling authentication of sliding window queries on streams. In Proc. Very Large Data Bases (VLDB), pages 147–158, 2007.
[68] Jiangtao Li, Ninghui Li, and Rui Xue. Universal accumulators with efficient nonmembership proofs. In Proc. Applied Cryptography and Network Security (ACNS), pages 253–269, 2007.
[69] Nathan Linial and Ori Sasson. Non-expansive hashing. In Proc. Symposium on Theory of Computing (STOC), pages 509–517, 1996.
[70] Ben Lynn. On the Implementation of Pairing-Based Cryptosystems. PhD thesis, Stanford University, November 2008.
[71] Vadim Lyubashevsky and Daniele Micciancio. Generalized compact knapsacks are collision resistant. In Proc. International Colloquium on Automata, Languages and Programming (ICALP), pages 144–155, 2006.
[72] Kyriakos Mouratidis, Man Lung Yiu, and Yimin Lin. Efficient verification of shortest path search via authenticated hints. In Proc. International Conference on Data Engineering (ICDE), pages 237–248, 2010.
[73] Petros Maniatis. Historic Integrity in Distributed Systems. PhD thesis, Stanford University, August 2003.
[74] Petros Maniatis and Mary Baker. Enabling the archival storage of signed documents. In Proc. USENIX Conference on File and Storage Technologies (FAST), pages 31–45, 2002.
[75] Charles U. Martel, Glen Nuckolls, Premkumar T. Devanbu, Michael Gertz, April Kwong, and Stuart G. Stubblebine. A general model for authenticated data structures. Algorithmica, 39(1):21–41, 2004.
[76] Alfred Menezes, Scott Vanstone, and Tatsuaki Okamoto. Reducing elliptic curve logarithms to logarithms in a finite field. In Proc. Symposium on Theory of Computing (STOC), pages 80–89, 1991.
[77] Ralph C. Merkle. A certified digital signature. In Proc. International Cryptology Conference (CRYPTO), pages 218–238, 1989.
[78] Daniele Micciancio and Oded Regev. Worst-case to average-case reductions based on Gaussian measures. SIAM Journal on Computing, 37(1):267–302, 2007.
[79] Ruggero Morselli, Samrat Bhattacharjee, Jonathan Katz, and Peter J. Keleher. Trust-preserving set operations. In Proc. Conference on Computer Communications (INFOCOM), 2004.
[80] James K. Mullin. Spiral storage: Efficient dynamic hashing with constant performance. Computer Journal, 28:330–334, 1985.
[81] Moni Naor and Kobbi Nissim. Certificate revocation and certificate update. In Proc. USENIX Security Symposium (USENIX), pages 217–228, 1998.
[82] Moni Naor and Guy Rothblum. The complexity of online memory checking. Journal of the ACM, 56(1), 2009.
[83] Lan Nguyen. Accumulators from bilinear pairings and applications. In Proc. Cryptographers' Track at the RSA Conference (CT-RSA), pages 275–292, 2005.
[84] Glen Nuckolls. Verified query results from hybrid authentication trees. In Proc. Conference on Database Security (DBSEC), pages 84–98, 2005.
[85] National Institute of Standards and Technology. Secure hash standard (SHS). October 2008.
[86] Rafail Ostrovsky. Efficient computation on oblivious RAMs. In Proc. Symposium on Theory of Computing (STOC), pages 514–523, 1990.
[87] Mark H. Overmars. The Design of Dynamic Data Structures. Springer-Verlag, LNCS 156, 1983.
[88] HweeHwa Pang and Kyriakos Mouratidis. Authenticating the query results of text search engines. VLDB Endowment, 1(1):126–137, 2008.
[89] HweeHwa Pang and Kian-Lee Tan. Authenticating query results in edge computing. In Proc. International Conference on Data Engineering (ICDE), pages 560–571, 2004.
[90] Charalampos Papamanthou, Roberto Tamassia, and Nikos Triandopoulos. Authenticated hash tables. In Proc. International Conference on Computer and Communications Security (CCS), pages 437–448, 2008.
[91] Charalampos Papamanthou, Roberto Tamassia, and Nikos Triandopoulos. Optimal authenticated data structures with multilinear forms. In Proc. International Conference on Pairing-Based Cryptography (PAIRING), pages 246–264, 2010.
[92] Charalampos Papamanthou and Roberto Tamassia. Time and space efficient algorithms for two-party authenticated data structures. In Proc. International Conference on Information and Communications Security (ICICS), pages 1–15, 2007.
[93] Charalampos Papamanthou and Roberto Tamassia. Cryptography for efficiency: Authenticated data structures based on lattices and parallel online memory checking. Cryptology ePrint Archive, Report 2011/102, 2011. http://eprint.iacr.org/.
[94] Charalampos Papamanthou, Roberto Tamassia, and Nikos Triandopoulos. Optimal verification of operations on dynamic sets. In Proc. International Cryptology Conference (CRYPTO), 2011.
[95] Chris Peikert. Public-key cryptosystems from the worst-case shortest vector problem (extended abstract). In Proc. Symposium on Theory of Computing (STOC), pages 333–342, 2009.
[96] Franco P. Preparata and Dilip V. Sarwate. Computational complexity of Fourier transforms over finite fields. Mathematics of Computation, 31(139):740–751, 1977.
[97] Franco P. Preparata and Michael I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1985.
[98] Oded Regev. Lattice-based cryptography. In Proc. International Cryptology Conference (CRYPTO), pages 131–141, 2006.
[99] Oded Regev. On the complexity of lattice problems with polynomial approximation factors. The LLL Algorithm, pages 475–496, 2010.
[100] Tomas Sander. Efficient accumulators without trapdoor (extended abstract). In Proc. International Conference on Information and Communications Security (ICICS), pages 252–262, 1999.
[101] Tomas Sander, Amnon Ta-Shma, and Moti Yung. Blind, auditable membership proofs. In Proc. Financial Cryptography (FC), pages 53–71, 2001.
[102] Victor Shoup. A Computational Introduction to Number Theory and Algebra. Cambridge University Press, 2nd edition, 2008.
[103] Marc Stevens, Alexander Sotirov, Jacob Appelbaum, Arjen Lenstra, David Molnar, Dag Arne Osvik, and Benne de Weger. Short chosen-prefix collisions for MD5 and the creation of a rogue CA certificate. In Proc. International Cryptology Conference (CRYPTO), pages 55–69, 2009.
[104] Roberto Tamassia and Nikos Triandopoulos. Efficient content authentication in peer-to-peer networks. In Proc. Applied Cryptography and Network Security (ACNS), pages 354–372, 2007.
[105] Roberto Tamassia. Authenticated data structures. In Proc. European Symposium on Algorithms (ESA), pages 2–5, 2003.
[106] Roberto Tamassia and Nikos Triandopoulos. Computational bounds on hierarchical data processing with applications to information security. In Proc. International Colloquium on Automata, Languages and Programming (ICALP), pages 153–165, 2005.
[107] Roberto Tamassia and Nikos Triandopoulos. Certification and authentication of data structures. In Proc. Alberto Mendelzon Workshop on Foundations of Data Management, 2010.
[108] Nikos Triandopoulos. Efficient Data Authentication. PhD thesis, Brown University, September 2006.
[109] Guan Wang, Tongbo Luo, Michael T. Goodrich, Wenliang Du, and Zutao Zhu. Bureaucratic protocols for secure two-party sorting, selection, and permuting. In Proc. Symposium on Information, Computer and Communications Security (ASIACCS), pages 226–237, 2010.
[110] Peishun Wang, Huaxiong Wang, and Josef Pieprzyk. A new dynamic accumulator for batch updates. In Proc. International Conference on Information and Communications Security (ICICS), pages 98–112, 2007.
[111] Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu. Finding collisions in the full SHA-1. In Proc. International Cryptology Conference (CRYPTO), pages 17–36, 2005.
[112] Yin Yang, Dimitris Papadias, Stavros Papadopoulos, and Panos Kalnis. Authenticated join processing in outsourced databases. In Proc. International Conference on Management of Data (SIGMOD), pages 5–18, 2009.
[113] Andrew Chi-Chih Yao. Protocols for secure computations (extended abstract). In Proc. Foundations of Computer Science (FOCS), pages 160–164, 1982.