Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing
This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE INFOCOM 2010 proceedings. This paper was presented as part of the main Technical Program at IEEE INFOCOM 2010.
Wenjing Lou
Department of ECE, Worcester Polytechnic Institute. Email: wjlou@ece.wpi.edu
Abstract: Cloud Computing is the long-dreamed vision of computing as a utility, where users can remotely store their data into the cloud so as to enjoy the on-demand high-quality applications and services from a shared pool of configurable computing resources. By data outsourcing, users can be relieved from the burden of local data storage and maintenance. However, the fact that users no longer have physical possession of the possibly large outsourced data makes data integrity protection in Cloud Computing a very challenging and potentially formidable task, especially for users with constrained computing resources and capabilities. Thus, enabling public auditability for cloud data storage security is of critical importance, so that users can resort to an external audit party to check the integrity of outsourced data when needed. To securely introduce an effective third party auditor (TPA), the following two fundamental requirements have to be met: 1) the TPA should be able to efficiently audit the cloud data storage without demanding a local copy of the data, and introduce no additional on-line burden to the cloud user; 2) the third party auditing process should bring in no new vulnerabilities towards user data privacy. In this paper, we utilize and uniquely combine the public key based homomorphic authenticator with random masking to achieve a privacy-preserving public cloud data auditing system, which meets all the above requirements. To support efficient handling of multiple auditing tasks, we further explore the technique of bilinear aggregate signature to extend our main result into a multi-user setting, where the TPA can perform multiple auditing tasks simultaneously. Extensive security and performance analysis shows the proposed schemes are provably secure and highly efficient.
I. INTRODUCTION

Cloud Computing has been envisioned as the next-generation architecture of IT enterprise, due to its long list of unprecedented advantages in IT history: on-demand self-service, ubiquitous network access, location-independent resource pooling, rapid resource elasticity, usage-based pricing, and transference of risk [1]. As a disruptive technology with profound implications, Cloud Computing is transforming the very nature of how businesses use information technology. One fundamental aspect of this paradigm shift is that data is being centralized or outsourced into the Cloud. From the users' perspective, including both individuals and IT enterprises, storing data remotely into the cloud in a flexible on-demand manner brings appealing benefits: relief of the burden of storage management, universal data access independent of geographical location, and avoidance of capital expenditure on hardware, software, and personnel maintenance, etc. [2].
While Cloud Computing makes these advantages more appealing than ever, it also brings new and challenging security threats to users' outsourced data. Since cloud service providers (CSP) are separate administrative entities, data outsourcing actually relinquishes the user's ultimate control over the fate of their data. As a result, the correctness of the data in the cloud is put at risk for the following reasons. First of all, although the infrastructures under the cloud are much more powerful and reliable than personal computing devices, they still face a broad range of both internal and external threats to data integrity. Examples of outages and security breaches of noteworthy cloud services appear from time to time [3]-[5]. Secondly, there exist various motivations for cloud service providers to behave unfaithfully towards cloud users regarding the status of their outsourced data. Examples include cloud service providers, for monetary reasons, reclaiming storage by discarding data that has not been or is rarely accessed, or even hiding data loss incidents so as to maintain a reputation [6]-[8]. In short, although outsourcing data into the cloud is economically attractive for the cost and complexity of long-term large-scale data storage, it does not offer any guarantee on data integrity and availability. This problem, if not properly addressed, may impede the successful deployment of the cloud architecture. As users no longer physically possess the storage of their data, traditional cryptographic primitives for the purpose of data security protection cannot be directly adopted. Thus, how to efficiently verify the correctness of outsourced cloud data without a local copy of the data files becomes a big challenge for data storage security in Cloud Computing.
Note that simply downloading the data for integrity verification is not a practical solution, due to the high I/O cost and the cost of transmitting the file across the network. Besides, it is often insufficient to detect data corruption only when accessing the data, as it might be too late to recover the data loss or damage. Considering the large size of the outsourced data and the user's constrained resource capability, auditing the correctness of the data in a cloud environment can be formidable and expensive for the cloud users [8], [9]. Therefore, to fully ensure data security and save the cloud users' computation resources, it is of critical importance to enable public auditability for cloud data storage, so that the users may resort to a third party auditor (TPA), who has
expertise and capabilities that the users do not, to audit the outsourced data when needed. Based on the audit result, the TPA could release an audit report, which would not only help users evaluate the risk of their subscribed cloud data services, but also be beneficial for the cloud service provider to improve their cloud-based service platform [7]. In short, enabling public risk auditing protocols will play an important role for this nascent cloud economy to become fully established, where users will need ways to assess risk and gain trust in the cloud. Recently, the notion of public auditability has been proposed in the context of ensuring remotely stored data integrity under different system and security models [6], [8], [10], [11]. Public auditability allows an external party, in addition to the user himself, to verify the correctness of remotely stored data. However, most of these schemes [6], [8], [10] do not support privacy protection of users' data against external auditors, i.e., they may potentially reveal user data to the auditors, as will be discussed in Section III-C. This severe drawback greatly affects the security of these protocols in Cloud Computing. From the perspective of protecting data privacy, the users, who own the data and rely on the TPA just for the storage security of their data, do not want this auditing process to introduce new vulnerabilities of unauthorized information leakage towards their data security [12]. Moreover, there are legal regulations, such as the US Health Insurance Portability and Accountability Act (HIPAA) [13], further demanding that outsourced data not be leaked to external parties [7]. Exploiting data encryption before outsourcing [11] is one way to mitigate this privacy concern, but it is only complementary to the privacy-preserving public auditing scheme to be proposed in this paper.
Without a properly designed auditing protocol, encryption itself cannot prevent data from flowing to external parties during the auditing process. Thus, it does not completely solve the problem of protecting data privacy but just reduces it to the problem of managing the encryption keys. Unauthorized data leakage still remains a problem due to the potential exposure of encryption keys. Therefore, how to enable a privacy-preserving third-party auditing protocol, independent of data encryption, is the problem we are going to tackle in this paper. Our work is among the first few to support privacy-preserving public auditing in Cloud Computing, with a focus on data storage. Besides, with the prevalence of Cloud Computing, a foreseeable increase of auditing tasks from different users may be delegated to the TPA. As the individual auditing of these growing tasks can be tedious and cumbersome, a natural demand is how to enable the TPA to efficiently perform multiple auditing tasks in a batch manner, i.e., simultaneously. To address these problems, our work utilizes the technique of the public key based homomorphic authenticator [6], [8], [10], which enables the TPA to perform the auditing without demanding a local copy of the data and thus drastically reduces the communication and computation overhead as compared to straightforward data auditing approaches. By integrating the homomorphic authenticator with random masking, our protocol guarantees that the TPA can learn no knowledge about the data content
stored in the cloud server during the efficient auditing process. The aggregation and algebraic properties of the authenticator further benefit our design for batch auditing. Specifically, our contribution in this work can be summarized as the following three aspects: 1) We motivate the public auditing system of data storage security in Cloud Computing and provide a privacy-preserving auditing protocol, i.e., our scheme enables an external auditor to audit users' outsourced data in the cloud without learning knowledge of the data content. 2) To the best of our knowledge, our scheme is the first to support scalable and efficient public auditing in Cloud Computing. In particular, our scheme achieves batch auditing, where multiple delegated auditing tasks from different users can be performed simultaneously by the TPA. 3) We prove the security and justify the performance of our proposed schemes through concrete experiments and comparisons with the state of the art. The rest of the paper is organized as follows. Section II introduces the system and threat model, our design goals, notation, and preliminaries. Then we provide the detailed description of our scheme in Section III. Section IV gives the security analysis and performance evaluation, followed by Section V, which overviews the related work. Finally, Section VI gives the concluding remarks of the whole paper.

II. PROBLEM STATEMENT

A. The System and Threat Model

We consider a cloud data storage service involving three different entities, as illustrated in Fig.
1: the cloud user (U), who has a large amount of data files to be stored in the cloud; the cloud server (CS), which is managed by the cloud service provider (CSP) to provide data storage service and has significant storage space and computation resources (we will not differentiate CS and CSP hereafter); and the third party auditor (TPA), who has expertise and capabilities that cloud users do not have and is trusted to assess the cloud storage service security on behalf of the user upon request. Users rely on the CS for cloud data storage and maintenance. They may also dynamically interact with the CS to access and update their stored data for various application purposes. The users may resort to the TPA to ensure the storage security of their outsourced data, while hoping to keep their data private from the TPA. We consider the existence of a semi-trusted CS, as [14] does. Namely, most of the time it behaves properly and does not deviate from the prescribed protocol execution. However, while providing cloud data storage services, the CS might, for its own benefit, neglect to keep or deliberately delete rarely accessed data files belonging to ordinary cloud users. Moreover, the CS may decide to hide data corruptions caused by server hacks or Byzantine failures so as to maintain its reputation. We assume the TPA, who is in the business of auditing, is reliable and independent, and thus has no incentive to collude with either the CS or the users during the auditing process. The TPA should be able to
[Fig. 1: The architecture of the cloud data storage service, involving Users, Cloud Servers, and the Third Party Auditor; arrows indicate the data flow and the security message flow.]
efficiently audit the cloud data storage without a local copy of the data and without bringing additional on-line burden to cloud users. However, any possible leakage of users' outsourced data towards the TPA through the auditing protocol should be prohibited. Note that to achieve audit delegation and authorize the CS to respond to the TPA's audits, the user can sign a certificate granting audit rights to the TPA's public key, and all audits from the TPA are authenticated against such a certificate. These authentication handshakes are omitted in the following presentation.

B. Design Goals

To enable privacy-preserving public auditing for cloud data storage under the aforementioned model, our protocol design should achieve the following security and performance guarantees: 1) Public auditability: to allow the TPA to verify the correctness of the cloud data on demand without retrieving a copy of the whole data or introducing additional on-line burden to the cloud users; 2) Storage correctness: to ensure that there exists no cheating cloud server that can pass the audit from the TPA without indeed storing users' data intact; 3) Privacy-preserving: to ensure that there exists no way for the TPA to derive users' data content from the information collected during the auditing process; 4) Batch auditing: to enable the TPA with secure and efficient auditing capability to cope with multiple auditing delegations from a possibly large number of different users simultaneously; 5) Lightweight: to allow the TPA to perform auditing with minimum communication and computation overhead.

C. Notation and Preliminaries

F: the data file to be outsourced, denoted as a sequence of n blocks m_1, ..., m_n ∈ Zp for some large prime p.
f_key(·): pseudorandom function (PRF), defined as {0,1}* × key → Zp.
π_key(·): pseudorandom permutation (PRP), defined as {0,1}^{log2(n)} × key → {0,1}^{log2(n)}.
MAC_key(·): message authentication code (MAC) function, defined as {0,1}* × key → {0,1}^l.
H(·), h(·): map-to-point hash functions, defined as {0,1}* → G, where G is some group.

We now introduce some necessary cryptographic background for our proposed scheme.

Bilinear Map: Let G1, G2 and GT be multiplicative cyclic groups of prime order p. Let g1 and g2 be generators of G1 and G2, respectively. A bilinear map is a map e : G1 × G2 → GT with the following properties [15]: 1) Computable: there exists an efficiently computable algorithm for computing e; 2) Bilinear: for all u ∈ G1, v ∈ G2 and a, b ∈ Zp, e(u^a, v^b) = e(u, v)^{ab}; 3) Non-degenerate: e(g1, g2) ≠ 1; 4) for any u1, u2 ∈ G1, v ∈ G2, e(u1·u2, v) = e(u1, v)·e(u2, v).

III. THE PROPOSED SCHEMES

In the introduction we motivated public auditability for achieving economies of scale in cloud computing. This section presents our public auditing scheme for cloud data storage security. We start with an overview of our public auditing system and discuss two straightforward schemes and their demerits. Then we present our main result for privacy-preserving public auditing, which achieves the aforementioned design goals. Finally, we show how to extend our main scheme to support batch auditing for the TPA upon delegations from multiple users.

A. Definitions and Framework of Public Auditing System

We follow the similar definition of previously proposed schemes in the context of remote data integrity checking [6], [10], [11] and adapt the framework for our privacy-preserving public auditing system. A public auditing scheme consists of four algorithms (KeyGen, SigGen, GenProof, VerifyProof). KeyGen is a key generation algorithm that is run by the user to set up the scheme. SigGen is used by the user to generate verification metadata, which may consist of MACs, signatures, or other related information that will be used for auditing. GenProof is run by the cloud server to generate a proof of data storage correctness, while VerifyProof is run by the TPA to audit the proof from the cloud server.

Our public auditing system can be constructed from the above auditing scheme in two phases, Setup and Audit:

Setup: The user initializes the public and secret parameters of the system by executing KeyGen, and pre-processes the data file F by using SigGen to generate the verification metadata. The user then stores the data file F at the cloud server, deletes its local copy, and publishes the verification metadata to the TPA for later audit. As part of pre-processing, the user may alter the data file F by expanding it or including additional metadata to be stored at the server.

Audit: The TPA issues an audit message or challenge to the cloud server to make sure that the cloud server has retained the data file F properly at the time of the audit. The cloud server will derive a response message from a function of the stored data file F by executing GenProof. Using the verification metadata, the TPA verifies the response via VerifyProof. Note that in our design, we do not assume any additional property of the data file, and thus regard error-correcting codes as orthogonal to our system. If the user wants to have more
error-resiliency, he/she can first redundantly encode the data file and then provide us with the data file that has error-correcting codes integrated.

B. The Basic Schemes

Before giving our main result, we first start with two warm-up schemes. The first one does not ensure the privacy-preserving guarantee and is not as lightweight as we would like. The second one outperforms the first, but suffers from other undesirable systematic demerits for public auditing: bounded usage and auditor statefulness, which may pose additional on-line burden to users, as will be elaborated shortly. The analysis of these basic schemes leads to our main result, which overcomes all these drawbacks.

Basic Scheme I: The cloud user pre-computes MACs σ_i = MAC_sk(i||m_i) for each block m_i (i ∈ {1, ..., n}), sends both the data file F and the MACs {σ_i}_{1≤i≤n} to the cloud server, and releases the secret key sk to the TPA. During the Audit phase, the TPA requests from the cloud server a number of randomly selected blocks and their corresponding MACs to verify the correctness of the data file. The insight behind this approach is that auditing a portion of the file is much easier than auditing the whole of it. However, this simple solution suffers from the following severe drawbacks: 1) the audit from the TPA demands retrieval of the user's data, which should be prohibited because it violates the privacy-preserving guarantee; 2) its communication and computation complexity are both linear in the sampled data size, which may result in large communication overhead and time delay, especially when the bandwidth available between the TPA and the cloud server is limited.

Basic Scheme II: To avoid retrieving data from the cloud server, one may improve the above solution as follows: before data outsourcing, the cloud user chooses s random message authentication code keys {sk_τ}_{1≤τ≤s}, pre-computes s MACs {MAC_{sk_τ}(F)}_{1≤τ≤s} for the whole data file F, and publishes these verification metadata to the TPA.
The TPA can each time reveal a secret key sk_τ to the cloud server and ask for a fresh keyed MAC for comparison, thus achieving privacy-preserving auditing. However, in this method: 1) the number of times a particular data file can be audited is limited by the number of secret keys, which must be fixed a priori. Once all possible secret keys are exhausted, the cloud user has to retrieve the data from the server in order to re-compute and re-publish new MACs to the TPA. 2) The TPA has to maintain and update state between audits, i.e., keep track of the possessed MAC keys. Considering the potentially large number of audit delegations from multiple users, maintaining such states for the TPA can be difficult and error-prone. Note that another common drawback of the above basic schemes is that they can only support the case of static data; neither can deal with data dynamics, which is also of paramount importance for cloud storage systems. For brevity and clarity, we will focus on static data too, though our auditing protocol can be readily adapted to support data dynamics, based on our previous work [8].
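The mechanics of the two warm-up schemes can be sketched in a few lines of Python. This is a toy illustration, not the paper's artifact: HMAC-SHA256 stands in for the unspecified MAC_key(·), and all block and key sizes are arbitrary choices of ours.

```python
import hmac, hashlib, secrets

def mac(key: bytes, msg: bytes) -> bytes:
    # MAC_key(.) instantiated with HMAC-SHA256 (our choice; the paper
    # leaves the concrete MAC function unspecified)
    return hmac.new(key, msg, hashlib.sha256).digest()

# Basic Scheme I: per-block MACs sigma_i = MAC_sk(i || m_i)
def setup_scheme1(blocks, sk):
    return [mac(sk, i.to_bytes(8, "big") + m) for i, m in enumerate(blocks)]

def audit_scheme1(sk, sampled):
    # the TPA re-checks each sampled (i, m_i, sigma_i) triple -- note it
    # must see the raw blocks, which is exactly the privacy drawback
    return all(hmac.compare_digest(mac(sk, i.to_bytes(8, "big") + m), s)
               for i, m, s in sampled)

blocks = [secrets.token_bytes(32) for _ in range(16)]
sk = secrets.token_bytes(32)
tags = setup_scheme1(blocks, sk)
sample = [(i, blocks[i], tags[i]) for i in (1, 5, 11)]
assert audit_scheme1(sk, sample)               # honest server passes
bad = [(1, b"corrupted".ljust(32, b"\0"), tags[1])]
assert not audit_scheme1(sk, bad)              # corruption is caught

# Basic Scheme II: s whole-file MACs under keys sk_1..sk_s, published to
# the TPA up front; each audit burns one key (bounded usage + TPA state).
keys = [secrets.token_bytes(32) for _ in range(5)]
F = b"".join(blocks)
published = [mac(k, F) for k in keys]
# Audit t: the TPA reveals keys[t]; the server returns a fresh MAC over
# the file it actually stores, and the TPA compares:
assert mac(keys[0], F) == published[0]
```

Note how Scheme II never ships data blocks to the TPA, but its audit budget is exhausted after len(keys) rounds, which is the bounded-usage drawback discussed above.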
C. The Privacy-Preserving Public Auditing Scheme

To effectively support public auditability without having to retrieve the data blocks themselves, we resort to the homomorphic authenticator technique [6], [8], [10]. Homomorphic authenticators are unforgeable verification metadata generated from individual data blocks, which can be securely aggregated in such a way as to assure an auditor that a linear combination of data blocks is correctly computed, by verifying only the aggregated authenticator. However, the direct adoption of these techniques is not suitable for our purposes, since the linear combination of blocks may potentially reveal user data information, thus violating the privacy-preserving guarantee. Specifically, if enough linear combinations of the same blocks are collected, the TPA can simply derive the user's data content by solving a system of linear equations.

Overview: To achieve privacy-preserving public auditing, we propose to uniquely integrate the homomorphic authenticator with the random masking technique. In our protocol, the linear combination of sampled blocks in the server's response is masked with randomness generated by a pseudorandom function (PRF). With random masking, the TPA no longer has all the necessary information to build up a correct group of linear equations and therefore cannot derive the user's data content, no matter how many linear combinations of the same set of file blocks are collected. Meanwhile, due to the algebraic property of the homomorphic authenticator, the correctness validation of the block-authenticator pairs is not affected by the randomness generated from the PRF, as will be shown shortly. Note that in our design, we use the public key based homomorphic authenticator, specifically, the BLS based signature [10], to equip the auditing protocol with public auditability. Its flexibility in signature aggregation will further benefit us in multi-task auditing.
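The linear-equation attack that motivates random masking is easy to reproduce numerically. The toy sketch below uses made-up parameters (a small Mersenne prime for the group order, the constant 17 standing in for h(R)): a TPA holding two unmasked responses over the same two blocks recovers both blocks by Cramer's rule, and fails once each response carries a fresh random term.

```python
import secrets

# Two audits over the same two blocks m1, m2 give the TPA two equations
#   mu'_j = nu_{j,1}*m1 + nu_{j,2}*m2  (mod p)
# which it can solve by Cramer's rule -- unless the server masks mu'.
p = (1 << 61) - 1                  # Mersenne prime standing in for the group order
m1, m2 = 123456789, 987654321      # the user's secret block contents

def respond_unmasked(nu1, nu2):
    # server returns the bare linear combination mu' = nu1*m1 + nu2*m2
    return (nu1 * m1 + nu2 * m2) % p

(a, b), (c, d) = (3, 7), (5, 11)   # TPA's challenge coefficients, two audits
mu1, mu2 = respond_unmasked(a, b), respond_unmasked(c, d)

det = (a * d - b * c) % p
inv = pow(det, p - 2, p)           # modular inverse via Fermat's little theorem
rec1 = ((mu1 * d - mu2 * b) * inv) % p
rec2 = ((mu2 * a - mu1 * c) * inv) % p
assert (rec1, rec2) == (m1, m2)    # unmasked: data content fully recovered

# With random masking mu = mu' + r*h(R), every response carries a fresh
# unknown r*h(R), so the TPA's system of equations stays underdetermined.
r = secrets.randbelow(p - 1) + 1   # nonzero per-audit mask; 17 stands in for h(R)
masked1 = (mu1 + r * 17) % p
rec1_bad = ((masked1 * d - mu2 * b) * inv) % p
assert rec1_bad != m1              # recovery fails once masking is applied
```

Each additional challenge against a masked server adds one equation but also one fresh unknown r_j·h(R_j), which is why collecting more responses never helps the TPA.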
Scheme Details: Let G1, G2 and GT be multiplicative cyclic groups of prime order p, and e : G1 × G2 → GT a bilinear map as introduced in the preliminaries. Let g be a generator of G2. H(·) is a secure map-to-point hash function {0,1}* → G1, which maps strings uniformly into G1. Another hash function h(·) : G1 → Zp maps a group element of G1 uniformly into Zp. The proposed scheme is as follows:

Setup Phase: 1) The cloud user runs KeyGen to generate the system's public and secret parameters. He chooses a random x ← Zp and a random element u ← G1, and computes v ← g^x and w ← u^x. The secret parameter is sk = (x) and the public parameters are pk = (v, w, g, u). Given data file F = (m_1, ..., m_n), the user runs SigGen to compute a signature σ_i for each block m_i: σ_i ← (H(i) · u^{m_i})^x ∈ G1 (i = 1, ..., n). Denote the set of signatures by Φ = {σ_i}_{1≤i≤n}. The user then sends {F, Φ} to the server and deletes them from local storage.

Audit Phase: 2) During the auditing process, to generate the audit message chal, the TPA picks a random c-element subset I = {s_1, ..., s_c} of the set [1, n], where s_q = π_{k_prp}(q) for 1 ≤ q ≤ c and k_prp is the permutation key randomly chosen by the TPA for each audit. We assume that s_1 ≤ ··· ≤ s_c.
For each element i ∈ I, the TPA also chooses a random value ν_i (of a relatively small bit length compared to |p|). The message chal specifies the positions of the blocks to be checked in this Audit phase. The TPA sends chal = {(i, ν_i)}_{i∈I} to the server. 3) Upon receiving challenge chal = {(i, ν_i)}_{i∈I}, the server runs GenProof to generate a response proof of data storage correctness. Specifically, the server chooses a random element r ∈ Zp via r = f_{k_prf}(chal), where k_prf is the PRF key randomly chosen by the server for each audit, and calculates R = w^r = (u^x)^r ∈ G1. Let μ' denote the linear combination of sampled blocks specified in chal: μ' = Σ_{i∈I} ν_i·m_i. To blind μ' with r, the server computes μ = μ' + r·h(R) ∈ Zp. Meanwhile, the server also calculates an aggregated signature σ = Π_{i∈I} σ_i^{ν_i} ∈ G1. It then sends {μ, σ, R} as the response proof of storage correctness to the TPA. With the response from the server, the TPA runs VerifyProof to validate it by checking the verification equation

e(σ · R^{h(R)}, g) ?= e((Π_{i=s1}^{sc} H(i)^{ν_i}) · u^μ, v).    (1)

The correctness of the above verification equation can be elaborated as follows:

e(σ · R^{h(R)}, g)
= e(Π_{i=s1}^{sc} σ_i^{ν_i} · (u^x)^{r·h(R)}, g)
= e(Π_{i=s1}^{sc} (H(i)·u^{m_i})^{x·ν_i} · (u^{r·h(R)})^x, g)
= e(Π_{i=s1}^{sc} (H(i)^{ν_i}·u^{m_i·ν_i}) · u^{r·h(R)}, g)^x
= e(Π_{i=s1}^{sc} H(i)^{ν_i} · u^{μ'+r·h(R)}, g^x)
= e(Π_{i=s1}^{sc} H(i)^{ν_i} · u^μ, v).

It is clear that the random mask R has no effect on the validity of the checking result. The security of this protocol will be proved in Section IV.

Discussion: As analyzed at the beginning of this section, this approach ensures the privacy of the user data content during the auditing process. Meanwhile, the homomorphic authenticator helps achieve a constant communication overhead for the server's response during the audit: the size of {μ, σ, R} is fixed and independent of the number of sampled blocks c. Note that there is no secret keying material or state for the TPA to keep or maintain between audits, and the auditing protocol does not pose any potential on-line burden on users. Since the TPA can re-generate the random c-element subset I = {s_1, ..., s_c} of the set [1, n], where s_q = π_{k_prp}(q) for 1 ≤ q ≤ c, unbounded usage is also achieved. Previous work [6], [8] showed that if the server is missing a fraction of the data, then the number of blocks that needs to be checked in order to detect server misbehavior with high probability is on the order of O(1). For example, if the server is missing 1% of the data F, the TPA only needs to audit c = 460 or 300 randomly chosen blocks of F to detect this misbehavior with probability larger than 99% or 95%, respectively. Given the huge volume of data outsourced in the cloud, checking a portion of the data file is more affordable and practical for both the TPA and the cloud server than checking all the data, as long as the sampling strategy provides high-probability assurance. In Section IV, we will present experimental results based on these sampling strategies.

D. Support for Batch Auditing

With the establishment of privacy-preserving public auditing in Cloud Computing, the TPA may concurrently handle multiple auditing delegations upon different users' requests. The individual auditing of these tasks for the TPA can be tedious and very inefficient. Given K auditing delegations on K distinct data files from K different users, it is more advantageous for the TPA to batch these multiple tasks together and audit them at one time. Keeping this natural demand in mind, we propose to explore the technique of bilinear aggregate signature [15], which supports the aggregation of multiple signatures by distinct signers on distinct messages into a single signature and thus provides efficient verification of the authenticity of all messages. Using this signature aggregation technique and the bilinear property, we can aggregate K verification equations (for K auditing tasks) into a single one, as shown in equation (2), so that the simultaneous auditing of multiple tasks can be achieved. The details of extending our main result to this multi-user setting are described as follows:

Assume there are K users in the system, and each user k has a data file F_k = (m_{k,1}, ..., m_{k,n}) to be outsourced to the cloud server, where k ∈ {1, ..., K}. For a particular user k, denote his secret key as x_k ∈ Zp, and the corresponding public parameters as (v_k, w_k, g, u_k) = (g^{x_k}, u_k^{x_k}, g, u_k). In the Setup phase, each user k runs SigGen and computes the signature σ_{k,i} ← [H(k||i) · u_k^{m_{k,i}}]^{x_k} ∈ G1 for each block m_{k,i} (i ∈ {1, ..., n}). In the Audit phase, the TPA sends the audit challenge chal = {(i, ν_i)}_{i∈I} to the server for auditing the data files of all K users. Upon receiving chal, for each user k (k ∈ {1, ..., K}), the server randomly picks r_k ∈ Zp and computes

R_k = (w_k)^{r_k} = (u_k^{x_k})^{r_k},  μ_k = Σ_{i=s1}^{sc} ν_i·m_{k,i} + r_k·h(R_k),  σ = Π_{k=1}^{K} Π_{i=s1}^{sc} σ_{k,i}^{ν_i}.

The server then responds to the TPA with {σ, {μ_k}_{1≤k≤K}, {R_k}_{1≤k≤K}}. Similar to the single-user case, using the properties of the bilinear map, the TPA can check whether the following equation holds:

e(σ · Π_{k=1}^{K} R_k^{h(R_k)}, g) ?= Π_{k=1}^{K} e(Π_{i=s1}^{sc} H(k||i)^{ν_i} · u_k^{μ_k}, v_k).    (2)
LHS
= e(
k=1 i=s1 K sc
= e(
k=1 i=s1 sc K
r h(Rk ) xk
) , g)
=
k=1 K
e(
i=s1 sc
r h(Rk )
), g)xk
in Section II, namely, the storage correctness and privacypreserving. We start from the single user case, where our main result is originated. Then we show how to extend the security guarantee to a multi-user setting, where batch auditing for TPA is enabled. Due to space limitation, we only list the proof sketches to gives a rough idea for achieving those guarantees. The detailed formalized proofs of Theorem 1, 2, and 3 are provided in the full version [16]. Note that all proofs are derived on the probabilistic base, i.e., with high probability assurance, which we omit writing explicitly. 1) Storage Correctness Guarantee: We need to prove that the cloud server can not generate valid response toward TPA without faithfully storing the data, as captured by Theorem 1. Theorem 1: If the cloud server passes the Audit phase, then it must indeed possess the specied data intact as it is. Proof Sketch: The proof consists of three steps. First, we show that there exists no malicious server that can forge a valid response {, , R} to pass the verication equation 1. The correctness of this statement follows from the Theorem 4.2 proposed in [10]. Note that the value R in our protocol, which enables the privacy-preserving guarantee, will not affect the validity of the equation, due to the hardness of discrete-log and the commutativity of modular exponentiation in pairing. Next, we show that if the response {, , R} is valid, where = +rh(R) and R = (v2 )r , then the underlying must be valid too. This can be derived immediately from the collisionfree property of hash function h() and determinism of discrete exponentiation. Finally, we show that the validity of implies the correctness of {mi }iI where = iI i mi . Here we utilize the small exponent (SE) test technique of batch verication in [17]. Because {i } are picked up randomly by the TPA in each Audit phase, {i } can be viewed similarly as the random chosen exponents in the SE test [17]. 
Therefore, the correctness of individual sampled blocks is ensured. All above sums up to the storage correctness guarantee. 2) Privacy Preserving Guarantee: We want to make sure that TPA can not derive users data content from the information collected during auditing process. This is equivalent to prove the Theorem 2. Note that if can be derived by TPA, then {mi }iI can be easily obtained by solving a group of linear equations when enough combinations of the same blocks are collected. Theorem 2: From the servers response {, , R}, no information of will be leaked to TPA. Proof Sketch: Again, we prove the Theorem 2 in three steps. First, we show that no information on can be learned from . This is because is blinded by r as = + rh(R) and R = (v2 )r , where r is chosen randomly by cloud server and is unknown to TPA. Note that even with R, due to the hardness of discrete-log assumption, the value r is still hidden against TPA. Thus, privacy of is guaranteed from . Second, we show that no information on can be learned from , where
Expanding the left-hand side of the batch verification equation 2 via the bilinearity of the pairing e(·,·), substituting σ_k = ∏_{i∈I} σ_{k,i}^{ν_i} and μ_k = μ'_k + r_k·h(R_k) for each user k, and regrouping the per-user terms over the sampled blocks i = s_1, …, s_c, yields

∏_{k=1}^{K} e( (∏_{i=s_1}^{s_c} H_k(i)^{ν_i})^{h(R_k)} · u_k^{μ_k}, v_k ),
which is the right-hand side, as required.
Discussion. As shown in equation 2, batch auditing not only allows TPA to perform the multiple auditing tasks simultaneously, but also greatly reduces both the communication cost on the server side and the computation cost on the TPA side. Regarding communication cost, the bilinear aggregate signature ensures that the server only needs to send one group element σ in the response to TPA, instead of a set of {σ_k}_{1≤k≤K} as required by individual auditing, where σ_k = ∏_{i∈I} σ_{k,i}^{ν_i} denotes the aggregated signature supposed to be returned for each user k. Meanwhile, aggregating K verification equations into one helps reduce the number of expensive pairing operations from 2K, as required in individual auditing, to K + 1. Thus, a considerable amount of auditing time is expected to be saved. Note that the verification equation 2 only holds when all the responses are valid, and fails with high probability when there is even one single invalid response in the batch auditing. In many situations, a response collection may contain invalid responses, especially {μ_k}_{1≤k≤K}, caused by accidental data corruption or possibly malicious activity by a cloud server. The ratio of invalid responses to valid ones could be quite small, and yet a standard batch auditor would reject the entire collection. To further sort out these invalid responses in the batch auditing, we can utilize a recursive divide-and-conquer approach (binary search). Specifically, if the batch auditing fails, we simply divide the collection of responses into two halves, and recurse the auditing on each half via equation 2. Note that TPA may now require the server to send back all the {σ_k}_{1≤k≤K}, as in individual auditing. In Section IV-B2, we show through carefully designed experiments that with this recursive binary search approach, even if up to 20% of responses are invalid, batch auditing still performs faster than individual verification.
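The recursive divide-and-conquer just described can be sketched as follows, assuming a black-box predicate `batch_valid` that plays the role of verification equation 2 over a sub-batch (names and the boolean toy model are illustrative):

```python
def find_invalid(responses, batch_valid):
    """Return the indices of invalid responses using recursive batch checks."""
    def rec(lo, hi):
        if batch_valid(responses[lo:hi]):
            return []                      # whole sub-batch verifies: done
        if hi - lo == 1:
            return [lo]                    # a single failing response is invalid
        mid = (lo + hi) // 2               # otherwise split in half and recurse
        return rec(lo, mid) + rec(mid, hi)
    return rec(0, len(responses))

# Toy usage: model a response simply as valid (True) or invalid (False),
# so the "equation 2 holds for this sub-batch" check becomes the built-in all().
responses = [True] * 256
for i in (7, 100, 255):
    responses[i] = False
bad = find_invalid(responses, all)  # → [7, 100, 255]
```

When the collection verifies as a whole, the cost is a single batch check; a few scattered invalid responses add only a logarithmic number of extra checks along their search paths.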
IV. SECURITY ANALYSIS AND PERFORMANCE EVALUATION
A. Security Proofs
We evaluate the security of the proposed scheme by analyzing its fulfillment of the security guarantees described
This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE INFOCOM 2010 proceedings This paper was presented as part of the main Technical Program at IEEE INFOCOM 2010.
TABLE II: Performance comparison under different number of sampled blocks c for high assurance auditing.
Our Scheme:
  Sampled blocks c: 460 | Server compt. time: 405.57 ms | TPA compt. time: 525.89 ms | Comm. cost: 60 Byte
  Sampled blocks c: 300 | Server compt. time: 273.34 ms | TPA compt. time: 493.25 ms | Comm. cost: 60 Byte
[10]:
  Sampled blocks c: 460 | Server compt. time: 403.69 ms | TPA compt. time: 524.02 ms | Comm. cost: 40 Byte
  Sampled blocks c: 300 | Server compt. time: 270.46 ms | TPA compt. time: 491.38 ms | Comm. cost: 40 Byte
TABLE I: Notation of cryptographic operations.
Exp_G^t(ℓ): t exponentiations g^{a_i} in group G, with |a_i| = ℓ;
m-MultExp_G^t(ℓ): t m-term multi-exponentiations ∏_{i=1}^{m} g^{a_i};
Pair_{G,H}^t: t pairings e(g_i, h_i), with g_i ∈ G, h_i ∈ H;
m-MultPair_{G,H}^t: t m-term multi-pairings ∏_{i=1}^{m} e(g_i, h_i).
the proof of Theorem 1, the storage correctness guarantee in the multi-user setting is achieved.
B. Performance Analysis
We now assess the performance of the proposed privacy-preserving public auditing scheme. We focus on the extra cost introduced by the privacy-preserving guarantee and on the efficiency of the proposed batch auditing technique. The experiment is conducted using C on a Linux system with an Intel Core 2 processor running at 1.86 GHz, 2048 MB of RAM, and a 7200 RPM Western Digital 250 GB Serial ATA drive with an 8 MB buffer. Algorithms use the Pairing-Based Cryptography (PBC) library version 0.4.18. The elliptic curve utilized in the experiment is an MNT curve, with a base field size of 159 bits and embedding degree 6. The security level is chosen to be 80 bits, which means |ν_i| = 80 and |p| = 160. All experimental results represent the mean of 20 trials.
1) Cost of Privacy-Preserving Guarantee: We begin by estimating the cost in terms of basic cryptographic operations, as notated in Table I. Suppose there are c random blocks specified in the chal during the Audit phase. Under this setting, we quantify the extra cost introduced by the support of privacy-preserving into server computation, auditor computation, as well as communication overhead. On the server side, the generated response includes an aggregated signature σ = ∏_{i∈I} σ_i^{ν_i} ∈ G1, a random metadata R = w^r = (u^x)^r ∈ G1, and a blinded linear combination of sampled blocks μ = ∑_{i∈I} ν_i m_i + r·h(R) ∈ Zp. The corresponding computation cost is c-MultExp_G^1(|ν_i|), Exp_G^1(|p|), and Add_{Zp}^c + Mult_{Zp}^{c+1} + Hash_{Zp}^1, respectively. Compared to the existing homomorphic authenticator based solution for ensuring remote data integrity [10]^1, the extra cost for protecting the user privacy, resulting from the random mask R, is only a constant: Exp_G^1(|p|) + Mult_{Zp}^1 + Hash_{Zp}^1 + Add_{Zp}^1, which has nothing to do with the number of sampled blocks c.
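The server-side response generation costed above can be illustrated with a toy sketch that works in a plain prime-order group with modular arithmetic instead of the pairing group G1, so it is a simplified analogue rather than the actual construction; `q`, `g`, `x`, and the SHA-256 stand-in for h(·) are all illustrative choices, and the aggregated signature σ is omitted since it requires the pairing:

```python
import hashlib
import random

q = 2**127 - 1        # toy prime modulus (illustrative)
g = 3                 # toy generator (illustrative)
x = 123456789         # stand-in secret; w plays the role of u^x in the paper
w = pow(g, x, q)

def h(R):
    """Hash stand-in for h(.), mapping the mask R into the exponent range."""
    return int.from_bytes(hashlib.sha256(str(R).encode()).digest(), "big") % (q - 1)

def respond(blocks, chal, r=None):
    """Server response sketch: blinded combination mu and random mask R.

    blocks: dict i -> m_i; chal: list of (i, nu_i) pairs from the TPA.
    """
    mu_prime = sum(nu * blocks[i] for i, nu in chal) % (q - 1)
    if r is None:
        r = random.randrange(1, q - 1)
    R = pow(w, r, q)                      # the single extra exponentiation
    mu = (mu_prime + r * h(R)) % (q - 1)  # extra cost: one hash, one mult, one add
    return mu, R
```

The comments mark exactly the constant extra work the mask adds: one exponentiation for R, plus one hash, one multiplication, and one addition for μ, independent of c.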
When c is set to be 460 or 300 for high-assurance auditing, as discussed in Section III-C, the extra cost for the privacy-preserving guarantee on the server side is negligible against the total server computation for response generation. Similarly, on the auditor side, upon receiving the response {σ, R, μ}, the corresponding computation cost for response validation is Hash_{Zp}^1 + c-MultExp_G^1(|ν_i|) + Hash_G^c + Mult_G^2 + Exp_G^2(|p|) + Pair_{G,G}^2, among which only Hash_{Zp}^1 + Exp_G^1(|p|) + Mult_G^1 account for the additional constant
^1 We refer readers to [10] for a detailed description of homomorphic authenticator based solutions.
σ = ∏_{i∈I} σ_i^{ν_i} = [∏_{i∈I} H(i)^{ν_i} · u^{μ'}]^x = [∏_{i∈I} H(i)^{ν_i}]^x · (u^{μ'})^x.
This can be analyzed as follows: (u^{μ'})^x is blinded by [∏_{i∈I} H(i)^{ν_i}]^x. However, to compute [∏_{i∈I} H(i)^{ν_i}]^x from ∏_{i∈I} H(i)^{ν_i} and g^x, which is the only information TPA can utilize, is a computational Diffie-Hellman problem, which can be stated as: given g, g^a, g^b, compute g^{ab}. This problem is hard for unknown a, b ∈ Zp. Therefore, on the basis of the computational Diffie-Hellman assumption, TPA cannot derive the value of (u^{μ'})^x, let alone μ'. Finally, all that remains is to prove that from {σ, μ, R}, still no information on μ' can be obtained by TPA. Recall that r is a random private value chosen by the server and μ = μ' + r·h(R). Following the same technique as the Schnorr signature [18], our auditing protocol between TPA and the cloud server can be regarded as a provably secure honest-verifier zero-knowledge identification scheme [19], by viewing μ' as a secret key and h(R) as a challenge value, which implies no information on μ' can be leaked. This completes the proof of Theorem 2.
3) Security Guarantee for Batch Auditing: Now we show that extending our main result to a multi-user setting will not affect the aforementioned security assurance, as shown in Theorem 3:
Theorem 3: Our batch auditing protocol achieves the same storage correctness and privacy-preserving guarantees as in the single-user case.
Proof Sketch: Due to the space limitation, we only prove the storage correctness guarantee. The privacy-preserving guarantee in the multi-user setting is similar to that of Theorem 2 and thus omitted here. The proposed batch auditing protocol is built upon the aggregate signature scheme proposed in [15]. According to the security strength of the aggregate signature [15], in our multi-user setting, there exists no malicious cloud server that can forge valid σ_1, …, σ_K in the responses to pass the verification equation 2. Actually, equation 2 functions as a kind of screening test as proposed in [17].
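Returning to the blinding step in the proof of Theorem 2, it admits a quick numerical check: over a uniformly random mask r, μ = μ' + r·h(R) mod p is itself uniform whenever h(R) is invertible mod p, so any two secrets induce identical response distributions. The sketch below fixes the challenge value, which simplifies away the dependence of h(R) on r; the prime and all values are toy-sized and illustrative:

```python
p = 101          # toy prime small enough to enumerate every mask r
hR = 7           # fixed stand-in challenge value, invertible mod 101

def mask_distribution(mu_prime):
    """All values mu = mu' + r*h(R) mod p as the mask r ranges over Z_p."""
    return sorted((mu_prime + r * hR) % p for r in range(p))

d1 = mask_distribution(3)    # distribution of mu seen by TPA for secret 3
d2 = mask_distribution(42)   # ... and for secret 42: identical, uniform over Z_p
```

Both secrets yield exactly the multiset {0, 1, …, p-1}, i.e., the masked response carries no information about μ'.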
While the screening test may not guarantee the validity of each individual σ_k, it does ensure the authenticity of σ_k in the batch auditing protocol, which is adequate for the rationale of our case. Once the validity of σ_1, …, σ_K is guaranteed, from
Fig. 2: Comparison on auditing time between batch auditing and individual auditing. Per task auditing time denotes the total auditing time divided by the number of tasks.
Fig. 3: Comparison on auditing time between batch auditing and individual auditing, when an α-fraction of 256 responses is invalid. Per task auditing time denotes the total auditing time divided by the number of tasks.
computation cost. For c = 460 or 300, and considering the relatively expensive pairing operations, this extra cost imposes little overhead on the overall cost of response validation and thus can be ignored. For the sake of completeness, Table II gives the experimental results on the performance comparison between our scheme and the state of the art [10]. It shows that the performance of our scheme is almost the same as that of [10], even though our scheme supports the privacy-preserving guarantee while [10] does not. Note that in our scheme, the server's response {σ, R, μ} contains an additional random element R, which is a group element and has the same size of 159 bits as σ does. This explains the extra communication cost of our scheme compared to [10].
2) Batch Auditing Efficiency: The discussion in Section III-D gives an asymptotic efficiency analysis of batch auditing by considering only the total number of expensive pairing operations. In practice, however, there are additional operations required for batching, such as modular exponentiations and multiplications. Meanwhile, the sampling strategy, i.e., the number of sampled blocks c, is also a variable factor that affects batching efficiency. Thus, whether the benefit of removing pairings significantly outweighs these additional operations remains to be verified. To get a complete view of batching efficiency, we conduct a timed batch auditing test, where the number of auditing tasks is increased from 1 to approximately 200 with intervals of 8. The performance of the corresponding non-batched (individual) auditing is provided as a baseline for the measurement. Following the same experimental setting with c = 460 and 300, the average per-task auditing time, computed by dividing the total auditing time by the number of tasks, is given in Fig. 2 for both batch auditing and individual auditing.
It can be seen that, compared to individual auditing, batch auditing indeed helps reduce the TPA's computation cost, as more than 11% and 17% of per-task auditing time is saved when c is set to be 460 and 300, respectively.
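Complementing the timing figures, the recursive search of Section III-D can also be assessed by simply counting how many evaluations of the batch equation it performs on a 256-response collection. The sketch below is purely combinatorial and illustrative; it counts checks only and ignores the differing per-check pairing costs:

```python
import random

def num_checks(flags):
    """flags[i] is True iff response i is valid; count batch-equation checks."""
    def rec(lo, hi):
        if all(flags[lo:hi]) or hi - lo == 1:
            return 1                      # one check settles this sub-batch
        mid = (lo + hi) // 2
        return 1 + rec(lo, mid) + rec(mid, hi)
    return rec(0, len(flags))

random.seed(0)
N = 256
checks = {}
for bad in (0, 2, 26, 51):                # roughly 0%, 1%, 10%, 20% invalid
    flags = [True] * N
    for i in random.sample(range(N), bad):
        flags[i] = False
    checks[bad] = num_checks(flags)       # individual auditing would need 256
```

An all-valid batch costs a single check, and sparsely corrupted batches stay far below the 256 checks of individual auditing, matching the trend reported in Fig. 3.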
3) Sorting Out Invalid Responses: We now use experiments to justify the efficiency of our recursive binary search approach for TPA to sort out the invalid responses when batch auditing fails, as discussed in Section III-D. This experiment is closely related to the work in [20], which evaluates the batch verification efficiency of various short signature schemes. To evaluate the feasibility of the recursive approach, we first generate a collection of 256 valid responses, which implies that the TPA may concurrently handle 256 different auditing delegations. We then conduct the tests repeatedly while randomly corrupting an α-fraction of them, ranging from 0 to 20%, by replacing them with random values. The average auditing time per task, compared against the individual auditing approach, is presented in Fig. 3. The result shows that even when the number of invalid responses exceeds 15% of the total batch size, the performance of batch auditing can still be safely concluded as preferable to straightforward individual auditing. Note that the random distribution of invalid responses within the collection is nearly the worst case for batch auditing. If invalid responses are grouped together, even better results can be achieved.
V. RELATED WORK
Ateniese et al. [6] are the first to consider public auditability in their defined provable data possession (PDP) model for ensuring possession of data files on untrusted storage. Their scheme utilizes RSA-based homomorphic authenticators for auditing outsourced data and suggests randomly sampling a few blocks of the file. However, the public auditability in their scheme demands that the linear combination of sampled blocks be exposed to the external auditor. When used directly, their protocol is not provably privacy preserving, and thus may leak user data information to the auditor. Juels et al.
[11] describe a proof of retrievability (PoR) model, where spot-checking and error-correcting codes are used to ensure both possession and retrievability of data files on remote archive service systems. However, the number of audit challenges a user can perform
is fixed a priori, and public auditability is not supported in their main scheme. Although they describe a straightforward Merkle-tree construction for public PoRs, this approach only works with encrypted data. Shacham et al. [10] design an improved PoR scheme built from BLS signatures with full proofs of security in the security model defined in [11]. Similar to the construction in [6], they use publicly verifiable homomorphic authenticators that are built from provably secure BLS signatures. Based on the elegant BLS construction, public retrievability is achieved. Again, their approach does not support privacy-preserving auditing, for the same reason as [6]. Shah et al. [7], [12] propose allowing a TPA to keep online storage honest by first encrypting the data and then sending a number of pre-computed symmetric-keyed hashes over the encrypted data to the auditor. The auditor verifies both the integrity of the data file and the server's possession of a previously committed decryption key. This scheme only works for encrypted files, and it suffers from auditor statefulness and bounded usage, which may potentially bring in an on-line burden to users when the keyed hashes are used up. In other related work, Ateniese et al. [21] propose a partially dynamic version of their prior PDP scheme, using only symmetric key cryptography but with a bounded number of audits. In [22], Wang et al. consider a similar support for partially dynamic data storage in a distributed scenario with the additional feature of data error localization. In a subsequent work, Wang et al. [8] propose combining the BLS-based homomorphic authenticator with a Merkle hash tree (MHT) to support both public auditability and full data dynamics. Almost simultaneously, Erway et al. [23] develop a skip-list-based scheme to enable provable data possession with full dynamics support.
However, these two protocols require the linear combination of sampled blocks just as [6], [10] do, and thus do not support privacy-preserving auditing of users' outsourced data. While all the above schemes provide methods for efficient auditing and provable assurance of the correctness of remotely stored data, none of them meets all the requirements for privacy-preserving public auditing in Cloud Computing, as supported in our result. More importantly, none of these schemes consider batch auditing, which can greatly reduce the computation cost on the TPA when coping with a large number of audit delegations.
VI. CONCLUSION
In this paper, we propose a privacy-preserving public auditing system for data storage security in Cloud Computing. We utilize the homomorphic authenticator and random masking to guarantee that TPA would not learn any knowledge about the data content stored on the cloud server during the efficient auditing process, which not only relieves the cloud user of the tedious and possibly expensive auditing task, but also alleviates users' fear of their outsourced data being leaked. Considering that TPA may concurrently handle multiple audit sessions from different users for their outsourced data files, we further extend our privacy-preserving public auditing protocol into a multi-user setting, where TPA can perform the multiple auditing tasks in a batch manner, i.e., simultaneously.
Extensive analysis shows that the proposed schemes are provably secure and highly efficient.
ACKNOWLEDGEMENT
This work was supported in part by the US National Science Foundation under grants CNS-0831963, CNS-0626601, CNS-0716306, and CNS-0831628.
REFERENCES
[1] P. Mell and T. Grance, "Draft NIST working definition of cloud computing," referenced on June 3rd, 2009, online at http://csrc.nist.gov/groups/SNS/cloud-computing/index.html, 2009.
[2] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the clouds: A Berkeley view of cloud computing," University of California, Berkeley, Tech. Rep. UCB-EECS-2009-28, Feb. 2009.
[3] Amazon.com, "Amazon S3 availability event: July 20, 2008," online at http://status.aws.amazon.com/s3-20080720.html, July 2008.
[4] S. Wilson, "Appengine outage," online at http://www.cio-weblog.com/50226711/appengine_outage.php, June 2008.
[5] B. Krebs, "Payment processor breach may be largest ever," online at http://voices.washingtonpost.com/securityfix/2009/01/payment_processor_breach_may_b.html, Jan. 2009.
[6] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable data possession at untrusted stores," Cryptology ePrint Archive, Report 2007/202, 2007, http://eprint.iacr.org/.
[7] M. A. Shah, R. Swaminathan, and M. Baker, "Privacy-preserving audit and extraction of digital contents," Cryptology ePrint Archive, Report 2008/186, 2008, http://eprint.iacr.org/.
[8] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, "Enabling public verifiability and data dynamics for storage security in cloud computing," in Proc. of ESORICS'09, Saint Malo, France, Sep. 2009.
[9] Cloud Security Alliance, "Security guidance for critical areas of focus in cloud computing," 2009, http://www.cloudsecurityalliance.org.
[10] H. Shacham and B. Waters, "Compact proofs of retrievability," in Proc. of Asiacrypt 2008, vol. 5350, Dec. 2008, pp. 90-107.
[11] A. Juels and J. Burton S. Kaliski, "PORs: Proofs of retrievability for large files," in Proc. of CCS'07, Alexandria, VA, October 2007, pp. 584-597.
[12] M. A. Shah, M. Baker, J. C. Mogul, and R. Swaminathan, "Auditing to keep online storage services honest," in Proc. of HotOS'07.
Berkeley, CA, USA: USENIX Association, 2007, pp. 1-6.
[13] 104th United States Congress, "Health Insurance Portability and Accountability Act of 1996 (HIPAA)," online at http://aspe.hhs.gov/admnsimp/pl104191.htm, 1996, last accessed: July 16, 2009.
[14] S. Yu, C. Wang, K. Ren, and W. Lou, "Achieving secure, scalable, and fine-grained access control in cloud computing," in Proc. of IEEE INFOCOM'10, San Diego, CA, USA, March 2010.
[15] D. Boneh and C. Gentry, "Aggregate and verifiably encrypted signatures from bilinear maps," in Proc. of Eurocrypt 2003, vol. 2656 of LNCS. Springer-Verlag, 2003, pp. 416-432.
[16] C. Wang, Q. Wang, K. Ren, and W. Lou, "Privacy-preserving public auditing for data storage security in cloud computing," Cryptology ePrint Archive, Report 2009/579, 2009, http://eprint.iacr.org/.
[17] M. Bellare, J. Garay, and T. Rabin, "Fast batch verification for modular exponentiation and digital signatures," in Proc. of Eurocrypt 1998, vol. 1403 of LNCS. Springer-Verlag, 1998, pp. 236-250.
[18] C. Schnorr, "Efficient identification and signatures for smart cards," in Proc. of Crypto 1989, vol. 435 of LNCS. Springer-Verlag, 1989, pp. 239-252.
[19] D. Pointcheval and J. Stern, "Security proofs for signature schemes," in Proc. of Eurocrypt 1996, vol. 1070 of LNCS. Springer-Verlag, 1996, pp. 387-398.
[20] A. L. Ferrara, M. Green, S. Hohenberger, and M. Pedersen, "Practical short signature batch verification," in Proc. of CT-RSA 2009, vol. 5473 of LNCS. Springer-Verlag, 2009, pp. 309-324.
[21] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, "Scalable and efficient provable data possession," in Proc. of SecureComm'08, 2008.
[22] C. Wang, Q. Wang, K. Ren, and W. Lou, "Ensuring data storage security in cloud computing," in Proc. of IWQoS'09, July 2009.
[23] C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia, "Dynamic provable data possession," in Proc. of CCS'09, 2009.