Privacy-Preserving Data Mining in The Malicious Model: Murat Kantarcioglu
Onur Kardes
Computer Science Department, Stevens Institute of Technology, Hoboken, NJ, USA. E-mail:
Abstract: Most of the cryptographic work in privacy-preserving distributed data mining deals with semi-honest adversaries, which are assumed to follow the prescribed protocol but try to infer private information from the messages they receive during the protocol. Although the semi-honest model is reasonable in some cases, it is unrealistic to assume that adversaries will always follow the protocols exactly. In particular, malicious adversaries could deviate arbitrarily from their prescribed protocols. Secure protocols developed against malicious adversaries require complex techniques. Clearly, protocols that can withstand malicious adversaries provide more security. However, there is an obvious trade-off: protocols that are secure against malicious adversaries are generally more expensive than those secure only against semi-honest adversaries. In this paper, our goal is to analyse the trade-offs between performance and security of privacy-preserving distributed data mining algorithms in the two models. To make a realistic comparison, we enhance commonly used subprotocols that are secure in the semi-honest model with zero-knowledge proofs so that they are secure in the malicious model. We compare the performance of these protocols in both models.

Keywords: privacy-preserving data mining; secure multiparty computation; malicious model.

Reference to this paper should be made as follows: Kantarcioglu, M. and Kardes, O. (xxxx) Privacy-preserving data mining in the malicious model, Int. J. Information and Computer Security, Vol. X, No. Y, pp.000–000.

Biographical notes: Murat Kantarcioglu is currently an Assistant Professor at the University of Texas at Dallas, USA. He obtained a PhD degree from Purdue University in 2005. He received his Master's degree in Computer Science from Purdue University in 2002 and his Bachelor's degree in Computer Engineering from Middle East Technical University, Ankara, Turkey, in 2000.
His research interests lie at the intersection of privacy, security, data mining and databases.
Data needed for many crucial data mining tasks is distributed among several parties with different security and privacy concerns. In many distributed data mining settings, disclosing the original data sets is not acceptable for privacy reasons. To address this problem, several privacy-preserving distributed data mining protocols based on cryptographic techniques have been proposed (see Section 2.1 for a detailed discussion of previous work). These protocols use different models depending on their assumptions about adversarial behaviour. In the semi-honest model, each party is assumed to follow the protocol without any deviation. This assumption may be unrealistic in some settings, and participating parties may then prefer a protocol that is secure against malicious behaviour. Protocols secure in the malicious model clearly offer more security, but achieving it generally requires complex techniques; as a result, such protocols are typically far less efficient than those secure in the semi-honest model. In practice, the efficiency versus security trade-off between the two models is not clear, so a detailed analysis of the corresponding costs could be a valuable tool for decision-makers. In this paper, we investigate several protocols that were previously defined in the semi-honest model as primitives for different privacy-preserving data mining algorithms. We summarise these basic subprotocols and show how to make them secure against malicious adversaries using efficient zero-knowledge proofs. Having basic subprotocols that are secure in the malicious model is not by itself sufficient for constructing privacy-preserving data mining applications, as this may require further study and the use of other cryptographic tools.
However, we believe that the methods we propose for making these primitives efficient can also be applied to large-scale applications.
We also show that it can be more efficient to design specialised protocols in the malicious model instead of directly converting secure protocols in the semi-honest model into secure protocols in the malicious model using zero-knowledge proofs. In addition, we provide an extensive experimental analysis of the given protocols.
In this section, we first discuss previous work on privacy-preserving data mining and then describe the cryptographic tools and definitions used in this paper.
For the sake of completeness, we briefly define the additive homomorphic cryptosystem used in our experiments; please refer to Damgard et al. (2003) for details. The Paillier (1999) cryptosystem, which is based on the composite residuosity assumption, satisfies the above properties and can be defined as follows.

Key generation: Let p and q be prime numbers where p < q and p does not divide q − 1. We set the public key pk to n, where n = p·q, and the private key pr to (λ, n), where λ is the least common multiple of p − 1 and q − 1.

Encryption with the public key: Given n, the message m and a random number r from 1 to n − 1, the encryption of m is

$$E_{pk}(m) = (1+n)^{m} \cdot r^{n} \bmod n^{2}.$$

Note also that, given any ciphertext, we can obtain a different encryption of the same message by multiplying it by a fresh random r^n.

Decryption with the private key: Given n and the ciphertext c = E_pk(m), we can calculate D_pr(c) as

$$m = \frac{(c^{\lambda} \bmod n^{2}) - 1}{n} \cdot \lambda^{-1} \bmod n,$$

where λ^{-1} is the inverse of λ modulo n.

Adding two ciphertexts (+_h): Given the encryptions E_pk(m1) and E_pk(m2), we can calculate E_pk(m1 + m2) as follows:

$$E_{pk}(m_1) +_h E_{pk}(m_2) := E_{pk}(m_1) \cdot E_{pk}(m_2) \bmod n^{2} = (1+n)^{m_1} r_1^{n} \cdot (1+n)^{m_2} r_2^{n} \bmod n^{2} = (1+n)^{m_1+m_2} (r_1 r_2)^{n} \bmod n^{2} = E_{pk}(m_1+m_2).$$

We would like to emphasise that this addition actually returns E_pk(m1 + m2 mod n).

Multiplying a ciphertext by a constant (k ×_h E_pk(m1)): Given a constant k and the encryption E_pk(m1), we can calculate k ×_h E_pk(m1) as follows:

$$k \times_h E_{pk}(m_1) := E_{pk}(m_1)^{k} \bmod n^{2} = \left((1+n)^{m_1} r_1^{n}\right)^{k} \bmod n^{2} = (1+n)^{k m_1} r_1^{k n} \bmod n^{2} = E_{pk}(k \cdot m_1).$$
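As a hedged illustration (not the implementation used for the experiments in this paper), the Python 3.9+ sketch below follows the definitions above: key generation, encryption, decryption, +_h, ×_h and re-randomisation. The toy key uses the Mersenne primes 2^61 − 1 and 2^89 − 1, which satisfy the key-generation condition but are far too small and too structured for any real use; the function names are hypothetical.

```python
import math
import secrets

# Toy key from two Mersenne primes: far too small/structured for real use.
p, q = 2**61 - 1, 2**89 - 1
n = p * q                       # public key pk = n
n2 = n * n
lam = math.lcm(p - 1, q - 1)    # private key: lambda = lcm(p - 1, q - 1)
lam_inv = pow(lam, -1, n)       # lambda^{-1} mod n, used by decrypt()


def encrypt(m: int) -> int:
    """E_pk(m) = (1 + n)^m * r^n mod n^2 with random r in [1, n - 1]."""
    r = secrets.randbelow(n - 1) + 1
    return pow(1 + n, m % n, n2) * pow(r, n, n2) % n2


def decrypt(c: int) -> int:
    """D_pr(c) = ((c^lambda mod n^2) - 1) / n * lambda^{-1} mod n."""
    return (pow(c, lam, n2) - 1) // n * lam_inv % n


def h_add(c1: int, c2: int) -> int:
    """E(m1) +_h E(m2) = E(m1) * E(m2) mod n^2; decrypts to m1 + m2 mod n."""
    return c1 * c2 % n2


def h_scale(k: int, c: int) -> int:
    """k x_h E(m) = E(m)^k mod n^2; decrypts to k * m mod n.

    Reducing k mod n lets negative constants (e.g. -1) express subtraction.
    """
    return pow(c, k % n, n2)


def rerandomize(c: int) -> int:
    """Multiply by a fresh r^n: a new ciphertext of the same plaintext."""
    r = secrets.randbelow(n - 1) + 1
    return c * pow(r, n, n2) % n2


c1, c2 = encrypt(42), encrypt(58)
print(decrypt(h_add(c1, c2)))                # 100
print(decrypt(h_scale(3, c1)))               # 126
print(decrypt(h_add(c1, h_scale(-1, c2))))   # n - 16, i.e. 42 - 58 mod n
print(decrypt(rerandomize(c1)))              # 42
```

The third line prints n − 16 rather than −16: this is the "mod n" wrap-around emphasised above, so a protocol that needs signed results has to interpret plaintexts above n/2 as negative.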
We also use efficient non-interactive zero-knowledge protocols in the random oracle model to prove that the actions taken by the parties are correct without revealing any other information (Cramer et al., 2001). We briefly summarise these protocols below; implementation details for Paillier encryption can be found in Cramer et al. (2000).

Threshold decryption (two-party case): Given the common public key pk, the private key pr corresponding to pk is divided into two pieces pr_0 and pr_1. There exists an efficient, secure protocol D_{pr_i}(E_pk(a)) that outputs a random share s_i of the decryption result (similar to classic secret sharing schemes), along with a non-interactive zero-knowledge proof POD(pr_i, E_pk(a), s_i) showing that pr_i was used correctly. The shares can be combined to calculate the decryption result, but no single share pr_i of the private key can be used to decrypt a ciphertext alone; in other words, s_i reveals nothing about the final decryption result. We also use a special version of threshold decryption in which only one party learns the decryption result. Such a protocol can easily be implemented: for any given E_pk(a), the party that needs to learn the result generates E_pk(r_1), and both parties jointly decrypt E_pk(a) +_h E_pk(r_1). Since only that party knows r_1, only it can recover the correct decryption result.

Proving knowledge of a plaintext: A party P_i can compute the zero-knowledge proof POK(e_a) if it knows an element a in the domain of valid plaintexts such that D_pr(e_a) = a.

Proving that a multiplication is correct: Assume that party P_i is given an encryption e_a = E_pk(a), chooses a constant c and calculates e_{a·c} = E_pk(a·c). Later, P_i can give a zero-knowledge proof POMC(e_a, e_c, e_{a·c}) such that D_pr(e_{a·c}) = D_pr(e_c)·D_pr(e_a).
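The construction of Cramer et al. (2000) is not reproduced here. As a hedged illustration of what a non-interactive proof of plaintext knowledge POK(e_a) can look like for Paillier ciphertexts, the Python 3.9+ sketch below uses a standard three-move Σ-protocol made non-interactive with the Fiat-Shamir heuristic, with SHA-256 standing in for the random oracle. It proves knowledge of (a, r) such that e_a = (1 + n)^a · r^n mod n^2. The toy key, the 40-bit challenge length and the names prove_pok and verify_pok are assumptions for illustration; POMC and the 0/1 range proofs used in Protocol 3.4 are not covered.

```python
import hashlib
import math
import secrets

# Toy Paillier key (Mersenne primes): insecure sizes, illustration only.
p, q = 2**61 - 1, 2**89 - 1
n = p * q
n2 = n * n
g = 1 + n
T = 40  # challenge bits; 2**T < min(p, q) keeps challenge differences invertible mod n


def rand_unit() -> int:
    """Uniform element of Z_n^* (rejects the negligible non-units)."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r


def encrypt(m: int):
    """Return (E_pk(m), r); the prover keeps r as part of its witness."""
    r = rand_unit()
    return pow(g, m % n, n2) * pow(r, n, n2) % n2, r


def challenge(c: int, a: int) -> int:
    """Fiat-Shamir challenge: SHA-256 (random-oracle stand-in) of statement and commitment."""
    digest = hashlib.sha256(f"{n}|{c}|{a}".encode()).digest()
    return int.from_bytes(digest, "big") % (1 << T)


def prove_pok(c: int, m: int, r: int):
    """Hypothetical POK(c): shows knowledge of (m, r) with c = g^m * r^n mod n^2."""
    x, s = secrets.randbelow(n), rand_unit()
    a = pow(g, x, n2) * pow(s, n, n2) % n2   # commitment
    e = challenge(c, a)
    z = (x + e * m) % n                       # g = 1 + n has order n mod n^2
    w = s * pow(r, e, n) % n                  # (u mod n)^n = u^n mod n^2
    return a, z, w


def verify_pok(c: int, proof) -> bool:
    """Accept iff g^z * w^n = a * c^e (mod n^2) for the recomputed challenge e."""
    a, z, w = proof
    if not (0 <= z < n and 0 < w < n and 0 < a < n2):
        return False
    e = challenge(c, a)
    return pow(g, z, n2) * pow(w, n, n2) % n2 == a * pow(c, e, n2) % n2


c, r = encrypt(7)
proof = prove_pok(c, 7, r)
print(verify_pok(c, proof))                # True
print(verify_pok(encrypt(7)[0], proof))    # False: the proof is bound to c
```

Because the challenge hashes the ciphertext, a proof cannot be replayed for another ciphertext, even one encrypting the same plaintext (except with negligible probability).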
In our security proofs, we use the simulators for the above zero-knowledge proofs, whose existence is guaranteed by their security properties. As discussed in Cramer et al. (2001), these simulators return a state of the adversary that is statistically indistinguishable from the state of the adversary in the real-life execution. In addition, with overwhelming probability, these simulators return the secret inputs used by the adversary for valid zero-knowledge proofs. We use both of these properties in our security proofs.
All the protocols mentioned in this section were originally defined in the semi-honest model as primitives for different privacy-preserving data mining algorithms. Here we summarise these basic subprotocols and show how to make them secure in the malicious model using the efficient zero-knowledge proofs discussed above.
Similarly, P0 generates a homomorphic key pair and sends pk to P1 along with the encrypted vector E_pk(x_0) = (E_pk(x_{00}), E_pk(x_{01}), ..., E_pk(x_{0n})). Given pk, P1 calculates

$$e_{x_0 \cdot x_1} = \left(E_{pk}(x_{00}) \times_h x_{10}\right) +_h \left(E_{pk}(x_{01}) \times_h x_{11}\right) +_h \cdots +_h \left(E_{pk}(x_{0n}) \times_h x_{1n}\right),$$

where x_{1i} is the i-th component of P1's input vector, and sends e_{x_0·x_1} +_h E_pk(r_1) to P0. By decrypting P1's message, P0 learns its random share x_0·x_1 + r_1 of the dot product result, while P1 keeps r_1 as its share. Clearly, P0 and P1 can combine their shares to learn x_0·x_1.

Protocol 3.1 Secure equality in the malicious model using threshold decryption
Require: Two parties P0 and P1 with the shares pr_0 and pr_1 of the private key and private inputs x_0 and x_1.
Ensure: Return 1 if x_0 = x_1, else return 0.
for all P_i do
  Calculate e_{x_i} = E_pk(x_i)
  Create POK(e_{x_i})
  Send (e_{x_i}, POK(e_{x_i})) to P_{1-i}
end for
for all P_i do
  Check that POK(e_{x_{1-i}}) is valid, else ABORT
  Calculate e_{x_0-x_1} = e_{x_0} +_h (-1 ×_h e_{x_1})
  Choose a non-zero random r_i and calculate e_{r_i} = E_pk(r_i) and e_{(x_0-x_1)·r_i} = e_{x_0-x_1} ×_h r_i
  Create POK(e_{r_i}) and POMC(e_{x_0-x_1}, e_{r_i}, e_{(x_0-x_1)·r_i})
  Send (e_{x_0-x_1}, e_{r_i}, e_{(x_0-x_1)·r_i}), POK(e_{r_i}) and POMC(e_{x_0-x_1}, e_{r_i}, e_{(x_0-x_1)·r_i}) to P_{1-i}
end for
for all P_i do
  Check whether the e_{x_0-x_1} sent by P_{1-i} is correct, else ABORT
  Check whether POK(e_{r_{1-i}}) is valid, else ABORT
  Check whether POMC(e_{x_0-x_1}, e_{r_{1-i}}, e_{(x_0-x_1)·r_{1-i}}) is valid, else ABORT
  Calculate e_{(x_0-x_1)·(r_0+r_1)} = e_{(x_0-x_1)·r_1} +_h e_{(x_0-x_1)·r_0}
end for
for all P_i do
  Jointly use the trusted party T to get D_pr(e_{(x_0-x_1)·(r_0+r_1)})
  If (x_0 - x_1)·(r_0 + r_1) = 0 then return 1, else return 0
end for
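The arithmetic at the heart of Protocol 3.1 can be sketched in Python 3.9+ as a single-process simulation. To keep it short, this sketch assumes one ordinary Paillier key that stands in both for the shared threshold key and for the trusted party T, and it omits every zero-knowledge proof (POK, POMC); the helper names are hypothetical. It shows why the protocol works: the difference x_0 − x_1 is computed homomorphically and deterministically (so each party can check the other's value), then blinded by the sum of both parties' random factors, so the decrypted value is zero exactly when the inputs are equal and otherwise looks random.

```python
import math
import secrets

# One toy key pair stands in for the threshold key and for the trusted
# party T that performs the final decryption (illustration only).
p, q = 2**61 - 1, 2**89 - 1
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
lam_inv = pow(lam, -1, n)


def encrypt(m: int) -> int:
    r = secrets.randbelow(n - 1) + 1
    return pow(1 + n, m % n, n2) * pow(r, n, n2) % n2


def decrypt(c: int) -> int:
    return (pow(c, lam, n2) - 1) // n * lam_inv % n


def h_add(c1: int, c2: int) -> int:
    return c1 * c2 % n2


def h_scale(k: int, c: int) -> int:
    return pow(c, k % n, n2)


def secure_equality(x0: int, x1: int) -> int:
    # Round 1: each party publishes e_{x_i} (POK(e_{x_i}) omitted here).
    e_x0, e_x1 = encrypt(x0), encrypt(x1)
    # Both parties derive e_{x0-x1} = e_{x0} +_h (-1 x_h e_{x1}). These operations
    # are deterministic, so each party can check the value the other one sends.
    diff_by_p0 = h_add(e_x0, h_scale(-1, e_x1))
    diff_by_p1 = h_add(e_x0, h_scale(-1, e_x1))
    if diff_by_p0 != diff_by_p1:
        raise RuntimeError("ABORT")
    # Each party blinds the difference with its own non-zero random r_i
    # (POK(e_{r_i}) and POMC are omitted here).
    r0 = secrets.randbelow(n - 1) + 1
    r1 = secrets.randbelow(n - 1) + 1
    e_masked = h_add(h_scale(r1, diff_by_p0), h_scale(r0, diff_by_p1))
    # T decrypts (x0 - x1)(r0 + r1): zero iff x0 = x1, except with
    # negligible probability (e.g. r0 + r1 = n).
    return 1 if decrypt(e_masked) == 0 else 0


print(secure_equality(17, 17), secure_equality(17, 18))   # 1 0
```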
We now show how to securely evaluate the dot product in the malicious model for the two-party case using threshold homomorphic encryption. We provide two different secure dot product protocols for the malicious model. The first protocol, given in Section 3.2.1, is a generic extension of the semi-honest protocol described above using appropriate zero-knowledge proofs. In Section 3.2.2, we then provide a solution that is more efficient than the one in Section 3.2.1. Our second protocol indicates that using generic transformation techniques to convert protocols from the semi-honest model to the malicious model may not yield efficient protocols.
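Both malicious-model protocols start from the semi-honest dot product described above, in which P0 encrypts its vector and P1 evaluates the dot product homomorphically. A minimal Python 3.9+ sketch of that semi-honest protocol follows, assuming a toy Paillier key owned by P0 and simulating both parties in one process; the function names are hypothetical. The shares live in Z_n, so the dot product itself must be smaller than n to be recovered as an ordinary integer.

```python
import math
import secrets

# P0's toy Paillier key (Mersenne primes; insecure sizes, illustration only).
p, q = 2**61 - 1, 2**89 - 1
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
lam_inv = pow(lam, -1, n)


def encrypt(m: int) -> int:
    r = secrets.randbelow(n - 1) + 1
    return pow(1 + n, m % n, n2) * pow(r, n, n2) % n2


def decrypt(c: int) -> int:          # only P0 knows lam
    return (pow(c, lam, n2) - 1) // n * lam_inv % n


def p0_send(x0):
    """P0 -> P1: the encrypted vector (E_pk(x_00), ..., E_pk(x_0n))."""
    return [encrypt(v) for v in x0]


def p1_respond(enc_x0, x1):
    """P1 -> P0: (E(x_00) x_h x_10) +_h ... +_h E_pk(r1); P1 keeps r1."""
    if len(enc_x0) != len(x1):
        raise ValueError("input vectors must have the same length")
    r1 = secrets.randbelow(n)
    acc = encrypt(r1)                 # a fresh encryption also re-randomises
    for c, v in zip(enc_x0, x1):
        acc = acc * pow(c, v % n, n2) % n2
    return acc, r1


def semi_honest_dot_product(x0, x1):
    """Return (P0's share, P1's share) = (x0.x1 + r1 mod n, r1)."""
    reply, share1 = p1_respond(p0_send(x0), x1)
    share0 = decrypt(reply)
    return share0, share1


s0, s1 = semi_honest_dot_product([1, 2, 3, 4], [5, 6, 7, 8])
print((s0 - s1) % n)   # 70
```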
3.2.1 Converting the secure dot product protocol in the semi-honest model to the malicious model
If we look at the semi-honest dot product protocol carefully, we need to make sure that P1 performs the multiplications correctly (using zero-knowledge proofs of correct multiplication) and that all the encryptions sent are valid (using zero-knowledge proofs of plaintext knowledge). Both can easily be achieved using the zero-knowledge protocols described in Section 2.2.1. P0 sends the encrypted values to P1 along with the associated proofs of plaintext knowledge. For each multiplication, P1 generates a zero-knowledge proof of correct multiplication and sends it to P0, which checks these proofs to make sure that the dot product was calculated correctly. The details are given in Protocol 3.2, and in Theorem A.6 we prove that the protocol is secure in the malicious model.
is evaluated correctly by at least one party. Note that both P0 and P1 have enough information to calculate r_0. If P0 and P1 calculate the same r_0 value, then the calculations must be correct, because at least one of them is honest and calculates the correct r_0. Therefore, if we securely make sure that both parties calculate the same value, then either of the local calculations can be decrypted to reveal r_0 to P0. In our second protocol, each party sends its encrypted inputs to the other along with proofs of plaintext knowledge; then each party P_i locally computes its own encryption e^i_{r_0} = E_pk(r_0). After that, the parties use a slightly modified version of the equality protocol to check whether D_pr(e^0_{r_0}) = D_pr(e^1_{r_0}). If the two values are equal, both parties jointly decrypt one of them to reveal r_0 to P0. Clearly, this version does not need expensive zero-knowledge proofs of correct multiplication for every multiplication. Due to the reduced number of zero-knowledge proofs, the following protocol can offer large savings when the vectors used for the dot product have many components. We provide the details of the efficient secure dot product protocol in Protocol 3.3, and in Theorem A.7 we prove that it is secure in the malicious model.

Protocol 3.2 Secure dot product in the malicious model using threshold decryption: extension of the semi-honest version
Require: Two parties P0 and P1 with the shares pr_0 and pr_1 of the private key pr and bit vectors x_i of length n, where x_i belongs to P_i.
Ensure: Return r_0 = Σ_i (x_{0i}·x_{1i}) + r_1 to P0 and r_1 to P1.
for P0 do
  ∀i, set e_{x_{0i}} = E_pk(x_{0i}) and create POK(e_{x_{0i}})
  Send the encryptions and non-interactive zero-knowledge proofs to P1
end for
for P1 do
  ∀i, check whether POK(e_{x_{0i}}) is correct, else ABORT
  ∀i, calculate e_{x_{1i}} = E_pk(x_{1i}) and e_{x_{0i}·x_{1i}} = e_{x_{0i}} ×_h x_{1i}
  Choose a non-zero random r_1
  Calculate e_{r_1} = E_pk(r_1) and create POK(e_{r_1})
  Calculate e_s = E_pk(Σ_{i=1}^{n} (x_{0i}·x_{1i}) + r_1) = e_{x_{01}·x_{11}} +_h ... +_h e_{x_{0n}·x_{1n}} +_h e_{r_1}
  ∀i, send e_{x_{1i}}, e_{x_{0i}·x_{1i}}, POK(e_{x_{1i}}) and POMC(e_{x_{0i}}, e_{x_{1i}}, e_{x_{0i}·x_{1i}}); also send e_{r_1}, POK(e_{r_1}) and e_s to P0
end for
for P0 do
  ∀i, check whether POK(e_{x_{1i}}) is correct, else ABORT
  ∀i, check whether POMC(e_{x_{0i}}, e_{x_{1i}}, e_{x_{0i}·x_{1i}}) is correct, else ABORT
  Calculate e_s = E_pk(Σ_{i=1}^{n} (x_{0i}·x_{1i}) + r_1)
end for
Jointly call the private decryption function such that only P0 learns the decryption of e_s
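The last step of Protocol 3.2 relies on the special threshold decryption described earlier, in which only P0 learns the result. The Python 3.9+ sketch below is a toy two-party version, assuming a hypothetical dealer-style additive sharing of a decryption exponent d (with d ≡ 0 mod λ and d ≡ 1 mod n) between P0 and P1. Real threshold Paillier schemes generate and use key shares more carefully and attach the proofs POD(pr_i, E_pk(a), s_i); both are omitted, as is P0's proof of knowledge of its blinding value. All names are hypothetical.

```python
import math
import secrets

# Toy Paillier key (Mersenne primes): insecure sizes, illustration only.
p, q = 2**61 - 1, 2**89 - 1
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
# Decryption exponent d with d = 0 (mod lambda) and d = 1 (mod n).
d = lam * pow(lam, -1, n)
# Hypothetical dealer-style additive sharing: P0 holds d0, P1 holds d1.
d0 = secrets.randbelow(d)
d1 = d - d0


def encrypt(m: int) -> int:
    r = secrets.randbelow(n - 1) + 1
    return pow(1 + n, m % n, n2) * pow(r, n, n2) % n2


def decryption_share(c: int, d_i: int) -> int:
    """s_i = c^{d_i} mod n^2 (the proof POD of correct use is omitted)."""
    return pow(c, d_i, n2)


def combine(s0: int, s1: int) -> int:
    """s0 * s1 = c^d = 1 + m*n (mod n^2), hence m = (s0 * s1 - 1) / n."""
    return (s0 * s1 % n2 - 1) // n


def threshold_decrypt(c: int) -> int:
    return combine(decryption_share(c, d0), decryption_share(c, d1))


def private_decrypt_for_p0(c: int):
    """Only P0 learns D(c): P0 adds E(b) for a secret b, both parties jointly
    decrypt, and P0 subtracts b. Returns (P0's result, value both parties see)."""
    b = secrets.randbelow(n)          # P0's blinding value (its POK is omitted)
    blinded = c * encrypt(b) % n2     # c +_h E_pk(b)
    seen_by_both = threshold_decrypt(blinded)
    return (seen_by_both - b) % n, seen_by_both


e_s = encrypt(70)
print(threshold_decrypt(e_s))            # 70
print(private_decrypt_for_p0(e_s)[0])    # 70
```

The value revealed to both parties is m + b mod n, which is uniformly distributed because b is, so P1 learns nothing about m from the joint decryption.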
possessed by P0 and P1 (i.e., D > n·m). For the cases where both m and n are bigger than O(√D), or where n or m is equal to D, we suggest using the simple secure set intersection and set union protocols that are secure in the malicious model. Our algorithms require O(D) homomorphic encryptions and zero-knowledge proofs of correct multiplication. The main idea is that we can represent the set owned by each party as a bit vector of size D and use the secure multiplication property of homomorphic encryption, together with the associated zero-knowledge proof, to obtain secure set protocols in the malicious model.

Protocol 3.3 Secure dot product in the malicious model using threshold decryption: efficient dot product specific version
Require: Two parties P0 and P1 with the shares pr_0 and pr_1 of the private key pr and bit vectors x_i of length n, where x_i belongs to P_i.
Ensure: Return r_0 = Σ_i (x_{0i}·x_{1i}) + r_1 to P0 and r_1 to P1.
for all P_i do
  ∀j, set e_{x_{ij}} = E_pk(x_{ij}) and create POK(e_{x_{ij}})
  if P_i = P1 then
    Choose a random r_1, set e_{r_1} = E_pk(r_1) and create POK(e_{r_1})
  end if
  Send the encryptions and non-interactive zero-knowledge proofs to P_{1-i}
end for
for P0 do
  ∀i, check whether POK(e_{x_{1i}}) is correct, else ABORT
  Check whether POK(e_{r_1}) is correct, else ABORT
  Set e^0_{r_0} to E_pk(Σ_{i=1}^{n} (x_{0i}·x_{1i}) + r_1)
end for
for P1 do
  ∀i, check whether POK(e_{x_{0i}}) is correct, else ABORT
  Set e^1_{r_0} to E_pk(Σ_{i=1}^{n} (x_{0i}·x_{1i}) + r_1)
end for
Jointly call the secure equality protocol to check whether D_pr(e^1_{r_0}) = D_pr(e^0_{r_0})
for all P_i do
  if the secure equality protocol returns true for D_pr(e^1_{r_0}) = D_pr(e^0_{r_0}) then
    Jointly call the private decryption protocol such that P0 learns D_pr(e^1_{r_0})
  end if
end for
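The key idea of Protocol 3.3, two independent local evaluations of E_pk(r_0) followed by a plaintext equality test, can be sketched in Python 3.9+ as follows. This is a single-process simulation under stated assumptions: one toy Paillier key stands in for the shared threshold key (and for the joint and private decryptions), all POKs are omitted, and the p1_inconsistent flag is a hypothetical way to make P1 use values that differ from what it encrypted. The two parties' ciphertexts generally differ bit-wise even when they encrypt the same value, which is why the comparison must be done on plaintexts with the equality protocol rather than by comparing ciphertexts.

```python
import math
import secrets

# One toy key stands in for the shared threshold key (illustration only).
p, q = 2**61 - 1, 2**89 - 1
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
lam_inv = pow(lam, -1, n)


def enc(m: int) -> int:
    r = secrets.randbelow(n - 1) + 1
    return pow(1 + n, m % n, n2) * pow(r, n, n2) % n2


def dec(c: int) -> int:  # stands in for joint (threshold) decryption
    return (pow(c, lam, n2) - 1) // n * lam_inv % n


def local_r0(own_plain, other_enc, e_r1):
    """E_pk(sum_i own_i * other_i + r1) = (own_1 x_h e_1) +_h ... +_h e_r1."""
    acc = e_r1
    for v, c in zip(own_plain, other_enc):
        acc = acc * pow(c, v % n, n2) % n2
    return acc


def plaintexts_equal(e_a: int, e_b: int) -> bool:
    """Protocol 3.1 core: decrypt (a - b)(rho0 + rho1), test for zero."""
    diff = e_a * pow(e_b, n - 1, n2) % n2          # e_a +_h (-1 x_h e_b)
    rho0 = secrets.randbelow(n - 1) + 1
    rho1 = secrets.randbelow(n - 1) + 1
    return dec(pow(diff, rho0, n2) * pow(diff, rho1, n2) % n2) == 0


def efficient_dot_product(x0, x1, p1_inconsistent=False):
    """Return (r0, r1) with r0 = x0.x1 + r1 mod n, or None on ABORT."""
    if len(x0) != len(x1):
        raise ValueError("input vectors must have the same length")
    # Both parties publish encryptions (POKs omitted); P1 also sends E(r1).
    e_x0 = [enc(v) for v in x0]
    e_x1 = [enc(v) for v in x1]
    r1 = secrets.randbelow(n)
    e_r1 = enc(r1)
    # Each party computes its own E(r0) locally: no per-element POMC needed.
    e_r0_p0 = local_r0(x0, e_x1, e_r1)
    # Hypothetical cheating: P1 uses values that differ from what it encrypted.
    x1_used = [v + 1 for v in x1] if p1_inconsistent else x1
    e_r0_p1 = local_r0(x1_used, e_x0, e_r1)
    if not plaintexts_equal(e_r0_p0, e_r0_p1):
        return None                                   # ABORT
    return dec(e_r0_p1), r1   # stands in for private decryption to P0


r0, r1 = efficient_dot_product([1, 2, 3], [4, 5, 6])
print((r0 - r1) % n)                                                      # 32
print(efficient_dot_product([1, 2, 3], [4, 5, 6], p1_inconsistent=True))  # None
```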
Protocol 3.4 Secure set intersection in the malicious model using threshold decryption
Require: Two parties P0 and P1 with the shares pr_0 and pr_1 of the private key pr and input bit vectors of size D, where x_{ij} is set to one if P_i has item j.
Ensure: Return a D-bit vector I that represents the set intersection, where I_i is set to one if item i is in the set intersection.
for P0 do
  ∀i, set e_{x_{0i}} = E_pk(x_{0i}) and create POK(e_{x_{0i}}) to prove that each x_{0i} is either zero or one
  Send the encryptions and non-interactive zero-knowledge proofs to P1
end for
for P1 do
  ∀i, check whether POK(e_{x_{0i}}) is correct, else ABORT
  ∀i, calculate e_{x_{1i}} = E_pk(x_{1i}) and e_{x_{0i}·x_{1i}} = e_{x_{0i}} ×_h x_{1i}
  ∀i, send e_{x_{1i}}, e_{x_{0i}·x_{1i}}, POK(e_{x_{1i}}) (again proving that x_{1i} is either zero or one) and POMC(e_{x_{0i}}, e_{x_{1i}}, e_{x_{0i}·x_{1i}}) to P0
end for
for P0 do
  ∀i, check whether POK(e_{x_{1i}}) is correct, else ABORT
  ∀i, check whether POMC(e_{x_{0i}}, e_{x_{1i}}, e_{x_{0i}·x_{1i}}) is correct, else ABORT
end for
∀i, jointly call the threshold decryption function to learn D_pr(e_{x_{0i}·x_{1i}}) and set I_i to D_pr(e_{x_{0i}·x_{1i}})

Let us assume that x_{0i} is set to 1 if P0 has item i in its private set and to 0 otherwise (similarly for x_{1i} and P1). Clearly, to calculate the set intersection, we need to calculate x_{0i} ∧ x_{1i} for each i. Similarly, for the set union, we need to calculate x_{0i} ∨ x_{1i} for all i. Note that the ∧ operation on bits is just a multiplication. For the set union, we can rewrite x_{0i} ∨ x_{1i} as ¬(¬x_{0i} ∧ ¬x_{1i}). This implies that if ¬x_{0i} ∧ ¬x_{1i} is equal to zero, then item i is in the set union. Therefore, we can use the multiplication protocol for the set union too. The details of the set intersection protocol, which is very similar to dot product Protocol 3.2, are given in Protocol 3.4. The same protocol can be used for two-party set union by using ¬x_{0i} and ¬x_{1i} as the input values and negating the output bits. As before, we can prove that Protocol 3.4 is secure in the malicious model; see Theorem A.8 for details.
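The bit-vector reduction above (intersection as a per-item product, union through negated inputs and outputs) can be checked with a short Python 3.9+ simulation. It assumes one toy Paillier key standing in for the shared threshold key and omits the 0/1 POKs and the POMCs; the function names are hypothetical. One design choice of this sketch is worth noting: P1 re-randomises each product with a fresh E(0), because raising a ciphertext to the power 0 always yields the ciphertext 1, which would otherwise reveal that x_{1i} = 0.

```python
import math
import secrets

# One toy key stands in for the shared threshold key (illustration only).
p, q = 2**61 - 1, 2**89 - 1
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
lam_inv = pow(lam, -1, n)


def enc(m: int) -> int:
    r = secrets.randbelow(n - 1) + 1
    return pow(1 + n, m % n, n2) * pow(r, n, n2) % n2


def dec(c: int) -> int:  # stands in for joint threshold decryption
    return (pow(c, lam, n2) - 1) // n * lam_inv % n


def set_intersection(x0, x1):
    """I_i = D(e_{x0i} x_h x1i) = x0i AND x1i for bit vectors of size D."""
    if len(x0) != len(x1):
        raise ValueError("bit vectors must have the same size D")
    e_x0 = [enc(b) for b in x0]      # P0 -> P1 (0/1 POKs omitted)
    # P1 multiplies homomorphically and re-randomises with a fresh E(0):
    # without it, x1i = 0 would always give the ciphertext 1 and leak x1i.
    e_and = [pow(c, b, n2) * enc(0) % n2 for c, b in zip(e_x0, x1)]
    return [dec(c) for c in e_and]   # POMCs omitted


def set_union(x0, x1):
    """x0i OR x1i = NOT(NOT x0i AND NOT x1i): negate inputs and outputs."""
    def negate(bits):
        return [1 - b for b in bits]
    return negate(set_intersection(negate(x0), negate(x1)))


# Domain of D = 5 items; P0 holds {0, 2, 3}, P1 holds {2, 3, 4}.
x0, x1 = [1, 0, 1, 1, 0], [0, 0, 1, 1, 1]
print(set_intersection(x0, x1))   # [0, 0, 1, 1, 0]
print(set_union(x0, x1))          # [1, 0, 1, 1, 1]
```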
Performance evaluation
In this section, we analyse the performance of several privacy-preserving data mining algorithms in the malicious model. As stated before, the efficiency of a distributed data mining algorithm can be estimated in terms of the primitives it uses, i.e., the number of secure dot products, secure comparisons, etc. Therefore, we explore the efficiency trade-offs in these basic subprotocols and use the performance results to estimate the possible overall slowdown of privacy-preserving distributed data mining algorithms in the malicious model. In our implementation, we used the zero-knowledge protocols given for Paillier encryption in the random oracle model (Cramer et al., 2000).
In Figure 1, the y-axis values are displayed on a log10 scale. The figure shows a positive linear relationship between the overall running time of the secure dot product and the input size. It can also be observed from Figure 1 that there is a significant difference (about 700 times) between the running times of the secure dot product protocol in the semi-honest and malicious models. The running times in the figure include both communication and computation time, and the difference is explained by the communication and computation overhead that the zero-knowledge proofs introduce. However, such a considerable difference should not lead to the conclusion that the secure dot product protocol in the malicious model is unusable in practice; Figure 1 also shows that the overall running time in the malicious model can be cut by half by designing a more efficient protocol. More sophisticated methods and modifications may lead to even larger efficiency gains. Generic circuit evaluation methods (Malkhi et al., 2004) can be an alternative to zero-knowledge proofs, as they also provide security and privacy in the malicious model. However, the size of the circuit for evaluating the secure dot product is proportional to the input size, i.e., the circuit must be duplicated n times. As a result, circuit evaluation becomes infeasible even for small data sets (1000 elements).
For both primitives, since the number of inputs is fixed (at two), the bit lengths of the inputs play a more important role in the overall running times. The positive linear relationship between bit length and running time can be observed in Figure 2, where we compare the semi-honest and malicious models and also provide running times for circuit evaluation using Fairplay (Malkhi et al., 2004). The overall running time of our protocol in the malicious model is very close to that in the semi-honest model and significantly better than that of circuit evaluation. This is because the number of zero-knowledge proofs used in the malicious model does not depend on the bit length, whereas the size of the circuit used in Fairplay grows as the bit length increases.
Yang and Wright (2006) provide a privacy-preserving protocol for Bayesian network construction in the semi-honest model. The protocol defines an efficient computation of association rules for Bayesian networks. The computations involve secure dot products with inputs of length n, where n can be arbitrarily large; the protocol also contains several other subprotocols that do not depend on n (the input size). If the protocol were converted from the semi-honest model to the malicious model, Figure 1 indicates that even our efficient dot product protocol for the malicious model would not produce practical running times for n > 1000.
Most of the privacy-preserving data mining algorithms in the literature were developed against semi-honest adversaries, and their correctness and security rely on the presumption that all participating parties follow the protocol without any deviation. Although this assumption may be justifiable in some cases, there are many real-life examples where less restrictive assumptions should be made. The malicious model provides stronger security guarantees while requiring significant computation and communication overhead. However, the efficiency versus security trade-off between the semi-honest and malicious models is not clear in practice. In this study, using standard primitives from the semi-honest model (secure dot product, comparison and set intersection), we first provided ways to make them secure in the malicious model with the help of threshold decryption and zero-knowledge proofs. We then showed that these algorithms can be further improved in efficiency by specialising them for the malicious model; as an example, we provided an efficient algorithm for the secure dot product in the malicious model. We evaluated the performance of the algorithms in the malicious model by comparing them with their semi-honest counterparts. We also included the performance of circuit evaluation versions as a benchmark, thus showing the superiority of our methods. Finally, we discussed the usability and applicability of the primitives in the malicious model when they are used within larger applications such as privacy-preserving k-means clustering and association rule mining. As future work, we intend to extend our protocols to more than two parties. We also plan to analyse the performance of other privacy-preserving algorithms in the malicious model using the subprotocols devised in this paper.
Murat Kantarcioglu is partially supported by Air Force Office of Scientific Research under Grant No. FA9550-07-1-0041. Onur Kardes is partially supported by the National Science Foundation under Grant No. CCR-0331584. Both authors thank Rebecca Wright for useful discussions and comments.
Boudot, F., Schoenmakers, B. and Traoré, J. (2001) A fair and efficient solution to the socialist millionaires' problem, Discrete Applied Mathematics, Vol. 111, Nos. 1–2, pp.23–36.
Canetti, R. (2000) Security and composition of multi-party cryptographic protocols, Journal of Cryptology, Vol. 13, No. 1, pp.143–202.
Cramer, R., Damgard, I. and Nielsen, J.B. (2000) Multi-party computation from threshold homomorphic encryption, Technical Report RS-00-14, Basic Research in Computer Science (BRICS), June.
Cramer, R., Damgard, I. and Nielsen, J.B. (2001) Multi-party computation from threshold homomorphic encryption, Lecture Notes in Computer Science, Vol. 2045.
Damgaard, I., Fitzi, M., Kiltz, E., Nielsen, J.B. and Toft, T. (2006) Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation, Proceedings of the Third Theory of Cryptography Conference, TCC 2006, pp.285–304.
Damgard, I., Jurik, M. and Nielsen, J. (2003) A generalization of Paillier's public-key system with applications to electronic voting,
Du, W. and Zhan, Z. (2002) Building decision tree classifier on private data, in C. Clifton and V. Estivill-Castro (Eds.) IEEE International Conference on Data Mining Workshop on Privacy, Security, and Data Mining, Australian Computer Society, Maebashi City, Japan, 9 December, Vol. 14, pp.1–8.
Freedman, M.J., Nissim, K. and Pinkas, B. (2004) Efficient private matching and set intersection, Eurocrypt 2004, International Association for Cryptologic Research (IACR), Interlaken, Switzerland, 2–6 May.
Gilburd, B., Schuster, A. and Wolff, R. (2004) Privacy-preserving data mining on data grids in the presence of malicious participants, Proceedings of HPDC'04, Honolulu, Hawaii, June.
Goldreich, O. (2004) The Foundations of Cryptography, Chap. 7, General Cryptographic Protocols, Cambridge University Press, Vol. 2.
Jagannathan, G. and Wright, R.N. (2005) Privacy-preserving distributed k-means clustering over arbitrarily partitioned data, Proceedings of the 2005 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Illinois, 21–24 August, pp.593–599.
Kantarcıoğlu, M. and Clifton, C. (2004a) Privacy-preserving distributed mining of association rules on horizontally partitioned data, IEEE TKDE, September, Vol. 16, No. 9, pp.1026–1037.
Kantarcıoğlu, M. and Clifton, C. (2004b) Privately computing a distributed k-nn classifier, in J-F. Boulicaut, F. Esposito, F. Giannotti and D. Pedreschi (Eds.) PKDD2004: 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, 20–24 September, pp.279–290.
Kardes, O., Ryger, R.S., Wright, R.N. and Feigenbaum, J. (2005) Implementing privacy-preserving Bayesian-net discovery for vertically partitioned data, ICDM Workshop on Privacy and Security Aspects of Data Mining.
Kissner, L. and Song, D. (2005) Privacy-preserving set operations, Advances in Cryptology: CRYPTO 2005.
Lin, X., Clifton, C. and Zhu, M. (2005) Privacy preserving clustering with distributed EM mixture modeling, Knowledge and Information Systems, July, Vol. 8, No. 1, pp.68–81.
Lindell, Y. and Pinkas, B. (2000) Privacy preserving data mining, Advances in Cryptology: CRYPTO 2000, Springer-Verlag, 20–24 August, pp.36–54.
Lindell, Y. and Pinkas, B. (2002) Privacy preserving data mining, Journal of Cryptology, Vol. 15, No. 3, pp.177–206.
Malkhi, D., Nisan, N., Pinkas, B. and Sella, Y. (2004) Fairplay: a secure two-party computation system, Proceedings of the 13th Conference on Usenix Security Symposium, Vol. 13, p.20.
Paillier, P. (1999) Public-key cryptosystems based on composite degree residuosity classes, EUROCRYPT, pp.223–238.
Vaidya, J. and Clifton, C. (2002) Privacy preserving association rule mining in vertically partitioned data, The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, 23–26 July, pp.639–644.
Vaidya, J. and Clifton, C. (2003) Privacy-preserving k-means clustering over vertically partitioned data, The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, 24–27 August.
Yang, Z. and Wright, R.N. (2006) Privacy-preserving computation of Bayesian networks on vertically partitioned data, IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 9, pp.1253–1264.
1 Since the equality protocol defined here is symmetric, we assume without loss of generality that P0 is corrupted.
2 The zero-knowledge proofs used in our protocols can be run in parallel. Since simulating sequential execution is conceptually simpler, we assume that the zero-knowledge proofs are executed sequentially; this is possible due to the special properties of the zero-knowledge proofs we use (see Cramer et al., 2001, for details).
Let π be a two-party protocol where each party P_i has a secret input x_i^s and a public input x_i^p. Each P_i returns a private output y_i^s and a public output y_i^p after the execution of the protocol. Let A be an adversary that can corrupt any one (and only one) party during the execution of the protocol. Let x = (x_1^s, x_1^p, x_2^s, x_2^p) be the participating parties' input, let r = (r_1, r_2, r_A) be the random inputs of the parties and the adversary, let C ∈ {1, 2} be the index of the corrupted party, and let z ∈ {0,1}* be the auxiliary input (i.e., z can be seen as prior information that the adversary may use). Let k be the security parameter (i.e., k can be seen as the parameter defining encryption key sizes). We denote the output of the adversary after the execution of the protocol by ADVR_{π,A}(k, x, C, z, r) and, similarly, the output of party P_i by EXEC_{π,A}(k, x, C, z, r)_i. Let

$$\mathrm{EXEC}_{\pi,A}(k, x, C, z, r) = \left(\mathrm{ADVR}_{\pi,A}(k, x, C, z, r),\ \mathrm{EXEC}_{\pi,A}(k, x, C, z, r)_1,\ \mathrm{EXEC}_{\pi,A}(k, x, C, z, r)_2\right).$$

We define EXEC_{π,A}(k, x, C, z) to be the random variable obtained for a uniformly chosen r. Finally, we define the distribution ensemble EXEC_{π,A} with security parameter k, indexed by (x, C, z), as

$$\mathrm{EXEC}_{\pi,A} = \left\{\mathrm{EXEC}_{\pi,A}(k, x, C, z)\right\}_{k \in \mathbb{N},\ x \in (\{0,1\}^*)^4,\ C \in \{1,2\},\ z \in \{0,1\}^*}.$$
Definition A.2 Ideal model (Cramer et al., 2001)
Let f : ℕ × ({0,1}*)^4 × {0,1}* → ({0,1}*)^4 be a probabilistic two-party function computable in probabilistic polynomial time. We define the output of f as

$$f(k, x_1^s, x_1^p, x_2^s, x_2^p, r) = (y_1^s, y_1^p, y_2^s, y_2^p),$$

where k is the security parameter and r is the random input. In the ideal model, the parties send their inputs to an incorruptible trusted party, which draws r uniformly at random, computes f and returns to each party P_i its output value (y_i^s, y_i^p). At the beginning of the execution, the ideal-model adversary S sees the public inputs x_1^p and x_2^p and the secret input x_C^s of the corrupted party. After this point, S replaces x_C^s and x_C^p with values of its choice. f is then evaluated by the trusted party using the modified inputs. After the evaluation, P_i receives its output value (y_i^s, y_i^p). Again, the adversary sees the public outputs y_1^p and y_2^p and the secret output y_C^s of the corrupted party. Similar to the real model, let

$$\mathrm{IDEAL}_{f,S}(k, x, C, z, r) = \left(\mathrm{ADVR}_{f,S}(k, x, C, z, r),\ \mathrm{IDEAL}_{f,S}(k, x, C, z, r)_1,\ \mathrm{IDEAL}_{f,S}(k, x, C, z, r)_2\right)$$

denote the collection of outputs, and let IDEAL_{f,S} be the corresponding distribution ensemble indexed by (x, C, z).

Definition A.3 Security in the static malicious adversary setting (Cramer et al., 2001)
Let f be a two-party function and let π be a protocol for two parties. We say that π securely evaluates f in the static setting if, for any probabilistic polynomial-time adversary A, there exists an ideal-model adversary S whose running time is polynomial in the running time of A, such that

$$\mathrm{IDEAL}_{f,S} \stackrel{c}{\approx} \mathrm{EXEC}_{\pi,A},$$

where ≈_c denotes computational indistinguishability between two ensembles. Security in this model implies that any adversary in the real-life model can be emulated by an adversary in the ideal model. The basic advantage of this simulation/emulation paradigm is that we can show that anything learned by the real-life adversary during the protocol execution is computationally indistinguishable from what is learned by an ideal-model adversary. Since, in the ideal model, any adversary can learn at most the final result and what is implied by it, proving that the real-life adversary can be simulated by an ideal-model adversary implies that the real-life adversary cannot learn anything more than the ideal-model adversary. In other words, the real protocol execution reveals no more information to an adversary than what is revealed to an ideal-model adversary.
Therefore, in the security proofs, we define an ideal-model adversary S that runs any given real-life adversary A as a subroutine in a black-box fashion, and we show that their views are computationally indistinguishable. We stress that we only consider computational security in this paper; for protocols that are unconditionally secure, please refer to Damgaard et al. (2006). We also need to combine several secure function evaluations to create new protocols. To prove that a composed protocol is secure, we first show that it is secure given a trusted party (i.e., an oracle) that implements the functions used as subroutines by the composed protocol. We call the function invocations that use this trusted party oracle calls. Later, using the theorems stated in Canetti (2000), we can replace the oracle calls with secure protocols without violating security. To formalise this intuition, we first define the hybrid model.

Definition A.4 The hybrid model: two-party case (Cramer et al., 2001)
The execution of a protocol π in the $(h_1, h_2, \ldots, h_m)$-hybrid model proceeds as in the real model, except that the parties have oracle access to a trusted party T for evaluating the two-party functions $h_1, \ldots, h_m$. These function evaluations proceed as in the ideal model. Similar to the previous definitions, we denote the output of the protocol by the following distribution ensemble:
$$\mathrm{EXEC}^{h_1,\ldots,h_m}_{\pi,A}$$
Similar to the security definition given above, we define security in the hybrid model by requiring that, for any adversary A operating in the hybrid model, there exists an adversary S in the ideal model such that:
$$\mathrm{IDEAL}_{f,S} \stackrel{c}{\equiv} \mathrm{EXEC}^{h_1,\ldots,h_m}_{\pi,A}$$
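To make oracle calls concrete, the following Python sketch runs an equality test in a (decryption)-hybrid model. Decryption is a function of a trusted party T that alone holds the private key; in the real model it would be replaced by a secure decryption protocol. The message flow of Protocol 3.1 is not restated in this section, so the sketch only assumes that the protocol ends by decrypting an encryption of $(x_0 - x_1)(r_0 + r_1)$, the value that appears in the proof of Theorem A.5. It further assumes Paillier encryption with g = n + 1 and toy primes that are far too small for real use. All function names are hypothetical, and the zero-knowledge proofs are omitted. The helper `blinded_values` checks the uniformity claim used in that proof for one difference coprime to n.

```python
import math
import secrets

# Toy Paillier key (hypothetical, insecure sizes chosen only for illustration).
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def rand_unit() -> int:
    """Random element of Z_N^* used as encryption randomness."""
    while True:
        s = secrets.randbelow(N - 1) + 1
        if math.gcd(s, N) == 1:
            return s


def enc(m: int) -> int:
    """E(m) = (1 + N)^m * s^N mod N^2, using (1 + N)^m = 1 + m*N mod N^2."""
    return (1 + (m % N) * N) * pow(rand_unit(), N, N2) % N2


def decryption_oracle(c: int) -> int:
    """Trusted party T in the (decryption)-hybrid model; only T knows LAM."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def equality_test(x0: int, x1: int) -> bool:
    e0, e1 = enc(x0), enc(x1)
    # E(x0 - x1): multiply E(x0) by the inverse of E(x1) modulo N^2.
    e_diff = e0 * pow(e1, -1, N2) % N2
    r0, r1 = secrets.randbelow(N), secrets.randbelow(N)  # each party's blinding value
    # Each party raises E(x0 - x1) to its own r_i; the product encrypts (x0 - x1)(r0 + r1).
    e_blinded = pow(e_diff, r0, N2) * pow(e_diff, r1, N2) % N2
    # A false "equal" occurs only if r0 + r1 = 0 mod N, i.e. with probability 1/N.
    return decryption_oracle(e_blinded) == 0


def blinded_values(d: int, r0: int) -> set:
    """All values d * (r0 + r1) mod N as r1 ranges over Z_N (d assumed coprime to N)."""
    return {d * (r0 + r1) % N for r1 in range(N)}
```

Because the protocol only calls `decryption_oracle` through its interface, swapping it for a real decryption subprotocol does not change the rest of the code. This mirrors how the composition theorem replaces oracle calls.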
As mentioned before, we can replace the oracle calls to a function $h_i$ with its secure implementation in the real model without sacrificing security; please refer to Malkhi et al. (2004) for details.

Theorem A.5 Protocol 3.1 is secure in the (decryption)-hybrid model, assuming that the non-interactive zero-knowledge protocols used are secure in the malicious model.
Proof In order to prove the security of the protocol, for any adversary A operating in the hybrid model, we need to define an adversary SA operating in the ideal model such that the views of both adversaries are computationally indistinguishable. To define SA, we use A as a subroutine. Before the simulation starts, SA is given the description of A, the private input $x_0$ of the corrupted party,¹ the final result b of the equality test, the public key pk and the private key of P1. We now define SA as follows:

1. Run A to get Epk(x0) along with the proof of knowledge POK($e_{x_0}$).
2. Run the simulator SPOK, giving it the current state of A and Epk(x0) as input. If the simulator of the proof fails, terminate the protocol; otherwise, set the state of A to the one returned by SPOK.
3. If b = 1, feed A with Epk(x0); otherwise, feed A with Epk(ra) for some random $r_a \in X_0$, along with the correct zero-knowledge proof. Let $x_{S_A}$ be the plaintext value given to A in this step.
4. Run SPOCM to simulate the zero-knowledge proof and set the state of A to the state returned by SPOCM. If the proof fails, terminate. Also feed A with the correct zero-knowledge proof for the encrypted value given to A in the previous step.
5. Get $e((x_0 - x_{S_A})(r_0 + r_1))$; as the result of the decryption call, give A a random number if b = 0, and 0 otherwise.
6. Output whatever A outputs.

We now need to prove that the view of SA is computationally indistinguishable from the execution in the hybrid model. First, note that until Step 2 the view of A in the simulation is statistically indistinguishable from the view of A in the hybrid model. The security of the zero-knowledge proofs guarantees that, on computationally indistinguishable inputs, the output state of the zero-knowledge proof simulator is identical to the state of A in the hybrid protocol. With similar arguments, the state of A before Step 5 is identical to its state in the hybrid model. It remains to show that the result returned by the decryption call in the simulation is statistically indistinguishable from the one seen by A in the actual execution. If b = 1, then A is given 0 in both executions. If b = 0, then in the simulation A sees a uniformly random value, while in the hybrid-model execution A gets $(x_0 - x_1)(r_0 + r_1)$. Since $r_1$ is random, the probability that $(x_0 - x_1)(r_0 + r_1)$ equals zero is negligible in the security parameter.* Also, since all operations are done modulo n, $(x_0 - x_1)(r_0 + r_1)$ is uniformly distributed in $\mathbb{Z}_n$: $r_0 + r_1$ is uniform because the honest party's $r_1$ is, and multiplication by $x_0 - x_1$ is a permutation of $\mathbb{Z}_n$ whenever $\gcd(x_0 - x_1, n) = 1$. Therefore, the state of A after Step 5 is the same in both executions. This concludes our proof.

Theorem A.6 Protocol 3.2 is secure in the (private decrypt)-hybrid model, assuming that the non-interactive zero-knowledge protocols used are secure in the malicious model.
Proof Again, for any adversary A operating in the hybrid model, we need to find an adversary SA operating in the ideal model. To simplify the simulator, we describe SA separately for two cases, depending on whether P0 or P1 is corrupted. First, assume that A controls P0. We define SA as follows:

1. SA gets the final result $\sum_{i=1}^{n} x_{0i} \cdot x_{1i} + r_1$ as input.
2. SA uses the simulator SPOK for each $x_{0i}$ input sequentially.² If any one of the proofs terminates, SA also terminates.
3. SA sets the state of A to the one returned by the last run of SPOK.
4. SA simulates the honest P1 by constructing the required correct zero-knowledge proofs and feeds A with those proofs.
5. SA simulates the joint call to the private decrypt function by returning $\sum_{i=1}^{n} x_{0i} \cdot x_{1i} + r_1$ to A.
6. SA outputs whatever A outputs.
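The value SA returns in step 5, $\sum_{i=1}^{n} x_{0i} \cdot x_{1i} + r_1$, is what an additively homomorphic dot product produces. The sketch below is a hedged illustration, not the paper's Protocol 3.2. It assumes Paillier encryption with toy primes, that P0 sends encryptions of its $x_{0i}$, and that P1 blinds the result with $r_1$. It also assumes that a `private_decrypt` oracle (hypothetical name) returns the plaintext to P0 only. The zero-knowledge proofs are omitted.

```python
import math
import secrets
from typing import List, Tuple

# Toy Paillier key (hypothetical); real deployments use primes of 1024 bits or more.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid for the generator g = N + 1


def enc(m: int) -> int:
    """Paillier encryption E(m) = (1 + N)^m * s^N mod N^2 with a random unit s."""
    while True:
        s = secrets.randbelow(N - 1) + 1
        if math.gcd(s, N) == 1:
            break
    return (1 + (m % N) * N) * pow(s, N, N2) % N2


def private_decrypt(c: int) -> int:
    """Oracle call: the trusted party decrypts and hands the result to P0 only."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def p1_combine(enc_x0: List[int], x1: List[int], r1: int) -> int:
    """P1 computes E(sum_i x0i * x1i + r1) from P0's ciphertexts without decrypting."""
    acc = enc(r1)
    for c, w in zip(enc_x0, x1):
        acc = acc * pow(c, w, N2) % N2  # E(a) * E(x0i)^x1i = E(a + x0i * x1i)
    return acc


def dot_product_run(x0: List[int], x1: List[int]) -> Tuple[int, int]:
    """Return (value P0 learns, P1's blinding value r1); their difference is the dot product mod N."""
    enc_x0 = [enc(v) for v in x0]  # sent by P0
    r1 = secrets.randbelow(N)      # P1's blinding value
    s0 = private_decrypt(p1_combine(enc_x0, x1, r1))
    return s0, r1
```

Since $r_1$ is uniform and known only to P1, the value P0 receives is itself uniform in $\mathbb{Z}_n$. This is why SA can hand A the correct final result without revealing the $x_{1i}$.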
Note that the state of A after the last execution of SPOK is identical in both worlds. Since the zero-knowledge proofs that SA gives A to simulate the correct behaviour of P1 are encrypted using a semantically secure encryption scheme, the view of A is computationally indistinguishable in both worlds. Therefore, the state of A before the oracle call to the private decrypt function is identical in both worlds. Since SA gives the correct result to A, the outputs in both worlds are computationally indistinguishable. For the case where P1 is corrupted, the construction of SA is very similar to the case where P0 is corrupted; only the order in which the simulators SPOK and SPOCM are executed differs. We omit further details here.

Theorem A.7 Protocol 3.3 is secure in the (secure equality, private decrypt)-hybrid model, assuming that the non-interactive zero-knowledge protocols used are secure in the malicious model.
Proof Without loss of generality, assume that the real-world adversary A controls P0. Again, for any real adversary A, we define a simulator SA such that the output of the adversary is computationally indistinguishable in both worlds, and we assume that SA is given the output of the protocol. We define SA as follows:

1. Run SPOK to verify the zero-knowledge proofs and set A to the state returned by SPOK.
2. Use SPOK to learn the $x_{0i}$ values³ with overwhelming probability. If SPOK does not return the $x_{0i}$ values, abort.
3. Generate random $x_{1i}$ and $r_1$ values such that the dot product result is consistent with the one given to SA.
4. Feed A with the correct zero-knowledge proofs for all Epk($x_{1i}$) and Epk($r_1$).
5. Calculate the correct encrypted value $e_{r_1}$.
6. Get the input from A for the secure equality protocol.
7. Run the simulator for the secure equality protocol with the correct inputs.
8. Set the state of A to the state returned by the secure equality protocol simulator.
9. If both inputs are equal, return the correct result to A after the private decrypt call; otherwise, abort.
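Step 3 only requires solving one linear constraint modulo n. The sketch below assumes, as in Protocol 3.2, that the value handed to SA is the blinded result $\sum_{i=1}^{n} x_{0i} \cdot x_{1i} + r_1 \bmod n$. Under that assumption, SA can draw every $x_{1i}$ uniformly and then solve for $r_1$. The function name and the modulus used in the example are hypothetical.

```python
import secrets
from typing import List, Tuple


def simulate_p1_values(x0: List[int], result: int, n: int) -> Tuple[List[int], int]:
    """Draw x1 uniformly from Z_n^len(x0), then the unique r1 consistent with `result`."""
    x1 = [secrets.randbelow(n) for _ in x0]
    dot = sum(a * b for a, b in zip(x0, x1)) % n
    r1 = (result - dot) % n  # forces sum_i x0i * x1i + r1 == result (mod n)
    return x1, r1
```

The simulated $x_{1i}$ generally differ from the honest party's real inputs. This is harmless because A only sees their encryptions, which is why the argument below relies on the security of the homomorphic encryption.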
We need to show that the outputs of A in both worlds are computationally indistinguishable. First, note that since A only sees the encrypted $x_{1i}$ values, the state of A after Step 5 is identical in both worlds; otherwise, A could be used as a distinguisher against the homomorphic encryption scheme. Since the inputs to the simulator for secure equality are computationally indistinguishable in both worlds, the state of A after Step 8 is identical. Finally, before the last step, the inputs given to A in both worlds are the same and the states are identical. Therefore, the outputs of A in both worlds are computationally indistinguishable. The simulator for the case where A controls P1 is the same, so we omit the discussion of that case.

Theorem A.8 Protocol 3.4 is secure in the (threshold decryption)-hybrid model, assuming that the non-interactive zero-knowledge protocols used are secure in the malicious model.
Proof Again, we need to prove that, for any given adversary A operating in the hybrid model, we can simulate its actions in the ideal world. The SA operating in the ideal world is very similar to the one given in the proof of Theorem A.6. First, assume that A controls P0. We define SA as follows:

1. Get the final results for all $I_i$ values.
2. Use the simulator SPOK for each $x_{0i}$ input sequentially. If any one of the proofs terminates, SA also terminates.
3. Set the state of A to the one returned by the last run of SPOK.
4. Simulate the honest P1 by constructing the required correct zero-knowledge proofs, and feed A with those proofs.
5. Simulate the joint call to the private decrypt function by returning the correct $I_i$ values for all i.
6. Output whatever A outputs.
Note that the state of A after the last execution of SPOK is identical in both worlds. Since the zero-knowledge proofs that SA gives A to simulate the correct behaviour of P1 are encrypted using a semantically secure encryption scheme, the view of A is computationally indistinguishable in both worlds. Therefore, the state of A before the oracle calls to the private decrypt function is identical. Since SA gives the correct result to A, the outputs in both worlds are computationally indistinguishable. Since the case where A controls P1 is similar, we omit the simulator for that case.