Future Generation Computer Systems — Ping Li, Jin Li, Zhengan Huang, Tong Li, Chong-Zhi Gao, Siu-Ming Yiu, Kai Chen
highlights

• In the basic scheme, we use MK-FHE as our privacy-preserving technique. Only the decryption operation requires interaction among the data owners.
• In the advanced scheme, we propose a hybrid-structure scheme that combines the double decryption mechanism with FHE.
• In the advanced scheme, only the encryption and decryption algorithms are performed by the data providers.
• We prove that both multi-key privacy-preserving deep learning schemes over encrypted data are secure.
in cloud over encrypted data. The key challenges include: (1) Data are located in different places and encrypted with different keys. To protect data privacy, all computation (e.g. the inner product and the approximation of the nonlinear sigmoid function used in deep learning), the intermediate results generated during the deep learning process, and the learning results must be secure. (2) To improve the efficiency of the deep learning process, computation should be done by the cloud server so as to decrease the computation/communication cost of the data owner(s). Existing solutions such as secure multi-party computation (SMC) [3], encryption schemes, garbled circuits, and detective controls were designed for other scenarios and cannot be applied directly to tackle these two challenges.

Our Contributions. To solve the above challenges, this paper designs two schemes to support a multi-key learning system. Both schemes allow multiple data owners with different datasets to collaboratively learn a neural network model securely in cloud computing. To protect confidentiality, data owners encrypt their sensitive data with different public keys before uploading them to the cloud server.

We first propose a basic scheme which is based on multi-key fully homomorphic encryption (MK-FHE) [4–6]. In this scheme, multiple data owners send their data (encrypted with different public keys chosen by the data owners independently of each other) to an untrusted cloud server. The cloud server computes the output of deep learning on this joint data and issues it back to all participating data owners. Finally, all of the data owners jointly perform a secure SMC protocol to decrypt and extract the results from the encrypted deep learning output.

To avoid the interaction among multiple data owners, we further propose an advanced scheme which is based on a hybrid structure combining the double decryption mechanism (the BCP scheme [7]) and fully homomorphic encryption (FHE) [8]. If we only use the BCP scheme to support the secure computation, then in the training phase both the computation of the inner product of the inputs and weights, and the computation of the activation function, require additional communication with the cloud server. To solve this challenge, we introduce an FHE scheme directly, transforming BCP ciphertexts into FHE ciphertexts so that the computations over FHE ciphertexts can be realized without interaction. In this scheme, a cloud server C and an authorized center (a trusted third party) AU are queried, which are assumed to be non-colluding and honest-but-curious. The cloud server C keeps the encrypted datasets under different public keys uploaded by multiple data owners. The authorized center AU, on the other hand, only holds the master key of the master decryption of the BCP scheme and the private key of the FHE. In this paper, all participants are assumed to be honest-but-curious.

In summary, our contributions can be summarized as follows:

• We address multi-key privacy-preserving deep learning in cloud computing by proposing two schemes, which allow multiple data owners to conduct collaborative privacy-preserving deep learning.
• Our multi-key privacy-preserving deep learning schemes are able to preserve the privacy of the sensitive data and the intermediate results, as well as of the training model.
• We provide a security analysis to guarantee the privacy preservation of our two proposed schemes.
• We give an application of our advanced scheme in face recognition. Note that our solutions are generic and can be applied to many other machine learning tasks in the same setting.

Organization. The rest of this paper is organized as follows. In Section 2, we briefly discuss the related work. Some notions, including deep learning, stochastic gradient descent, the BCP scheme, FHE, and MK-FHE, are described in Section 3. We give the system model definition and describe the details of our privacy-preserving deep learning system in Section 4 and Section 5, respectively. Section 6 shows the complexity and security analysis for the proposed system, and we give an application of our system in Section 7. Finally, we conclude the paper in Section 8.

2. Related work

2.1. Deep learning

In cloud computing, deep learning has shown its success in many cases such as image recognition [9,10], speech recognition [11], and biomedical data analysis [12]. Deep learning is able to transform the original data into a higher-level and more abstract expression. This means that high-dimensional original data can be converted to low-dimensional data by training a multilayer neural network with a small central layer to reconstruct the high-dimensional input data. Through these transformations, complicated functions can be learned by composing many simple functions. Hinton et al. [13] showed that the multiple hidden layers of an artificial neural network have excellent learning ability: the characteristics obtained by learning are more intrinsic characterizations of the data, which facilitates improved visualization or classification of the data. They also showed that the difficulty of optimizing the weights in nonlinear auto-encoders can be overcome by a layer-by-layer ''pretraining'' procedure. Usually, deep learning architectures are constructed as multi-layer neural networks. There are several different neural architectures, such as the feed-forward neural network, the Recurrent Neural Network (RNN), and the Deep Belief Network (DBN).

2.2. Privacy-preserving machine learning

With the advance of cloud computing, some related works have addressed security problems in the cloud, such as the security of the cloud framework [14,15], location privacy in the mobile cloud [16,17], security in cloud storage [18–20], data mining [21–24] and machine learning [25–27]. Existing privacy-preserving techniques are based mainly on the data perturbation method and on cryptographic methods (such as secure multi-party computation and secure function evaluation).

In the data perturbation method, differential privacy [28–30] has been widely applied to protect the privacy of statistical databases. Generally speaking, differential privacy guarantees that removing or adding one record does not (substantially) affect the outcome of any useful analysis. Therefore no risk is incurred by joining the database, providing a mathematically rigorous means of coping with the fact that distributional information may be disclosive. For instance, Abadi et al. [31] proposed new algorithms based on a differentially private version of the stochastic gradient descent (SGD) process. In their work, scaled noise is added to the computed gradient to prevent information leakage; a sketch of this idea is given below. They also implemented the model on several neural networks and analyzed the privacy leakage. In [32], the authors considered another approach, in which they proposed a distributed selective SGD that collects computed gradients from different parties. The users then update the parameters of the deep learning model selectively according to the collected gradients. Selective SGD can ensure data privacy because the computation process is held locally and only gradients are reported to the central server. However, it is more difficult for selective SGD to reach the global/local optimum than for conventional SGD run on the entire dataset. In other words, the data are not fully utilized in selective SGD.
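The following is a minimal NumPy sketch of that noisy-gradient idea, not the implementation of [31]; the clipping threshold, noise multiplier and function names are illustrative assumptions.

import numpy as np

def dp_sgd_step(w, per_example_grads, eta=0.1, clip=1.0,
                noise_multiplier=1.1, rng=np.random.default_rng(0)):
    """One SGD step in the spirit of [31]: clip each per-example gradient
    to bound its L2 norm, average, add scaled Gaussian noise, then step."""
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy = np.mean(clipped, axis=0) + rng.normal(
        0.0, noise_multiplier * clip / len(clipped), size=w.shape)
    return w - eta * noisy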
For the cryptographic methods, several privacy-preserving machine learning schemes have been presented recently. Among these existing schemes, there are two different settings: (1) training without the aid of a cloud/third party, and (2) training with the aid of a cloud/third party. In the first setting, the authors of [33,34] proposed privacy-preserving two-party distributed algorithms for back-propagation training with vertically partitioned data and arbitrarily partitioned data, respectively. They used the ElGamal scheme to support the secure computation operations. In the second setting, the authors of [35] used the SMC technique to train horizontally partitioned data in the multi-party case. Graepel et al. [27] demonstrated that some basic machine learning algorithms, such as simple linear classifiers, can be performed efficiently over small-scale encrypted datasets. However, the efficiency degrades rapidly when the input size grows large. Yuan et al. [36] adopted a doubly homomorphic encryption scheme (BGN) [37] and proposed a system in which the training phase can be securely delegated to a cloud system in the multi-party scenario. In [38], the authors used the BGV scheme [39] to support the secure computation operations and efficiently realized a high-order back-propagation algorithm for deep computation model training on the cloud.

Recently, researchers also proposed the CryptoML framework [40] for secure delegation of iterative machine learning to untrusted cloud servers. This secure delegation protocol is based on Shamir's secret sharing model. However, none of the existing crypto-based schemes is able to deal with data encrypted with different public keys. In our work, we propose two solutions to tackle this challenge. We show how to achieve privacy-preserving deep learning for training encrypted datasets under different public keys. In our two schemes, the data owners encrypt the data before uploading them to the cloud server. Most of the computation is performed by the cloud server, and only the data owners are able to obtain the final results of the training model.

Algorithm 1 Multi-layer back-propagation network learning
Input: input sample x, target vector t, learning rate η, sigmoid function f(x), and network depth l
Output: the weight matrices of the model: W^(i), i ∈ {1, 2, ..., l}
1: Feed Forward Stage:
2: h^(0) = x
3: for k = 1 to l do
4:   v^(k) = W^(k) h^(k−1)
5:   h^(k) = f(v^(k))
6: end for
7: y = h^(l)
8: E = E(t, y) // compute the cost function
9: Back-propagation Stage:
10: e ← ∇_y E = ∇_y E(t, y) // compute the gradient of E on the output
11: for k = l to 1 do
12:   e ← ∇_{v^(k)} E = e f′(v^(k)) // convert the gradient on the layer's output into a gradient on the pre-nonlinearity activation
13:   ∇_{W^(k)} E = e (h^(k−1))^T // compute the gradient on the weights
14:   e ← ∇_{h^(k−1)} E = (W^(k))^T e // propagate to the next lower-level hidden layer's activations
15: end for
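As an executable companion to Algorithm 1, here is a minimal NumPy sketch; it assumes a squared-error cost E(t, y) = ½‖t − y‖², and all names are ours rather than the paper's.

import numpy as np

def f(v):
    """Sigmoid activation, as assumed in Algorithm 1."""
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(W, x, t, eta=0.5):
    """One step of Algorithm 1. W is the list [W(1), ..., W(l)] of weight
    matrices; x and t are column vectors (input sample and target)."""
    # Feed Forward Stage: h(0) = x; v(k) = W(k) h(k-1); h(k) = f(v(k))
    h = [x]
    for Wk in W:
        h.append(f(Wk @ h[-1]))
    y = h[-1]
    # Back-propagation Stage with E = 0.5 * ||t - y||^2, so grad_y E = y - t
    e = y - t
    for k in reversed(range(len(W))):
        e = e * h[k + 1] * (1.0 - h[k + 1])   # f'(v) = f(v)(1 - f(v))
        grad_Wk = e @ h[k].T                  # gradient on the weights W(k)
        e = W[k].T @ e                        # push the error one layer down
        W[k] -= eta * grad_Wk                 # gradient-descent update
    return W, y

# Tiny usage example: a 2-3-1 network on one sample
rng = np.random.default_rng(0)
W = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
W, y = backprop_step(W, x=np.array([[0.2], [0.9]]), t=np.array([[1.0]]))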
3. Preliminaries
Stochastic gradient descent (SGD). When we perform the gradient descent algorithm, two problems have to be considered: the speed of convergence to a local minimum, and the shape of the error surface — there are many local minima, so a small error does not mean that the global minimum error has been found. Hence, to save computation per weight-update step, a good way to optimize the algorithm is to use a few training samples at a time, i.e., the weights are updated upon examining each individual training example. This optimization algorithm is called stochastic gradient descent (SGD), and it can be seen as an extension of the gradient descent algorithm.

In general, the gradient in stochastic gradient descent is viewed as an expectation. This expectation can be estimated using a subset of the training sample set. In more detail, assume that n is the training sample set size; we choose a subset, denoted X = {X_1, ..., X_{n′}}, which is uniformly sampled from the training sample set. The size n′ is called the mini-batch size, and it is smaller than n. Usually, n′ is fixed and independent of n. Algorithm 2 shows the details of SGD, taking the average gradient on a mini-batch of n′ examples.

Algorithm 2 Stochastic gradient descent (SGD)
Input: learning rate η, initial weight parameter w
Output: updated weight parameter w
while stopping criterion not met do
  Sample a mini-batch of n′ examples {X_1, ..., X_{n′}} from the training sample set, with the corresponding target outputs t_i
  for i = 1 to n′ do
    Calculate the gradient estimate: y ← (1/n′) Σ_i ∇_w E(f(X_i; w); t_i)
  end for
  Apply the update: w ← w − η · y
end while

Here, the learning rate η is a small positive scalar, which is used to moderate the step size in the gradient descent search. E is the cost function or error function, which is essentially the difference between the target output of the network and the output of the objective function. Notice that ∇_w E(·) is a vector whose components are the partial derivatives of E with respect to each w_j, where w_j is a component of the weight vector w. A runnable sketch follows.
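The following sketch matches Algorithm 2 under a fixed step budget as the stopping criterion; grad_fn is an assumed per-example gradient oracle supplied by the caller, and the names are ours.

import numpy as np

def sgd(grad_fn, w, X, T, eta=0.01, n_prime=32, steps=1000,
        rng=np.random.default_rng(0)):
    """Algorithm 2 as code: sample a mini-batch of n' examples, average the
    per-example gradients, and apply the update w <- w - eta * y."""
    n = len(X)
    for _ in range(steps):                 # stopping criterion: step budget
        batch = rng.choice(n, size=n_prime, replace=False)
        y = sum(grad_fn(w, X[i], T[i]) for i in batch) / n_prime
        w = w - eta * y
    return w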
3.2. Double decryption mechanism

In a public-key encryption scheme with a double decryption mechanism, there exist two independent decryption algorithms. These are called the user decryption algorithm (which takes the user's private key as input) and the master entity decryption algorithm (which takes the master private key as input), respectively. Taking the master private key as input, the master decryption procedure can decrypt any given ciphertext successfully. We formally define the BCP scheme as follows.

Definition 3.1 (BCP Scheme [7]). There are five algorithms in the BCP scheme: the setup algorithm Setup, the key-generation algorithm KeyGen, the encryption algorithm Enc, the user decryption algorithm Dec and the master decryption algorithm mDec; the scheme is defined as E = {Setup, KeyGen, Enc, Dec, mDec}.

• (pp, mk) ← Setup(1^κ): given a security parameter κ, let p, q, p′, q′ be distinct odd primes with p = 2p′ + 1 and q = 2q′ + 1. Set the bit-length as |N| = |pq| = κ. For the multiplicative group Z*_{N²}, the algorithm Setup chooses a random element g ∈ Z*_{N²} of order pp′qq′ such that g^{p′q′} ≡ 1 + kN (mod N²) for some k ∈ {1, 2, ..., N − 1}. After this step, the algorithm outputs the public parameter pp = (N, k, g) and the master private key mk = (p′, q′).

• (pk, sk) ← KeyGen(pp): choose a random element a ∈ Z_{N²} and compute the user's public key h = g^a mod N², where the user's private key is sk = a. Finally, the algorithm outputs (pk, sk).

• (A, B) ← Enc_{(pp,pk)}(m): given a message m ∈ Z_N, pick a random element r ∈ Z_{N²}, and output the ciphertext (A, B), where A = g^r mod N² and B = h^r(1 + mN) mod N².

• m ← Dec_{(pp,sk)}(A, B): given a ciphertext (A, B) and the private key sk = a, it returns the message

  m = ((B / A^a) − 1 mod N²) / N,

or the special message ''reject'' if the ciphertext is invalid.

• m ← mDec_{(pp,pk,mk)}(A, B): given a ciphertext (A, B), the public key pk = h and the master private key mk, the user's private key sk = a can be computed as

  a mod N = ((h^{p′q′} − 1 mod N²) / N) · k^{−1} mod N.

In order to remove the random element r ∈ Z_{N²}, it is necessary to compute

  r mod N = ((A^{p′q′} − 1 mod N²) / N) · k^{−1} mod N,

and then compute τ = ar mod N. Finally, it returns the message

  m = (((B / g^τ)^{p′q′} − 1 mod N²) / N) · ζ^{−1} mod N,

or the special message ''reject'' if the ciphertext is invalid, where k^{−1} and ζ^{−1} denote the inverses of k mod N² and p′q′ mod N², respectively.

To simplify the notation, we use Enc_pk(m) instead of Enc_{pp,pk}(m), and use Add(·) to denote the addition gate, i.e., (A, B) := (A_1 · A_2 mod N², B_1 · B_2 mod N²) = Add(C_1, C_2), where {C_i}_{i=1}^{2} = {(A_i, B_i)}_{i=1}^{2} is a ciphertext set. A toy implementation of the scheme is sketched below.
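To make Definition 3.1 concrete, here is a toy Python (3.8+) implementation with tiny hard-coded safe primes; it is a sketch for checking the algebra only — it does not verify the full order condition on g and is not a secure instantiation, and all function names are ours.

from math import gcd
import random

def bcp_setup(rng=random.Random(1)):
    """Toy Setup: p = 2p'+1 = 23, q = 2q'+1 = 59 (so p', q' = 11, 29)."""
    p_, q_ = 11, 29
    p, q = 2 * p_ + 1, 2 * q_ + 1
    N, N2 = p * q, (p * q) ** 2
    while True:
        alpha = rng.randrange(2, N2)
        if gcd(alpha, N) != 1:
            continue
        g = pow(alpha, 2, N2)               # a square, so g^(p'q') = 1 mod N
        k = (pow(g, p_ * q_, N2) - 1) // N  # g^(p'q') = 1 + kN mod N^2
        if k != 0 and gcd(k, N) == 1:
            return (N, k, g), (p_, q_)      # pp = (N, k, g), mk = (p', q')

def bcp_keygen(pp, rng=random.Random(2)):
    N, _, g = pp
    a = rng.randrange(1, N * N)
    return pow(g, a, N * N), a              # pk = h = g^a mod N^2, sk = a

def bcp_enc(pp, h, m, rng=random.Random(3)):
    N, _, g = pp
    N2, r = N * N, rng.randrange(1, N * N)
    return pow(g, r, N2), pow(h, r, N2) * (1 + m * N) % N2   # (A, B)

def bcp_dec(pp, a, ct):
    """User decryption: B * A^(-a) = 1 + mN mod N^2."""
    N, (A, B) = pp[0], ct
    N2 = N * N
    return ((B * pow(A, -a, N2) % N2) - 1) // N

def bcp_mdec(pp, h, mk, ct):
    """Master decryption: recover sk and the randomness, then the message."""
    N, k, g = pp
    p_, q_ = mk
    (A, B), N2 = ct, N * N
    kinv = pow(k, -1, N)
    a = (pow(h, p_ * q_, N2) - 1) // N * kinv % N   # sk mod N
    r = (pow(A, p_ * q_, N2) - 1) // N * kinv % N   # randomness mod N
    D = B * pow(g, -(a * r % N), N2) % N2           # B / g^tau
    return (pow(D, p_ * q_, N2) - 1) // N * pow(p_ * q_ % N, -1, N) % N

# Usage, including the Add(.) gate from the notation paragraph above:
pp, mk = bcp_setup()
pk, sk = bcp_keygen(pp)
c1, c2 = bcp_enc(pp, pk, 42), bcp_enc(pp, pk, 7)
assert bcp_dec(pp, sk, c1) == 42 == bcp_mdec(pp, pk, mk, c1)
add = (c1[0] * c2[0] % (pp[0] ** 2), c1[1] * c2[1] % (pp[0] ** 2))
assert bcp_dec(pp, sk, add) == 49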
3.3. Fully homomorphic encryption

Fully homomorphic encryption (FHE) allows one to compute the encrypted results of the addition and multiplication of two plaintexts directly from the corresponding ciphertexts, without decryption. Generally speaking, an FHE system E^F consists of four algorithms: a key generation algorithm F.KeyGen, an encryption algorithm F.Enc, a decryption algorithm F.Dec, and an evaluation algorithm F.Eval. The evaluation algorithm F.Eval, given a circuit C, a public key pk_F, and ciphertexts c_i generated by F.Enc_{pk_F}(m_i), outputs a refreshed ciphertext c* such that F.Dec_{sk_F}(c*) = C(m_1, ..., m_n).

Suppose that the circuit D_{Add_F} handles the addition gate, denoted Add_F. If c_1 and c_2 are two ciphertexts that encrypt m_1 and m_2, respectively, under pk_F, then we can compute

  c ← F.Eval(pk_F, D_{Add_F}, c_1, c_2),

which is a ciphertext of m_1 + m_2 under pk_F. Similarly, the circuit D_{Multi_F} handles the multiplication gate, denoted Multi_F. For ciphertexts c_1 and c_2 as defined before,

  c ← F.Eval(pk_F, D_{Multi_F}, c_1, c_2)

is a ciphertext of m_1 × m_2 under pk_F. Assume [m] = F.Enc_{pk_F}(m); then [x] +_F [y] = [x + y] = Add_F([x], [y]) and [x] ×_F [y] = [xy] = Multi_F([x], [y]) denote the homomorphic addition and multiplication of FHE, respectively.

We formally define multi-key fully homomorphic encryption [4,5] as follows.

Definition 3.2 (Multi-Key Fully Homomorphic Encryption, MK-FHE). For an arbitrary circuit class C, a family of encryption schemes {E^n = (MF.KeyGen, MF.Enc, MF.Dec, MF.Eval)}_{n>0} is said to be a multi-key fully homomorphic encryption if, for every n > 0, E^n satisfies the following properties:

• (pk_MF, sk_MF, ek_MF) ← MF.KeyGen(1^κ): given a security parameter κ, outputs a public key pk_MF, a private key sk_MF and a public evaluation key ek_MF.
• c ← MF.Enc(pk_MF, x): for a message x and public key pk_MF, this algorithm outputs a ciphertext c.
• x′ ← MF.Dec(sk_MF1, sk_MF2, ..., sk_MFn, c): given a ciphertext c and n private keys sk_MF1, ..., sk_MFn, this algorithm outputs a message x′.
• c* ← MF.Eval((c_1, pk_MF1, ek_MF1), ..., (c_m, pk_MFm, ek_MFm), C): taking as input any boolean circuit C ∈ C, any m valid key pairs (pk_MF1, ek_MF1), ..., (pk_MFm, ek_MFm), and any ciphertexts c_1, ..., c_m, this algorithm outputs a refreshed ciphertext c*.

The scheme should satisfy the following two properties: correctness of decryption and compactness of ciphertexts. That is to say, for any circuit C ∈ C, any n key pairs {(pk′_MFt, sk′_MFt, ek′_MFt)}_{t∈[1,n]} in the support of MF.KeyGen(1^κ), any subset of m key pairs {(pk_MFi, sk_MFi, ek_MFi)}_{i∈[1,m]}, and any valid ciphertexts c_i ← MF.Enc(pk_MFi, x_i) (i ∈ [1, m]), the algorithm MF.Eval satisfies:

Correctness of decryption: given the tuple of private keys sk_MF′1, ..., sk_MF′n and a refreshed ciphertext c*, the decryption is correct:

  MF.Dec(sk_MF′1, ..., sk_MF′n, c*) = C(x_1, ..., x_m),

where c* ← MF.Eval((c_1, pk_MF1, ek_MF1), ..., (c_m, pk_MFm, ek_MFm), C).

Compactness of ciphertexts: let c* ← MF.Eval((c_1, pk_MF1, ek_MF1), ..., (c_m, pk_MFm, ek_MFm), C) be a refreshed ciphertext; the size of c* is independent of the parameter m and of the size of C, i.e., there is a polynomial f such that |c*| ≤ f(κ, n), where |c*| denotes the size of c*.

If n = 1, Definition 3.2 is the standard definition of an FHE scheme. We use Add_MF to denote the secure addition gate: given ciphertexts c_1 and c_2 of plaintexts m_1 and m_2 under public keys pk_MF1 and pk_MF2 respectively, the server calculates the sum as c_1 +_MF c_2 = Add_MF(c_1, c_2). Similarly, Multi_MF denotes the secure multiplication gate, with which the server calculates the product as c_1 ×_MF c_2 = Multi_MF(c_1, c_2). The following mock illustrates this interface.
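The mock below is insecure (values are held in the clear) and only illustrates the interface of Definition 3.2 — evaluation across ciphertexts under different keys, and decryption that requires every involved private key; it is our illustration, not a real MK-FHE.

from dataclasses import dataclass

@dataclass(frozen=True)
class MKCtxt:
    """Mock 'ciphertext': the value in the clear plus the ids of the
    keys it depends on. Do not mistake this for actual encryption."""
    value: int
    keys: frozenset

def mf_enc(key_id, x):
    return MKCtxt(x, frozenset({key_id}))

def mf_eval(circuit, *cts):
    # the refreshed ciphertext depends on the union of all involved keys
    return MKCtxt(circuit(*(c.value for c in cts)),
                  frozenset().union(*(c.keys for c in cts)))

def mf_dec(sk_ids, ct):
    if not ct.keys <= set(sk_ids):      # all involved secret keys are needed
        raise PermissionError("decryption needs every involved secret key")
    return ct.value

# Three owners encrypt under their own keys; decryption needs all three.
c = mf_eval(lambda a, b, d: a * b + d, mf_enc(1, 3), mf_enc(2, 4), mf_enc(3, 5))
assert mf_dec([1, 2, 3], c) == 17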
4. System model

4.1. Multi-key privacy-preserving deep learning system

In a deep learning system, each data owner has a sensitive dataset DB_i whose local resources are fully administrated by the data owner. In our multi-key system, we consider n such data owners, denoted by P_1, ..., P_n. Each data owner P_i (i ∈ [1, n]) has its own pair of public and private keys (pk_i, sk_i); the local sensitive dataset DB_i has I_i attributes {X_1^(i), X_2^(i), ..., X_{I_i}^(i)}, and DB_1 ∩ DB_2 ∩ ··· ∩ DB_n = ∅. These data owners want to perform collaborative deep learning with the other data owners. Because their computational ability is limited, the data owners need to outsource the data to an untrusted cloud server for collaborative deep learning. Before running a neural network, the data owners jointly negotiate and set up the target vector t = {t_i}_{i=1}^{n} and the weight matrices W^(j) in advance, where t_i = (t_1^(i), t_2^(i), ..., t_{I_i}^(i)) and j = 1, 2.

Because these datasets are highly sensitive, to preserve data confidentiality each data owner P_i (i ∈ [1, n]) has to encrypt his/her data before uploading them to a cloud server. The cloud server will train a model over these datasets, which are encrypted under different public keys. To realize a multi-key privacy-preserving deep learning system, we consider two schemes, i.e., a basic scheme and an advanced scheme. Both schemes ensure that the inputs, the intermediate results generated during the learning process, and the final output are secure, and that no information is leaked during the whole learning process. In addition, we assume that each data owner stays online with broadband access to the cloud server.

4.2. Adversary model

In this paper, we assume that the data owners and servers are honest-but-curious, sometimes called semi-honest. This means that all the entities (i.e., all the data owners and the servers) will honestly follow the protocol, but try to gather or discover information about the intermediate results of the learning process by observing the transcripts. We also assume that all the data owners' datasets are sensitive and required to be fully protected against the cloud server. Based on this assumption, we consider three kinds of attacks: (1) an online attack by compromised active data owners, who aim to learn or infer sensitive information about the datasets from the other data owners and the cloud server; (2) an online attack by a compromised active server, in which the cloud server pretends to be a participant and executes the learning process for every guess; (3) an outside attack whose goal is to obtain private information from the data owners and the cloud server.

In our work, we try to address the privacy-preserving problem of deep learning in cloud computing. The security and privacy of the data owners' sensitive data, the model's outputs, and the intermediate results should be protected during the deep learning process.

Security goals. There are two aspects of the security goals: data privacy and model privacy. The detailed security goals for these two notions are as follows.

• Privacy of data. The data should be kept secret even if a subset of the data owners and the cloud server are corrupted. In more detail, no information about the data will be leaked to the adversary.
• Privacy of the training model. The training model should be kept secret and can only be known by the data owners. The adversary, even with all the information of the cloud server, is not able to get the information of the training model.

5. Multi-key privacy-preserving deep learning system

5.1. The basic scheme

In this subsection, we give a basic scheme (refer to Fig. 2) to realize a scenario in which multiple data owners want to collaboratively learn the parameters W^(1), W^(2) with their partitioned data without leaking the information of their sensitive datasets.

Main idea. Generally speaking, SMC cannot handle data encrypted with different public keys; it can only deal with ciphertexts under the same public key. In our basic scheme, to preserve data privacy when multiple parties are involved in the deep learning model, we use MK-FHE [4] E^n to encrypt the data before uploading them to a cloud. Assume there are n data owners P_1, ..., P_n, who hold their respective mutually disjoint, sensitive and vertically partitioned datasets DB_1, ..., DB_n, and corresponding target vectors t_1, ..., t_n. Each
data owner P_i (i ∈ [1, n]) encrypts his dataset with MF.Enc and uploads the ciphertexts MF.Enc(pk_MFi, DB_i), MF.Enc(pk_MFi, W_i^(j)) and MF.Enc(pk_MFi, t_i), together with his public key pk_MFi, to an untrusted cloud server C for secure deep learning.

5.2. The advanced scheme

Data uploading. Data owner P_i (i ∈ [1, n]) uses the BCP scheme to encrypt his private dataset, Enc_pki(DB_i) = {Enc_pki(X_j^(i))}_{j=1}^{I_i} = {(A_j^(i), B_j^(i))}_{j=1}^{I_i}, together with the weight parameters Enc_pki(W_i^(1)) and Enc_pki(W_i^(2)). After that, data owner P_i (i ∈ [1, n]) uploads the encrypted data, the target vector T_i and the public key pk_i to the cloud server C. Here, we assume that the communication channel for uploading is secure, which means nobody can obtain the uploaded data from C. For simplicity, we use (A_i, B_i) to denote the BCP ciphertext of DB_i for data owner P_i (i ∈ [1, n]).

Training. In this phase, we describe how the cloud server C handles a request from the data owners in each learning round. After receiving the data encrypted with different public keys, cloud C needs to perform an α-input deep learning algorithm over the encrypted domain {(Enc_pk1(DB_1), Enc_pk1(W_1^(k)), T_1), ..., (Enc_pkn(DB_n), Enc_pkn(W_n^(k)), T_n)}, where k = 1, 2 and α = Σ_{s=1}^{n} I_s.

However, the learning algorithm cannot process ciphertexts under different public keys directly. Therefore, cloud server C first runs Algorithm 5 with the authorized center AU to transform the ciphertexts under different public keys into ciphertexts under the same public key. Since AU holds the master key mk, it can decrypt any given valid ciphertext by using the master decryption of the BCP scheme. Hence, C needs to blind the ciphertexts {(A_i, B_i)}_{i=1}^{n} with random messages {r_i}_{i=1}^{n} (r_i ∈ Z_N^{I_i}) before sending them to AU. After receiving the blinded ciphertexts {(A′_i, B′_i)}_{i=1}^{n}, AU decrypts them, re-encrypts each blinded plaintext z_i under the FHE of E^F as Z_i = F.Enc_pkF(z_i), and sends this new ciphertext Z_i to C. By removing the blinding factor r_i, C obtains a ciphertext under the public key pk_F of F.Enc(·) without ever learning the underlying plaintext. A sketch of this key-switching step follows.
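The sketch below captures the blind–decrypt–re-encrypt idea behind Algorithm 5 as we read it, not the paper's exact protocol; all callables are injected, and the identity "encryption" stand-ins at the bottom exist only so the flow can be executed end to end (in the real scheme the arithmetic is modulo N).

import random

def bcp_to_fhe(ct, N, bcp_add, bcp_enc, au_mdec_then_fenc, f_sub, f_enc,
               rng=random.Random(0)):
    """C additively blinds a BCP ciphertext, AU master-decrypts the blinded
    value and re-encrypts it under the FHE key, and C removes the blinding
    homomorphically, so neither party alone sees the plaintext z."""
    r = rng.randrange(N)               # blinding factor, known only to C
    blinded = bcp_add(ct, bcp_enc(r))  # Enc_BCP(z + r), sent to AU
    Z = au_mdec_then_fenc(blinded)     # AU: mDec -> z + r, then F.Enc(z + r)
    return f_sub(Z, f_enc(r))          # C: [z + r] -F [r] = [z]

# Identity stand-ins, only to exercise the flow:
assert bcp_to_fhe(ct=42, N=1357,
                  bcp_add=lambda c1, c2: c1 + c2,
                  bcp_enc=lambda m: m,
                  au_mdec_then_fenc=lambda c: c,
                  f_sub=lambda a, b: a - b,
                  f_enc=lambda m: m) == 42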
Secure feed forward. Over the encrypted domain, the feed-forward computation is carried out with the homomorphic gates, e.g. [o^(3)] = f([net^(3)]) = f([W^(2)] ×_F [o^(2)]), and the stage finally returns [y].

Secure back propagation. The stochastic gradient descent (SGD) over the plaintext domain is described in Algorithm 2, with the error terms

  δ_k^(3) = o_k^(3)(1 − o_k^(3))(t_k − o_k^(3)),    (4)

  δ_h^(2) = o_k^(2)(1 − o_k^(2)) Σ_{k∈D} W_kh δ_k^(3),    (5)

where δ_k^(3) and δ_h^(2) are the error terms for each network output unit k and for each hidden unit h, respectively, and D denotes all the units whose immediate inputs include the output of unit h. Finally, each network weight W^(1), W^(2) is updated:

  W^(2) := W^(2) + η δ_k^(3) x_kh,    (6)

  W^(1) := W^(1) + η δ_h^(2) x_hi.    (7)

Following Eqs. (4) and (5), over the encrypted domain the error terms can be securely computed with Add_F and Multi_F as [δ_k^(3)] = [o_k^(3)] ×_F ([1] +_F [o_k^(3)]) ×_F ([t_k] +_F [o_k^(3)]) and [δ_h^(2)] = [o_k^(2)] ×_F ([1] +_F [o_k^(2)]) ×_F (Σ_{k∈D} [W_kh] ×_F [δ_k^(3)]), respectively. Since the ciphertexts [x_hi], [x_kh] and [η] are known and [δ_k^(3)], [δ_h^(2)] are securely computed, Eqs. (6) and (7) can be performed over the encrypted domain; a gate-level sketch of Eqs. (4) and (5) is given after Algorithm 6.

Algorithm 6 Transformation of the FHE ciphertext under pk_F to BCP ciphertexts under different public keys
Input: a ciphertext C under pk_F and pk_1, pk_2, ..., pk_n
Output: {(A_i, B_i)}_{i=1}^{n}
1: Cloud server C does:
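A gate-level sketch of Eqs. (4) and (5): only addition and multiplication gates are used. Plaintext lambdas stand in for Add_F/Multi_F so the composition can be executed, and the negation helper is our assumption for expressing the subtractions (e.g. Multi_F with an encryption of −1).

# Insecure stand-ins; in the scheme these are F.Eval calls on ciphertexts.
addF  = lambda x, y: x + y     # AddF
multF = lambda x, y: x * y     # MultiF
negF  = lambda x: -x           # assumed negation helper

def delta_output(o3, t):
    """Eq. (4): delta3 = o3 * (1 - o3) * (t - o3), built only from gates."""
    return multF(multF(o3, addF(1, negF(o3))), addF(t, negF(o3)))

def delta_hidden(o2, W_col, deltas3):
    """Eq. (5): delta2 = o2 * (1 - o2) * sum_{k in D} W_kh * delta3_k."""
    s = 0
    for w, d in zip(W_col, deltas3):
        s = addF(s, multF(w, d))
    return multF(multF(o2, addF(1, negF(o2))), s)

# Check the gate composition against the plaintext formula:
assert delta_output(0.8, 1.0) == 0.8 * (1 - 0.8) * (1.0 - 0.8)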
For a public-key encryption scheme E = {KeyGen, Enc, Dec} and an adversary A = (A_1, A_2), the experiment Exp^SS_{E,A}(k) is defined as follows:

Exp^SS_{E,A}(k):
  b ← {0, 1}
  (pk, sk) ← KeyGen(1^k)
  (m_0, m_1) ← A_1(pk)
  c ← Enc_pk(m_b)
  b′ ← A_2(c)
  If b′ = b, return 1; else, return 0.

Here, we require that the two plaintexts have the same length; if they do not, message padding can be applied.
Privacy of data. To protect the data privacy of the data owners, we use MK-FHE in the basic scheme and the hybrid encryption scheme in the advanced scheme to support the secure computation. According to the definition of semantic security, we can obtain the following conclusions.

Corollary 6.2 (MK-FHE Semantic Security). If the underlying encryption scheme is semantically secure, then the multi-key fully homomorphic encryption is semantically secure.

Proof (Sketch). Assume the public-key encryption scheme E = {KeyGen, Enc, Dec} is semantically secure. Based on this scheme E, the challenger constructs an evaluation algorithm Eval such that the new public-key encryption scheme E′ = {KeyGen, Enc, Dec, Eval} is homomorphic with respect to the addition and multiplication operations. If the evaluation key ek is public, then the adversary can compute Eval directly from the public key pk, the ciphertext c and the evaluation key ek, so Eval gives the adversary no additional advantage. Therefore, the MK-FHE scheme is semantically secure.

Recall that in our basic scheme, the data owners do not communicate with each other until the decryption phase. Each data owner P_i (i ∈ [1, n]) generates its own key tuple (pk_MFi, sk_MFi, ek_MFi) and encrypts its input DB_i under the public key pk_MFi of MK-FHE. A semi-honest data owner P may collude with some data owners and want to reveal a sample vector DB uploaded by the other data owners. However, the data owners do not need to coin-flip for each other's random coins, and Definition 3.2 guarantees that our basic scheme is secure against corrupted data owners. Therefore, the privacy of the data owners is preserved.

In our advanced scheme, the BCP scheme and FHE are semantically secure (for the detailed proof see [7], Theorem 11); the cloud server C is assumed to be probabilistically polynomially bounded, and it sends the blinded ciphertexts to AU for computation. The computing power of AU, on the other hand, does not have to be bounded, since it only receives the blinded ciphertexts and is only able to see the blinded messages, which are decrypted by the master decryption algorithm of the BCP scheme. Hence, the cloud server C and the authorized center AU cannot obtain the learning results. Therefore, the privacy of the learning results is preserved and we obtain the following lemma.

Lemma 6.3. Without any collusion, Algorithms 5 and 6 are privacy-preserving for the weights W^(1), W^(2).

Privacy of the training model. A semi-honest cloud server C can train a deep learning model privately. Because Add_F and Multi_F are both semantically secure, there is no information leakage to C. Hence, for the weights W^(1), W^(2) in the training process, in both the feed-forward stage and the back-propagation stage, cloud server C performs only Add_F and Multi_F operations; therefore the privacy of the whole training process is guaranteed.

7. Application

In this section, we show an application of our advanced scheme in face recognition.

Privacy-preserving face recognition. As a typical biometric authentication technique, face recognition is increasingly applied in real life. The widespread use of this technique arouses many privacy concerns, especially when the computation is outsourced to an untrusted cloud server.

Assume there exists an image sample set which collects n × m grayscale images: n (e.g. n = 20) persons P_1, ..., P_n in various poses, each person having m (e.g. m = 30) images (each image of p_1 × p_2 pixels, with p_1 = p_2 constant, n, m ≪ p_1, p_2, and pixel intensities in (0, 255)). These m images capture each person's expression (such as sad, dismayed, happy, angry), the direction they are looking (e.g. straight, right, left, up), and whether or not they are wearing a mask. The main task is to learn a target function from this image sample set securely. Suppose we choose a network with an (α − β − γ) configuration, i.e., one input layer with α nodes, one hidden layer with β nodes and one output layer with γ nodes. The system includes three phases: input encoding, learned hidden representations, and output encoding. The details are described as follows; a small sketch of the two encoding steps is given at the end of this section.

Input encoding. Before uploading the images to the cloud server C for collaborative deep learning, P_1, ..., P_n should preprocess the image data for feature extraction, and then run the deep learning network with these features (e.g. edges) as input. Using these features as input reduces the number of inputs and the corresponding weights, which cuts down the computation while keeping the classification correct. Since the deep learning network has a fixed number of input units, the encoded image has a fixed number of pixel intensity values, i.e., n × m, which can be seen as an abstract expression of the original p_1 × p_2 image. For example, P_i (i ∈ [1, n]) has m original images Γ_1, ..., Γ_m, each image Γ_i (i ∈ [1, m]) with p_1 × p_2 pixels. After feature extraction, each original image is replaced by a feature set of n × m pixels, with each pixel value used as one network input. However, in order to efficiently compute the hidden and output activation functions of the network, the input feature sample range (0, 255) should be converted to the range (0, 1); the converted features are represented as (A_1, ..., A_s), where A_i ∈ (0, 1)^{n×m}, s < n. Finally, each P_i (i ∈ [1, n]) runs the Data Uploading protocol (which is described in Section 5.2), and uploads the ciphertexts Enc_pki(A_i), Enc_pki(W^(1)), Enc_pki(W^(2)) to the cloud server C.

Learned hidden representations. At this point, cloud server C has obtained the data encrypted with different public keys, so cloud server C and the authorized center AU run the Training protocol (which is described in Section 5.2) to train a neural network securely.

Output encoding. Let us assume the deep learning network has four output units, and each output unit represents one of the four face directions. These four outputs can be viewed as a four-dimensional vector, i.e., (straight, right, left, up), and we use four real numbers to represent the likelihood of each direction. If some component value is the highest, then the corresponding direction is taken as the deep learning network's prediction. For instance, if (0.3, 0.3, 0.3, 0.7) is the output vector, then 0.7 is the highest value and the result indicates the person is looking up. Since all the computation is performed in the encrypted domain, the output results are also encrypted. After comparing the four components of the output vector over the encrypted domain, cloud server C chooses the highest one. Once the network output is given, C runs Algorithm 6 and sends the result encrypted with pk_i to person P_i, where i ∈ [1, n].
8. Conclusions and future work

In this paper, we focused on the privacy issues of collaborative deep learning in cloud computing, and proposed two schemes, i.e., a basic scheme and an advanced scheme, to protect privacy in deep learning. The basic scheme is based on an MK-FHE scheme, and the advanced scheme is based on a hybrid structure which combines the double decryption mechanism with an FHE scheme. Both schemes are able to tackle the problem of privacy-preserving collaborative deep learning over ciphertexts under different public keys. Compared with the basic scheme, the advanced scheme does not need the interaction among the data owners during the decryption of the learning result. Our future work will be focused on two open problems: how to implement the FHE scheme in practical machine learning, and how to reduce the cost of computation and communication.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61472091), the Natural Science Foundation of Guangdong Province for Distinguished Young Scholars (2014A030306020), the Science and Technology Planning Project of Guangdong Province, China (2015B010129015) and the Innovation Team Project of Guangdong Universities (No. 2015KCXTD014).

References

[1] V. Chang, Towards a big data system disaster recovery in a private cloud, Ad Hoc Networks 35 (2015) 65–82.
[2] Z.W. Wang, C. Cao, N.H. Yang, V. Chang, ABE with improved auxiliary input for big data security, J. Comput. System Sci. (2016).
[3] O. Goldreich, Secure multi-party computation, Manuscript, preliminary version, 1998, pp. 86–97.
[4] A. López-Alt, E. Tromer, V. Vaikuntanathan, On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption, in: Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, ACM, 2012, pp. 1219–1234.
[5] P. Mukherjee, D. Wichs, Two round multiparty computation via multi-key FHE, in: Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, Berlin, Heidelberg, 2016, pp. 735–763.
[6] P. Mukherjee, D. Wichs, Two round MPC from LWE via multi-key FHE, IACR Cryptology ePrint Archive, 2015, p. 345.
[7] E. Bresson, D. Catalano, D. Pointcheval, A simple public-key cryptosystem with a double trapdoor decryption mechanism and its applications, in: Advances in Cryptology — ASIACRYPT 2003, 2003, pp. 37–54.
[8] C. Gentry, Fully homomorphic encryption using ideal lattices, in: Symposium on the Theory of Computing, 2009.
[9] T.H. Chan, K. Jia, S. Gao, J. Liu, et al., PCANet: A simple deep learning baseline for image classification? IEEE Trans. Image Process. 24 (12) (2015) 5017–5032.
[10] A. Graves, A.R. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks, in: ICASSP, 2013.
[11] G. Hinton, L. Deng, D. Yu, et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, Signal Process. Mag. 29 (6) (2012) 82–97.
[12] M. Liang, Z. Li, T. Chen, J. Zeng, Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach, IEEE/ACM Trans. Comput. Biol. Bioinf. (TCBB) 12 (4) (2015) 928–937.
[13] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.
[14] V. Chang, M. Ramachandran, Towards achieving data security with the cloud computing adoption framework, IEEE Trans. Serv. Comput. 9 (1) (2016) 138–151.
[15] V. Chang, Y.H. Kuo, M. Ramachandran, Cloud computing adoption framework: A security framework for business clouds, Future Gener. Comput. Syst. 57 (2016) 24–41.
[16] G. Sun, Y. Xie, D. Liao, et al., User-defined privacy location-sharing system in mobile online social networks, J. Netw. Comput. Appl. (2016).
[17] G. Sun, D. Liao, H. Li, et al., L2P2: A location-label based approach for privacy preserving in LBS, Future Gener. Comput. Syst. (2016).
[18] J. Li, X.F. Chen, M.Q. Li, et al., Secure deduplication with efficient and reliable convergent key management, IEEE Trans. Parallel Distrib. Syst. 25 (6) (2014) 1615–1625.
[19] J. Li, Y.K. Li, X.F. Chen, et al., A hybrid cloud approach for secure authorized deduplication, IEEE Trans. Parallel Distrib. Syst. 26 (5) (2015) 1206–1216.
[20] J. Li, X.F. Chen, X.Y. Huang, et al., Secure distributed deduplication systems with improved reliability, IEEE Trans. Comput. 64 (12) (2015) 3569–3579.
[21] C.C. Aggarwal, S.Y. Philip, A general survey of privacy-preserving data mining models and algorithms, in: Privacy-Preserving Data Mining, Springer, US, 2008, pp. 11–52.
[22] S.Q. Ren, B.H.M. Tan, S. Sundaram, et al., Secure searching on cloud storage enhanced by homomorphic indexing, Future Gener. Comput. Syst. (2016).
[23] A. Evfimievski, T. Grandison, Privacy Preserving Data Mining, IGI Global, 2009, pp. 1–8.
[24] P. Vijayakumar, V. Chang, L.J. Deborah, et al., Computationally efficient privacy preserving anonymous mutual and batch authentication schemes for vehicular ad hoc networks, Future Gener. Comput. Syst. (2016).
[25] W. Du, Y. Han, S. Chen, Privacy-preserving multivariate statistical analysis: Linear regression and classification, in: SDM, Vol. 4, 2004, pp. 222–233.
[26] G. Jagannathan, R.N. Wright, Privacy-preserving distributed k-means clustering over arbitrarily partitioned data, in: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, ACM, 2005, pp. 593–599.
[27] T. Graepel, K. Lauter, M. Naehrig, ML confidential: Machine learning on encrypted data, in: Information Security and Cryptology (ICISC), 2012, pp. 1–21.
[28] D. Agrawal, R. Srikant, Privacy-preserving data mining, in: Proc. ACM Conf. Manage. Data, 2000, pp. 439–450.
[29] N. Li, M. Lyu, D. Su, et al., Differential privacy: From theory to practice, Synth. Lect. Inf. Secur. Privacy Trust 8 (4) (2016) 1–138.
[30] T. Zhang, Q. Zhu, Dynamic differential privacy for ADMM-based distributed classification learning, IEEE Trans. Inf. Forensics Secur. 12 (1) (2017).
[31] M. Abadi, A. Chu, I. Goodfellow, et al., Deep learning with differential privacy, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2016, pp. 308–318.
[32] R. Shokri, V. Shmatikov, Privacy-preserving deep learning, in: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ACM, 2015, pp. 1310–1321.
[33] T.T. Chen, S. Zhong, Privacy-preserving back-propagation neural network learning, IEEE Trans. Neural Netw. 20 (10) (2009) 1554–1564.
[34] A. Bansal, T. Chen, S. Zhong, Privacy preserving back-propagation neural network learning over arbitrarily partitioned data, Neural Comput. Appl. 20 (1) (2011) 143–150.
[35] N. Schlitter, A protocol for privacy preserving neural network learning on horizontally partitioned data, in: PSD, 2008.
[36] J.W. Yuan, S.C. Yu, Privacy preserving back-propagation neural network learning made practical with cloud computing, IEEE Trans. Parallel Distrib. Syst. 25 (1) (2015) 212–221.
[37] D. Boneh, E.J. Goh, K. Nissim, Evaluating 2-DNF formulas on ciphertexts, in: Proceedings of the Second International Conference on Theory of Cryptography, TCC'05, Berlin, Heidelberg, 2005, pp. 325–341.
[38] Q. Zhang, L.T. Yang, Z. Chen, Privacy preserving deep computation model on cloud for big data feature learning, IEEE Trans. Comput. 65 (5) (2016) 1351–1362.
[39] Z. Brakerski, C. Gentry, V. Vaikuntanathan, (Leveled) fully homomorphic encryption without bootstrapping, in: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ACM, 2012, pp. 309–325.
[40] A. Mirhoseini, A.R. Sadeghi, F. Koushanfar, CryptoML: Secure outsourcing of big data machine learning applications, 2016.
[41] S. Goldwasser, S. Micali, Probabilistic encryption, J. Comput. System Sci. 28 (2) (1984) 270–299.

Ping Li received the M.S. and Ph.D. degrees in mathematics from Sun Yat-sen University in 2010 and 2016, respectively. Currently, she works at Guangzhou University as a postdoctoral researcher. Her main research interests include cryptography, privacy preserving and cloud computing.

Jin Li received the B.S. degree in mathematics from Southwest University in 2002 and the Ph.D. degree in information security from Sun Yat-sen University in 2007. Currently, he works at Guangzhou University as a professor. He has been selected as one of the science and technology new stars of Guangdong province. His research interests include applied cryptography and security in cloud computing. He has published more than 70 research papers in refereed international conferences and journals and has served as the program chair or a program committee member for many international conferences.
Zhengan Huang received his B.S. and M.S. degrees from the Department of Mathematics, Sun Yat-sen University, in 2009 and 2011, respectively, and his Ph.D. degree from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, in 2015. He served as a security engineer at Huawei Technologies Co. Ltd. from 2015 to 2016. Currently, he is a postdoctoral researcher at Guangzhou University. His research interests include public-key cryptography and information security.

Siu-Ming Yiu received a B.S. in Computer Science from the Chinese University of Hong Kong, an M.S. in Computer and Information Science from Temple University, and a Ph.D. in Computer Science from The University of Hong Kong. Currently, he is an associate professor at the University of Hong Kong. His research interests include bioinformatics, computer security and cryptography.