
SPECIAL SECTION ON SECURITY AND PRIVACY IN EMERGING DECENTRALIZED COMMUNICATION ENVIRONMENTS

Received September 12, 2019, accepted September 30, 2019, date of publication October 8, 2019,
date of current version October 21, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2946202

An Efficient Outsourced Privacy Preserving Machine Learning Scheme With Public Verifiability

ALZUBAIR HASSAN1,2, RAFIK HAMZA1,2, HONGYANG YAN1,2, AND PING LI3
1 School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou 510006, China
2 Peng Cheng Laboratory, Shenzhen 518055, China
3 South China Normal University, Guangzhou 510631, China

Corresponding authors: Alzubair Hassan (alzubairuofk@gmail.com) and Hongyang Yan (hyang.yan@foxmail.com)


This work was supported in part by the National Natural Science Foundation of China under Grant 61902081, Grant 61802078, and Grant 61702126, and in part by the China Postdoctoral Science Foundation under Grant 204728.
The associate editor coordinating the review of this manuscript and approving it for publication was Zheli Liu.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

ABSTRACT Cloud computing has been widely applied in numerous applications for storage and data analytics tasks. However, cloud servers engaged through a third party cannot be fully trusted by multiple data users. Thus, security and privacy concerns become the main obstacles to using machine learning services, especially with multiple data providers. Several outsourced machine learning schemes have recently been proposed to preserve the privacy of data providers; yet, these schemes cannot satisfy the property of public verifiability. In this paper, we present an efficient privacy-preserving machine learning scheme for multiple data providers. The proposed scheme allows all participants in the system model to publicly verify the correctness of the encrypted data. Furthermore, a unidirectional proxy re-encryption (UPRE) scheme is employed to reduce the high computational costs that arise with multiple data providers. The cloud server embeds noise in the encrypted data, allowing the analyst to apply machine learning techniques while preserving the privacy of the data providers' information. The experimental results demonstrate that the proposed scheme reduces computational costs and communication overheads.

INDEX TERMS Cloud computing, machine learning, public verifiability, proxy re-encryption, differential
privacy.

I. INTRODUCTION
Cloud computing, with its high data processing capabilities, is important to all applications that require intensive processing, such as machine learning over large amounts of data [1]. Nonetheless, it is not appropriate to fully trust a third-party-based cloud system, especially when storing sensitive data. Cloud computing suffers from several security issues that remain highly debated topics. It provides accessible computing services using on-demand, elastic, and easy-to-use techniques; indeed, it offers many resources but also raises crucial security issues. Third-party storage introduces different potential risks, especially concerning data security [2].

The data storage paradigm in the cloud brings several challenges and issues that have a huge influence on the security of the system [3], [4]. Data integrity verification at untrusted servers is one of the main issues with cloud systems. The existing schemes fall into two categories: private verifiability and public verifiability. Private verifiability can deliver higher system efficiency, while public verifiability allows anyone, not just the data providers, to challenge the cloud server on the correctness of the data without holding private information. Cloud systems attempt to employ distinct security techniques; yet, most of these systems cannot guarantee either users' privacy or data confidentiality without using multi-layer cryptography techniques [5]–[7]. In this case, encryption is the primary technique used to ensure data security, where data are encrypted and then stored in the cloud [8], [9]. Exploiting encrypted data, however, remains extremely difficult because of its high complexity.

Homomorphic encryption plays a promising role in cloud computing by improving the privacy of data providers. Homomorphic encryption provides a way to perform several services on encrypted data and improves cloud users' privacy.


Accordingly, companies store encrypted data in a public cloud and let the cloud provider perform analytic services on the encrypted data. Homomorphic encryption has properties that make such encrypted data useful for companies, such as random self-reducibility, re-randomizable encryption, and verifiable encryption [10]. Nevertheless, homomorphic encryption still suffers from high computational costs. Researchers have presented several contributions such as Fully Homomorphic Encryption (FHE) [11], [12], Partially Homomorphic Encryption (PHE) [13], [14], and Somewhat Homomorphic Encryption (SWHE) [15], [16]. Herein, differential privacy is employed to guarantee the privacy of the users' data in the cloud. Differential privacy encourages companies to collect and share aggregate information regarding their customers while maintaining the customers' privacy. Differential privacy takes a probabilistic form: a differentially private algorithm outputs a distribution that changes little when a single record of the dataset changes, and thus does not affect the privacy of an individual's data [17].

The foremost issue with cloud systems concerns the security and privacy of the data stored in the cloud hosting system. Most companies question their data security and whether they can trust storing their sensitive data with third-party services outside their off-line databases [18]. To this end, cryptographic techniques have been proposed using various approaches that guarantee data security and users' privacy. At the user side, data encryption should be sufficient and considered a standard form of defense that provides a high level of security in the cloud system [19].

To overcome the above issues, it is important to propose an effective privacy scheme for machine learning with multiple data providers. Any proposed solution should reduce the cost of implementation and maintain the privacy of the participants [20]–[22]. Furthermore, it is important to address the problems of multiple data providers. For example, Li et al. [23] proposed a privacy-preserving machine learning framework dealing with multiple data providers. However, this solution comes with a high computational cost due to the dependence on integer factorization in their proposed framework. Additionally, none of the participating components in their proposed model can publicly verify the correctness of the outsourced data, which increases the overhead for all parties. Furthermore, the analyst has to start the transaction with the data providers through the cloud system and remain online during the communication.

This paper aims to overcome the above-mentioned issues by proposing an efficient privacy-preserving machine learning scheme for multi-provider data in the cloud system. We propose a privacy-preserving framework using an additive homomorphic encryption scheme. First, the data providers encrypt their sensitive data using the unidirectional proxy re-encryption scheme and upload the ciphertexts to the proxy server cloud. Then, the cloud re-encrypts the received ciphertexts with generated noise data using the partially homomorphic encryption of the Hashed-ElGamal scheme. Finally, the cloud sends the noisy ciphertext to the analyst to perform the machine learning techniques for his predictive analytics. All the components of the proposed framework can publicly check the correctness of ciphertexts before performing any operation, which ensures public verifiability. The proposed framework guarantees a high level of security and preserves the privacy of users.

The main contributions are highlighted as follows.
• This paper proposes an efficient privacy-preserving machine learning scheme with public verifiability.
• The unidirectional proxy re-encryption allows different data providers to delegate their data using the same public key.
• All the participating parties in our scheme can check the validity of a ciphertext before any operations are performed. This feature reduces the time spent on invalid ciphertexts.
• This scheme protects the privacy of the providers' data in the cloud and of the data analyst.
• This scheme uses ε-differential privacy (ε-DP), which improves the accuracy of applying machine learning techniques.

This paper is organized as follows. The related works are discussed in Section II. The preliminaries are given in Section III. The proposed scheme is explained in Section IV, while the results and discussion are introduced in Section V. The security analysis of our protocol is discussed in Section VI. Finally, the conclusions are given in Section VII.

II. RELATED WORK
Most companies currently believe that machine learning will be a key customer expectation [24]. In this regard, machine learning plays an important role in technology generally, and especially in cloud computing. The extensive developments of the machine learning community have reduced the network overhead. However, these developments affect the computational cost, as most applications have high computational costs [1], [25]. Many machine learning techniques have been adopted to automatically perform complex mathematical computations, thereby suffering high computational costs.

Recently, machine learning over encrypted data has become an important topic in industry and academia. Various approaches have been introduced to overcome these challenges, such as partial and traditional homomorphic encryption schemes. Unfortunately, these approaches are inefficient because they suffer from high computational costs, high network overhead, and certain security issues. Several protocols have been proposed by researchers, such as [19], [26]–[28]. For instance, Chen and Zhong [29] introduced a two-party distributed algorithm to preserve privacy for back-propagation neural networks (BPNNs). Their scheme allows the two parties to train their data while ensuring that the data are secure. They only considered training datasets that are vertically partitioned. To improve on previous work, Bansal et al. [30] proposed another algorithm that can be applied when the dataset is arbitrarily partitioned.


Both Chen and Zhong [29] and Bansal et al. [30] used homomorphic properties to protect the privacy of the two parties.

The aforementioned works do not work properly in multi-party environments because of the high communication overhead. Samet and Miri [31] presented a protocol to protect the privacy of both the input data and the created learning model in a BPNN and an extreme learning machine. Their protocol can be applied when the dataset is vertically or horizontally partitioned. Graepel et al. [32] proposed a scheme to ensure the confidentiality of the encrypted data during machine learning over these data in the training and test phases. Their algorithm can be applied to two types of classification algorithms: linear means and Fisher's linear discriminant [32]. Liu et al. [33] worked on preserving the users' privacy in social networks and keeping their sensitive information secure, considering the identity disclosure problem in weighted social graphs.

Nowadays, with the development of cloud computing and outsourcing, several techniques have been proposed to guarantee users' privacy. For example, Wei et al. [34] presented practical outsourcing algorithms for exponentiation computation based on homomorphic mapping. Gao et al. [35] worked on computing the social contiguity between users to identify potential friends while keeping their privacy, based on a proxy re-encryption scheme with additive homomorphism.

The notion of differential privacy has been introduced to overcome privacy and accuracy issues [36]. Dwork and Roth [37] proposed the ε-DP approach, which considers how much data can be revealed. Consequently, researchers have introduced various approaches based on the formal definition of DP, for instance, computational differential privacy [36] as well as the differential privacy consensus algorithm [38], as discussed herein. Recently, Li et al. [23] proposed a privacy-preserving machine learning framework that is suitable for working with multiple data providers. They employed a double decryption public-key encryption scheme to ensure security and user privacy. However, their framework has high computational costs and communication overhead. Additionally, the framework wastes time on invalid ciphertexts.

III. PRELIMINARIES
In this part, we illustrate some related notations and concepts. First, the computational Diffie-Hellman assumption is described below. We also introduce the divisible computational Diffie-Hellman assumption, which likewise relies on the hardness of discrete-logarithm-related computations in cyclic groups. Then, we present unidirectional proxy re-encryption, which is primary to the proposed protocol implementation. We illustrate the partially homomorphic hashed-ElGamal encryption scheme used for the predictive analytics. Finally, we present a brief introduction to differential privacy.

A. COMPUTATIONAL DIFFIE-HELLMAN (CDH)
Assume that we have g, g^a, g^b ∈ G for a, b ∈ Z_q*, where G is a cyclic multiplicative group of prime order q; then g^{ab} ∈ G cannot be computed due to the difficulty of the problem.

B. DECISION DIFFIE-HELLMAN (DDH)
Assume that we have two distributions A = (g^x, g^y, g^{xy}) and B = (g^x, g^y, g^z) for randomly distributed x, y, z ← Z_q. The problem is to distinguish A from B [39].

C. DIVISIBLE COMPUTATIONAL DIFFIE-HELLMAN (DCDH)
In this part, we assume that we have g, g^a, g^b ∈ G for a, b ∈ Z_q*; then g^{b/a} ∈ G cannot be computed due to the difficulty of the problem. The DCDH and CDH problems are equivalent in the same group [40].

D. UNIDIRECTIONAL PROXY RE-ENCRYPTION (UPRE) SCHEME
The definition of the unidirectional PRE scheme [41] is composed of six algorithms, described as follows:
• Initialization(k): This algorithm uses a security parameter k as input. Then, the algorithm returns the public parameters param. Additionally, the message space M description is given by this algorithm.
• KeyGen(): This algorithm calculates a user's public key and corresponding private key pair (pk_{ui}, sk_{ui}).
• ReKeyGen(sk_{ui}, pk_a): This algorithm uses the private key of the delegator sk_{ui} and the public key of the delegatee pk_a. Then, the algorithm returns a re-encryption key rk_{ui→a}.
• Enc(pk_{ui}, m): This algorithm uses the public key pk_{ui} of the delegator and the message m ∈ M. Then, the algorithm returns a ciphertext C_i under pk_{ui}.
• ReEnc(rk_{ui→a}, C_i, pk_{ui}, pk_a): This algorithm applies rk_{ui→a} to C_i. Then, the algorithm returns a ciphertext C_a under the public key pk_a.
• Dec(sk, C): This algorithm uses sk and C. Then, if the ciphertext is valid, the algorithm returns the plain message m ∈ M; otherwise, the algorithm throws an error. This algorithm is instantiated as User Dec() for the delegator and Analyst Dec() for the delegatee.

E. PARTIALLY HOMOMORPHIC HASHED-ELGAMAL ENCRYPTION SCHEME
Hashed ElGamal is partially homomorphic using only the XOR operator. Assume that the encryption of a message m is known; then anyone can compute the encryption of a message m′ = m ⊕ K for any selected mask K of the same bit-length as m. Let us assume that we have C = (c1, c2) with c1 = g^r and c2 = m ⊕ h(y^r). Then, we can obtain C′ = (c1, c2′) with c2′ = K ⊕ c2; this is the hashed-ElGamal encryption of m′. The partially homomorphic property of hashed-ElGamal encryption thus supports adding a mask to the ciphertext. Furthermore, this encryption is known to be one-way under the computational Diffie-Hellman assumption, and indistinguishability holds under the DDH assumption [39].
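To make the XOR-homomorphic masking above concrete, the following is a minimal Python sketch of hashed ElGamal over a prime-order subgroup, together with the ciphertext-masking step described in this subsection. The toy group parameters, the SHA-256-based hash, and the fixed 16-byte message length are illustrative assumptions for the sketch, not the parameters used in the paper.

```python
import hashlib
import secrets

# Toy subgroup parameters (p = 2q + 1, both prime); far too small for real security.
p, q, g = 2039, 1019, 4   # g = 4 generates the order-q subgroup of Z_p*

MSG_LEN = 16  # fixed message length in bytes for this sketch

def h_bytes(elem: int) -> bytes:
    """Hash a group element to MSG_LEN bytes (the 'h' in c2 = m xor h(y^r))."""
    digest = hashlib.sha256(elem.to_bytes((p.bit_length() + 7) // 8, "big")).digest()
    return digest[:MSG_LEN]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def keygen():
    x = secrets.randbelow(q - 1) + 1      # private key x in Z_q*
    return x, pow(g, x, p)                # (sk, pk = g^x)

def encrypt(pk: int, m: bytes):
    r = secrets.randbelow(q - 1) + 1
    c1 = pow(g, r, p)                     # c1 = g^r
    c2 = xor(m, h_bytes(pow(pk, r, p)))   # c2 = m xor h(y^r)
    return c1, c2

def decrypt(sk: int, c):
    c1, c2 = c
    return xor(c2, h_bytes(pow(c1, sk, p)))

# XOR-homomorphic masking: anyone can turn Enc(m) into Enc(m xor K)
# by XORing the mask K into c2, without knowing m or the secret key.
sk, pk = keygen()
m = b"sensitive record"                   # 16 bytes
K = secrets.token_bytes(MSG_LEN)          # the mask K
c1, c2 = encrypt(pk, m)
masked = (c1, xor(c2, K))
assert decrypt(sk, masked) == xor(m, K)
print("masked ciphertext decrypts to m xor K:", decrypt(sk, masked).hex())
```

This is exactly the mechanism the cloud later exploits to blend an encrypted noise mask into a provider's ciphertext.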


F. DIFFERENTIAL PRIVACY
Differential privacy is a probabilistic mechanism, where the algorithm outputs a distribution that changes little when the dataset changes and thus does not affect the privacy of an individual's data. Assume that d1 and d2 are two data sets; d1 and d2 are neighboring if they differ in only one record [17]. In particular, DP guarantees the security of the distributed data by adding measured perturbations to the ciphertext using the principle of homomorphic encryption [42].

Definition 1 (ε-DP): A randomized mechanism R is ε-differentially private if, for any pair of neighboring data sets d1 and d2 and any K in Range(R), the following inequality holds:

    Pr[R(d1) = K] ≤ e^ε · Pr[R(d2) = K]    (1)

Note that a smaller ε implies more stringent privacy. This randomized algorithm ensures differential privacy during the analytic process.

Definition 2 (Sensitivity): Let f : D → R^d be a function on the input space of the dataset. The sensitivity of f over two neighboring datasets d1 and d2 is given as follows:

    Δf = max_{d1,d2} ||f(d1) − f(d2)||_1    (2)

In this equation, the maximum is over pairs d1 and d2 differing in at most one element, and ||·||_1 denotes the ℓ1 norm.

Definition 3 (The Laplace Mechanism): The Laplace mechanism adds noise drawn from the Laplace distribution with probability density function noise(y) ∝ exp(−|y|/λ), which has mean zero and standard deviation √2·λ. In our work, we use the Laplace mechanism, which is given as follows:

    L_P(x, f(·), ε) = f(x) + (Q_1, . . . , Q_k)    (3)

where f : N^|x| → R^k and each Q_i is an independent noise term drawn from the Laplace distribution with scale Δf/ε. The Laplace mechanism relies on adding measured noise to the ciphertext that we want to compute on.
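As a small worked example of Definition 3, the following Python sketch (using NumPy) draws a Laplace noise vector with scale λ = Δf/ε and adds it to a query answer. The counting query, sensitivity value, seed, and ε values are illustrative assumptions only.

```python
import numpy as np

def laplace_mechanism(f_x: np.ndarray, sensitivity: float, epsilon: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Return f(x) + (Q1, ..., Qk) with each Qi ~ Lap(0, sensitivity / epsilon)."""
    scale = sensitivity / epsilon                     # lambda = delta_f / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=f_x.shape)
    return f_x + noise

# Example: a counting query has sensitivity 1 (one record changes the count by at most 1).
rng = np.random.default_rng(seed=0)
true_count = np.array([42.0])
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=eps, rng=rng)
    print(f"epsilon={eps:>4}: noisy count = {noisy[0]:.2f}")
# Smaller epsilon -> larger scale -> more noise -> stronger privacy.
```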
G. SYSTEM MODEL
The considered system model consists of data providers, a proxy server, and a data analyst, as shown in Figure 1. The communication between these components is explained as follows:
1) Data providers u_i ∈ {u_1, u_2, u_3, . . . , u_n} provide the system with data from different sources. u_i needs to upload their sensitive data set d_i, after encrypting it, to the proxy server and to delegate the encrypted sensitive data set [d_i] to the analyst.
2) The proxy cloud server (P_S) is responsible for redirecting the data sets encrypted by the users to the analyst. P_S is a semi-honest cloud with high computational power.
3) The analyst D_A receives the encrypted data set and trains the machine learning model on it. The analyst can perform training on these ciphertexts without compromising the privacy of the data providers.

FIGURE 1. System model of our proposed outsourced privacy-preserving machine learning scheme.

H. SECURITY MODEL
Suppose that the data providers u_i, P_S, and D_A are semi-honest but untrusted. Furthermore, it is assumed that there is no collusion between the parties participating in the system model. Algorithm B answers the adversary queries according to our scheme. Adversary A has the following capabilities for attacking the plaintext of the users:
1) A can collude with u_i to obtain the plaintexts of all encrypted data downloaded from the cloud.
2) A can attack P_S to estimate the plaintexts of all ciphertexts outsourced to P_S by the users u_i and all data sent from D_A.
3) A may corrupt some data from u_i to produce the plaintext from other users' ciphertexts.

IV. THE PROPOSED FRAMEWORK
The following subsections explain the structure of the proposed framework, which contains an efficient privacy-preserving machine learning scheme with public verifiability.

A. OVERVIEW OF THE PROPOSED SCHEME
In the proposed framework, the data providers encrypt their data sets using the UPRE scheme and upload them to the proxy server cloud. In this step, the data providers can assess their data sets and decide which data are sensitive and should be encrypted. Then, the cloud re-encrypts the ciphertexts received from the data providers under the public key of the analyst.
Accordingly, the server adds encrypted noise to the data providers' ciphertexts using the partially homomorphic encryption of the Hashed-ElGamal scheme. After adding the encrypted noise, the cloud forwards the noisy ciphertext to the data analyst. In this step, the data analyst decrypts the noisy ciphertext to obtain the noisy dataset. Then, the analyst performs machine learning algorithms for his predictive analytics. For instance, the analyst could use a k-nearest neighbor classifier, a support vector machine classifier, a naive Bayes classifier, and so on.


The data analyst first decrypts the noisy ciphertext with his private key to obtain the noisy dataset. Then, the data analyst chooses and runs the above-mentioned classifiers on the noisy data set under ε-DP without revealing the privacy of the individual users.
In our proposed framework, all the components of the model can publicly check the correctness of ciphertexts before performing any operation, which ensures public verifiability. This is a significant contribution of this work, because this feature reduces the time wasted on invalid ciphertexts.
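To illustrate this public check, here is a minimal Python sketch of the encryption tag (ϕ, µ) produced in Algorithm 1 below and of the verification equation reused in Algorithms 2 and 3; any party holding only the public key and the ciphertext can run verify(). The toy group, the SHA-256 instantiations of H1–H4, the lack of domain separation, and the fixed 16-byte message length are assumptions made for the sketch, not the paper's JPBC implementation.

```python
import hashlib
import secrets

# Toy group: p = 2q + 1 (both prime), g generates the order-q subgroup.  Not secure sizes.
p, q, g = 2039, 1019, 4
NB = (p.bit_length() + 7) // 8          # bytes per group element
E0 = E1 = 16                            # message / randomizer lengths in bytes

def Hq(*parts: bytes) -> int:           # hash into Z_q* (stand-in for H1, H3, H4)
    d = hashlib.sha256(b"|".join(parts)).digest()
    return int.from_bytes(d, "big") % (q - 1) + 1

def H2(elem: int) -> bytes:             # hash a group element to e0+e1 bytes
    return hashlib.sha256(elem.to_bytes(NB, "big")).digest()[: E0 + E1]

def enc_elem(x: int) -> bytes:
    return x.to_bytes(NB, "big")

def keygen():
    s1, s2 = (secrets.randbelow(q - 1) + 1 for _ in range(2))
    return (s1, s2), (pow(g, s1, p), pow(g, s2, p))

def encrypt(pk, x: bytes):
    pk1, pk2 = pk
    X = (pow(pk1, Hq(enc_elem(pk2)), p) * pk2) % p     # X = pk1^{H4(pk2)} * pk2
    beta = secrets.randbelow(q - 1) + 1
    phi = pow(X, beta, p)
    w = secrets.token_bytes(E1)
    r = Hq(x, w)                                       # r = H1(x, w)
    E = pow(X, r, p)
    F = bytes(a ^ b for a, b in zip(H2(pow(g, r, p)), x + w))
    mu = (beta + r * Hq(enc_elem(phi), enc_elem(E), F)) % q
    return E, F, phi, mu

def verify(pk, C) -> bool:
    """Public check: (pk1^{H4(pk2)} * pk2)^mu == phi * E^{H3(phi,E,F)} (mod p)."""
    pk1, pk2 = pk
    E, F, phi, mu = C
    X = (pow(pk1, Hq(enc_elem(pk2)), p) * pk2) % p
    return pow(X, mu, p) == (phi * pow(E, Hq(enc_elem(phi), enc_elem(E), F), p)) % p

sk_u, pk_u = keygen()
C = encrypt(pk_u, b"provider record!")                 # 16-byte sample record
print("valid ciphertext accepted:", verify(pk_u, C))
E, F, phi, mu = C
print("tampered ciphertext rejected:", verify(pk_u, (E, F, phi, (mu + 1) % q)))
```

The check succeeds because µ = β + r·H3(ϕ, E, F), so raising the common base to µ reproduces ϕ·E^{H3(ϕ,E,F)}; no secret key is needed.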
B. STRUCTURE OF THE PROPOSED SCHEME
Our protocol consists of the following phases:
• Initialization Phase:
Here, the data providers prepare their public and private keys. Then, they compute the re-encryption key used for redirecting their ciphertexts to the data analyst. To do that, the data providers need to prepare the public parameters. Two primes p and q are selected such that q | p − 1, where the bit-length of q is the security parameter k. The element g is a generator of the group G, a subgroup of Z_p* of order q. Four hash functions H1 : {0,1}^{e0} × {0,1}^{e1} → Z_q*, H2 : G → {0,1}^{e0+e1}, H3 : {0,1}* → Z_q*, and H4 : G → Z_q* are chosen. The public parameters are (q, G, g, H1, H2, H3, H4, e0, e1). The following algorithms represent this phase:
1) KeyGen(): Pick sk_{ui} = (s_{ui,1} ← Z_q*, s_{ui,2} ← Z_q*) and set pk_{ui} = (pk_{ui,1}, pk_{ui,2}) = (g^{s_{ui,1}}, g^{s_{ui,2}}).
2) ReKeyGen(sk_{ui}, pk_a): With the private key of the user sk_{ui} = (s_{ui,1}, s_{ui,2}) and the public key of the analyst pk_a = (pk_{a,1}, pk_{a,2}) as input, this algorithm generates the re-encryption key rk_{ui→a} as follows:
a) Select h ← {0,1}^{e0}, π ← {0,1}^{e1} and compute υ = H1(h, π).
b) Compute V = pk_{a,2}^{υ} and W = H2(g^{υ}) ⊕ (h||π).
c) Define rk^{⟨1⟩}_{ui→a} = h / (s_{ui,1}·H4(pk_{ui,2}) + s_{ui,2}) mod q.
d) Return rk_{ui→a} = (rk^{⟨1⟩}_{ui→a}, V, W).
• Upload phase:
In this phase, the data providers upload their encrypted data to the cloud. The data provider's data set is represented by d_i = {(x_i, y_i)} ⊂ X × Y, where x_i denotes the data vector and y_i ∈ Y := {0, 1} denotes the associated binary label. First, the data providers encrypt their sensitive data x_i using the Enc(pk_{ui}, x_i) algorithm with their public key pk_{ui} = (pk_{ui,1}, pk_{ui,2}). Algorithm 1 details how the users encrypt x_i ∈ M.

Algorithm 1 Encryption Algorithm
Input:
– pk_{ui} = (pk_{ui,1}, pk_{ui,2}): the user's public key
– x_i: the message
1: Pick β ← Z_q* and compute ϕ = (pk_{ui,1}^{H4(pk_{ui,2})} · pk_{ui,2})^{β}.
2: Pick w ← {0,1}^{e1} and compute r_i = H1(x_i, w).
3: Compute E_i = (pk_{ui,1}^{H4(pk_{ui,2})} · pk_{ui,2})^{r_i} and F_i = H2(g^{r_i}) ⊕ (x_i||w).
4: Compute µ_i = β + r_i·H3(ϕ, E_i, F_i) mod q.
5: Output the ciphertext Enc(pk_{ui}, x_i) = [x_i] = (E_i, F_i, ϕ, µ_i).
Output: [x_i] = (E_i, F_i, ϕ, µ_i): the ciphertext

The result of the Enc(pk_{ui}, x_i) algorithm can also be represented as the tuple [x_i] = (E_i, F_i). Therefore, the data set can be written as Enc(pk_{ui}, d_i) = [d_i] = ([x_i], [y_i]) = ((E_i, F_i, ϕ, µ_i), [y_i]), where (ϕ, µ_i) is used as a signature to confirm the ciphertext correctness. Second, the data providers determine the sensitivity level of their query function Δf_i and the privacy level ε_i for d_i. Finally, they send the encrypted data set [d_i], Δf_i, ε_i, and the re-encryption key rk_{ui→a} to P_S.
• Download Phase:
This phase illustrates how the data providers can download their ciphertexts from the cloud. The data providers use the Dec(sk_{ui}, C_i) algorithm to decrypt the encrypted data they outsourced to P_S. The Dec(sk_{ui}, C_i) algorithm takes sk_{ui} = (s_{ui,1}, s_{ui,2}) and the ciphertext C_i as input to recover the corresponding data set. Data providers can verify the correctness of C_i using (ϕ, µ_i) and accept only valid ciphertexts; this step realizes the public verifiability property. Algorithm 2 describes how the data providers decrypt their encrypted data and apply the public verification.

Algorithm 2 User Decryption Algorithm
Input:
– sk_{ui} = (s_{ui,1}, s_{ui,2}): the private key
– C_i = (E_i, F_i, ϕ, µ_i): the ciphertext
1: if (pk_{ui,1}^{H4(pk_{ui,2})} · pk_{ui,2})^{µ_i} = ϕ · E_i^{H3(ϕ,E_i,F_i)} then
2:    compute (x_i||w) = F_i ⊕ H2(E_i^{1/(s_{ui,1}·H4(pk_{ui,2}) + s_{ui,2})})
3:    if E_i = (pk_{ui,1}^{H4(pk_{ui,2})} · pk_{ui,2})^{H1(x_i,w)} then
4:        return User Dec(sk_{ui}, C_i) = x_i
5:    else
6:        return ⊥
7:    end if
8: else
9:    return ⊥
10: end if
Output: x_i ∈ M: the message
• Re-encryption phase:
Here, the cloud uses the re-encryption algorithm and transfers the generated ciphertext to the analyst. The cloud receives [d_i], Δf_i, and ε_i from the data providers. Algorithm 3 details how P_S re-encrypts the ciphertexts C_i, produced under multiple public keys, into ciphertexts under the public key of the analyst.


In this part, P_S first checks the correctness of the ciphertext using (ϕ, µ_i) and then applies the ReEnc(rk_{ui→a}, C_i, pk_{ui}, pk_a) algorithm. Second, this algorithm uses rk_{ui→a} = (rk^{⟨1⟩}_{ui→a}, V, W) and (E_i, F_i) under the public key of the data provider pk_{ui} = (pk_{ui,1}, pk_{ui,2}) as inputs. Then, ReEnc(rk_{ui→a}, C_i, pk_{ui}, pk_a) returns a new ciphertext [x_i]_a under the public key of the analyst pk_a = (pk_{a,1}, pk_{a,2}). This means that ciphertexts C_i produced under multiple public keys are transformed into ciphertexts under the single public key of the analyst pk_a = (pk_{a,1}, pk_{a,2}).

Algorithm 3 Re-Encryption Algorithm
Input:
– rk_{ui→a} = (rk^{⟨1⟩}_{ui→a}, V, W): the re-encryption key
– pk_{ui} = (pk_{ui,1}, pk_{ui,2}): the public key of the user
– pk_a = (pk_{a,1}, pk_{a,2}): the public key of the analyst
– [x_i] = (E_i, F_i, ϕ, µ_i): the ciphertext under user i
1: if (pk_{ui,1}^{H4(pk_{ui,2})} · pk_{ui,2})^{µ_i} = ϕ · E_i^{H3(ϕ,E_i,F_i)} then
2:    compute γ_i = E_i^{rk^{⟨1⟩}_{ui→a}}
3:    return ReEnc(rk_{ui→a}, C_i, pk_{ui}, pk_a) = [x_i]_a = (γ_i, F_i, V, W)
4: else
5:    return ⊥
6: end if
Output: [x_i]_a = (γ_i, F_i, V, W): the re-encrypted ciphertext

Third, the cloud uses the Laplace distribution to generate a d-dimensional noise vector δ_i with parameter Δf_i/ε_i. The cloud system then encrypts these noise vectors, transforming them into ciphertexts [δ_i]; in this step, the cloud uses the partially homomorphic hashed-ElGamal encryption [43]. Here, the cloud computes [x_i] ⊕ [δ_i]. Finally, the cloud sends the noisy data set [d_i]′ to the data analyst.
• Learning phase:
Here, the analyst obtains the noisy ciphertexts from the cloud. Algorithm 4 describes this phase in detail (an illustrative Python sketch of the re-encryption and analyst decryption steps follows this list). First, D_A checks the validity of [d_i]′ using V, W. Second, the analyst decrypts E_i′ with the associated public keys of the data providers. Finally, the data analyst obtains a noisy data set and applies his machine learning classification algorithms.

Algorithm 4 Analyst Decryption Algorithm
Input:
– sk_a = (s_{a,1}, s_{a,2}): the analyst's private key
– C_a = (γ_i, F_i, V, W): the re-encrypted ciphertext
1: Compute (h||π) = W ⊕ H2(V^{1/s_{a,2}})
2: Compute (x_i||w) = F_i ⊕ H2(γ_i^{1/h})
3: if V = pk_{a,2}^{H1(h,π)} then
4:    if γ_i = g^{H1(x_i,w)·h} then
5:        return Analyst Dec(sk_a, C_a) = x_i
6:    else
7:        return ⊥
8:    end if
9: end if
Output: x_i ∈ M: the message
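The following Python sketch walks through ReKeyGen, the re-encryption of Algorithm 3 (including the public check), and the analyst decryption of Algorithm 4, confirming that the analyst recovers the provider's plaintext from the re-encrypted ciphertext. As before, the toy parameters, the SHA-256 hash instantiations, the byte encodings, and treating h both as a bit string and as a scalar are assumptions of the sketch; it is not the authors' implementation. It requires Python 3.8+ for pow(x, -1, q).

```python
import hashlib, secrets

p, q, g = 2039, 1019, 4                      # toy p = 2q + 1 group; not secure sizes
NB = (p.bit_length() + 7) // 8
E0 = E1 = 16                                 # message / randomizer lengths in bytes

def Hq(*parts):                              # stand-in for H1 / H3 / H4: hash into Z_q*
    return int.from_bytes(hashlib.sha256(b"|".join(parts)).digest(), "big") % (q - 1) + 1

def H2(elem):                                # hash a group element to e0 + e1 bytes
    return hashlib.sha256(elem.to_bytes(NB, "big")).digest()[: E0 + E1]

b = lambda x: x.to_bytes(NB, "big")
xor = lambda a, c: bytes(u ^ v for u, v in zip(a, c))
rand_zq = lambda: secrets.randbelow(q - 1) + 1

def keygen():
    s1, s2 = rand_zq(), rand_zq()
    return (s1, s2), (pow(g, s1, p), pow(g, s2, p))

def encrypt(pk, x):                          # Algorithm 1; x must be E0 bytes long
    pk1, pk2 = pk
    X = pow(pk1, Hq(b(pk2)), p) * pk2 % p
    beta, w = rand_zq(), secrets.token_bytes(E1)
    r = Hq(x, w)
    phi, E = pow(X, beta, p), pow(X, r, p)
    F = xor(H2(pow(g, r, p)), x + w)
    mu = (beta + r * Hq(b(phi), b(E), F)) % q
    return E, F, phi, mu

def rekeygen(sk_u, pk_u, pk_a):              # delegator u -> analyst a
    h = rand_zq()                            # h used as scalar and as bit string here
    pi = secrets.token_bytes(E1)
    v = Hq(h.to_bytes(E0, "big"), pi)
    V, W = pow(pk_a[1], v, p), xor(H2(pow(g, v, p)), h.to_bytes(E0, "big") + pi)
    rk1 = h * pow(sk_u[0] * Hq(b(pk_u[1])) + sk_u[1], -1, q) % q
    return rk1, V, W

def reenc(rk, pk_u, C):                      # Algorithm 3
    rk1, V, W = rk
    E, F, phi, mu = C
    X = pow(pk_u[0], Hq(b(pk_u[1])), p) * pk_u[1] % p
    assert pow(X, mu, p) == phi * pow(E, Hq(b(phi), b(E), F), p) % p   # public check
    return pow(E, rk1, p), F, V, W           # gamma = E^{rk1} = g^{r*h}

def analyst_dec(sk_a, pk_a, C_a):            # Algorithm 4
    gamma, F, V, W = C_a
    hp = xor(W, H2(pow(V, pow(sk_a[1], -1, q), p)))        # recover (h || pi)
    h, pi = int.from_bytes(hp[:E0], "big"), hp[E0:]
    xw = xor(F, H2(pow(gamma, pow(h, -1, q), p)))          # recover (x || w)
    x, w = xw[:E0], xw[E0:]
    assert V == pow(pk_a[1], Hq(hp[:E0], pi), p)
    assert gamma == pow(g, Hq(x, w) * h % q, p)
    return x

sk_u, pk_u = keygen()                        # data provider
sk_a, pk_a = keygen()                        # analyst
rk = rekeygen(sk_u, pk_u, pk_a)
C = encrypt(pk_u, b"provider record!")
C_a = reenc(rk, pk_u, C)
print("analyst recovers plaintext:", analyst_dec(sk_a, pk_a, C_a))
```

The delegation works because E = X^{r} with X = pk_{u,1}^{H4(pk_{u,2})}·pk_{u,2}, so raising E to rk^{⟨1⟩} = h/(s_{u,1}H4(pk_{u,2})+s_{u,2}) cancels the provider's key material and leaves γ = g^{r·h}, which the analyst unmasks with h recovered from (V, W).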
V. RESULT AND DISCUSSION
In this section, the implementation of the proposed scheme is presented from different perspectives in terms of results, performance, and discussion. The following subsections describe the implementation setup, the experimental results, and the discussion.

A. EXPERIMENT SETUP
For the theoretical analysis of the computational complexity of our scheme, we denote by Texp the computational cost of one exponentiation in G. Then, the encryption time is 3 Texp, the re-encryption time is 2.5 Texp, the user decryption time is 3.5 Texp, and the analyst decryption time is 4 Texp.
To evaluate the computational costs of the proposed scheme, the experiments are conducted using the Java Pairing-Based Cryptography (JPBC) library [44]. In this work, we employ a personal computer with an Intel Core i7-3537U dual-core CPU (2.00–2.50 GHz) and 12 GB of RAM. Additionally, we use the curve y^2 = x^3 + x over the field F_q to obtain Type A pairings for q ≡ 3 mod 4. The experiments employ the 80-bit, 112-bit, and 128-bit AES-key-size security levels, as shown in Table 1.

TABLE 1. Security levels (Bits).

B. COMPUTATIONAL COST
This part illustrates the computational cost of our proposed framework. The processing costs of the proposed scheme are given according to the computational time. Figure 2 shows the computational times of our protocol phases using the different security levels of 80, 112, and 128 bits, respectively.

FIGURE 2. Computational costs.


C. CIPHERTEXTS SIZE
The proposed scheme computes the ciphertexts with different security levels. For instance, consider the 80-bit security level. Then, an elliptic curve with a 160-bit q (160/8 = 20 bytes) is employed. The size of G1 is given as 1024 bits; according to the related work [47], an element of G1 can be represented with 65 bytes. As a result, the size of the data providers' ciphertexts uploaded to the cloud is 3|G| + |Zq| = 3 × 65 + 160/8 = 215 bytes, and the size of the re-encrypted ciphertext transferred to the analyst is 2|G| + 2|Zq| = 2 × 65 + 2 × 160/8 = 170 bytes.
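As a quick arithmetic check of the sizes above (assuming, as stated in the text, 65-byte G1 elements and a 160-bit q):

```python
G1_BYTES = 65           # size of one G1 element, taken from the text (per [47])
ZQ_BYTES = 160 // 8     # 160-bit q -> 20 bytes

provider_ct = 3 * G1_BYTES + ZQ_BYTES         # 3 group elements + 1 Z_q element
reencrypted_ct = 2 * G1_BYTES + 2 * ZQ_BYTES  # 2 group elements + 2 Z_q elements
print(provider_ct, reencrypted_ct)            # 215 170
```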
eighth column shows the resistance of the schemes against the
D. PERFORMANCE OF THE ε-DP
The proposed scheme transforms the encrypted data produced under multiple public keys into noisy ciphertexts. Accordingly, the proposed scheme improves the efficiency and accuracy of data processing.
The analyst performs the chosen machine learning algorithm on the noisy data set with ε-DP without revealing any information about the users. By using the partially homomorphic hashed-ElGamal encryption scheme, the performance evaluation shows excellent results in terms of accuracy and efficiency according to several related works [23], [43]. The analyst can use different machine learning classifiers, such as a naive Bayes classifier, a support vector machine classifier, a k-nearest neighbor classifier, etc.

E. PUBLIC VERIFIABILITY
Public verifiability is an important property, enabling a third-party system to verify the integrity of the ciphertexts stored in the cloud on behalf of the data providers. Hence, the goal of our work is to guarantee data integrity with public verifiability and availability.
In the proposed protocol, all the components of the proposed model can publicly check the correctness, or the validity, of all the ciphertexts before performing any operation, which is what is called public verifiability. Using this feature, we can reduce the time consumed working on invalid ciphertexts.

F. COMPARATIVE STUDY
In this part, we provide a comparative study between our proposed scheme and some relevant state-of-the-art schemes [23], [35], [45], [46]. Table 2 describes several points in our comparison, including the variations and contributions of these schemes. The first column represents the state-of-the-art schemes [23], [35], [45], [46]. The second column shows whether a scheme needs pre-shared information to perform the computations between the data providers and the cloud. The third column illustrates whether all the actors in the system should be online during the operations. The fourth column presents the type of homomorphic encryption used by each scheme; in this column, we denote ''Fully Homomorphic Encryption'' by ''FHE'' and ''Additive Homomorphic Encryption'' by ''AHE''. The fifth column illustrates whether each scheme uses the differential privacy technique. The sixth column compares the ability to work with multiple data providers. The seventh column explains whether the delegator in these schemes can redirect the ciphertext to another actor in the system. The eighth column shows the resistance of the schemes against collusion attacks. Finally, the last row indicates whether all the components in the system model of these schemes [23], [35], [45], [46] can publicly check the correctness of the data providers' ciphertexts.

TABLE 2. Comparative study with relevant schemes.

VI. SECURITY ANALYSIS
This section introduces the security analysis with the ciphertext scenarios and the relevant adversary information. In detail, we adopt two security definitions: the adversary attacks the original ciphertext, and the adversary attacks the re-encrypted ciphertext.
Theorem: The UPRE scheme is indistinguishable against chosen-ciphertext attacks (IND-PRE-CCA) in the random oracle model, provided the CDH assumption is hard in the group G.
Assume that there is an adversary A that attacks the IND-PRE-CCA security of the proposed scheme. Then, there exists an algorithm B that solves the CDH problem.
Here, we consider two types of adversaries. The first type attacks the original ciphertext of the users (the encrypted data uploaded to the cloud); we denote this type by Aor. The second type attacks the transformed ciphertext sent from the proxy cloud to the data analyst; we denote this type by Atr. Algorithm B answers the adversary queries as follows.
Setup: Assume that B submits the public parameters (q, G, g, H1, H2, H3, H4, e0, e1) to A. B simulates the random oracles of H1, H2 and H3 with lists {L_{H1}, L_{H2}, L_{H3}}, respectively, to avoid collisions and ensure consistency. B also prepares two lists, LK for the public and private keys and LR for the re-encryption keys. B starts generating the original keys and the corrupted keys.
Lemma 1: Consider Aor communicating with B according to the IND-PRE-CCA game.
Phase 1: Aor issues a series of queries, and B answers Aor as in the proposed scheme.
Challenge: Aor challenges B and returns (pk*_{ui,1}, pk*_{ui,2}) as well as two messages of the same length m0, m1 ∈ {0,1}^{e0}.


Then, B responds to Aor with the challenge ciphertext C* = (E*, F*, ϕ*, µ*), which contains the instance element of the DCDH problem in H1(mς, w*) = ab and H2(g^{ab}) ⊕ (mς||w*) = F* for ς ∈ {0, 1}.
Phase 2: Aor continues to issue queries as in Phase 1 with the restrictions described in the IND-PRE-CCA game. Algorithm B responds to these queries for Aor as in Phase 1.
Guess: Eventually, Aor responds with a guess ς′ and sends it to B. Finally, B returns the solution of the DCDH instance.
Lemma 2: Consider Atr communicating with B according to the IND-PRE-CCA game.
Phase 1: Atr issues a series of queries, and B answers Atr as in the proposed scheme.
Challenge: Atr challenges B and returns (pk*_{ui,1}, pk*_{ui,2}) as well as two messages of the same length m0, m1 ∈ {0,1}^{e0}. Then, B responds to Atr with the challenge ciphertext C* = (E*, F*, V*, W*), which contains the instance element of the DCDH problem in H1(mς, w*) = r* = (b/a)(t/rk_{i′→i*}(x_{i*,1} H4(pk_{i′,2}) + x_{i′,2})) and H2(g^{a/b})^{t/rk_{i′→i*}(x_{i*,1} H4(pk_{i′,2}) + x_{i′,2})} ⊕ (mς||w*) = F* for ς ∈ {0, 1}.
Phase 2: Atr continues to issue queries as in Phase 1 with the restrictions described in the IND-PRE-CCA game. Algorithm B responds to these queries for Atr as in Phase 1.
Guess: Atr responds with a guess ς′ and sends it to B. Finally, B returns the solution of the DCDH instance. The UPRE scheme is therefore IND-PRE-CCA secure in the random oracle model.
If A has corrupted DA or PS to obtain the outsourced data, then A cannot obtain the plaintext due to the IND-PRE-CCA security of our scheme. Additionally, if A obtains access to some data providers and the re-encryption key cannot provide access to the plaintext of the data providers, our scheme achieves ε-DP. Our scheme is secure under the DCDH assumption in the random oracle model.

VII. CONCLUSION
Cloud computing security is still considered a major issue, especially regarding privacy preservation of the data providers with respect to third-party systems. This paper presents an efficient privacy-preserving machine learning scheme for multiple providers with the collaboration of a third-party system. In this regard, the proposed protocol employs a unidirectional proxy re-encryption protocol to protect cloud data sets. All parties can publicly verify the encrypted data sets, which reduces the computational cost and network overhead. The proposed scheme is secure under the CDH assumption in the random oracle model. The proxy server, rather than the data providers, adds noise to the ciphertext using ε-DP to facilitate the data analytics tasks. Our proposed protocol guarantees secure multi-party computation and privacy-preserving classification based on partial homomorphic encryption.
For future work, we plan to study parallel algorithms for secure multiparty computation on blockchain techniques. We also aim to investigate the use of more complex computations, such as partial differential equations, over encrypted data.

REFERENCES
[1] Q. Liu, Y. Guo, J. Wu, and G. Wang, ''Effective query grouping strategy in clouds,'' J. Comput. Sci. Technol., vol. 32, no. 6, pp. 1231–1249, Nov. 2017.
[2] K. Popović and Ž. Hocenski, ''Cloud computing security issues and challenges,'' in Proc. 33rd Int. Conv. (MIPRO), 2010, pp. 344–349.
[3] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, ''Enabling public verifiability and data dynamics for storage security in cloud computing,'' in Proc. Eur. Symp. Res. Comput. Secur. Berlin, Germany: Springer, 2009, pp. 355–370.
[4] Y. Li, S. Yao, K. Yang, Y. Tan, and Q. Zhang, ''A high-imperceptibility and histogram-shifting data hiding scheme for JPEG images,'' IEEE Access, vol. 7, pp. 73573–73582, 2019.
[5] Q. Lin, H. Yan, Z. Huang, W. Chen, J. Shen, and Y. Tang, ''An ID-based linearly homomorphic signature scheme and its application in blockchain,'' IEEE Access, vol. 6, pp. 20632–20640, 2018.
[6] A. D. Josep, R. Katz, A. Konwinski, L. Gunho, D. Patterson, and A. Rabkin, ''A view of cloud computing,'' Commun. ACM, vol. 53, no. 4, pp. 50–58, 2010.
[7] J. Xu, L. W. Wei, Y. Zhang, A. D. Wang, F. C. Zhou, and C.-Z. Gao, ''Dynamic fully homomorphic encryption-based Merkle tree for lightweight streaming authenticated data structures,'' J. Netw. Comput. Appl., vol. 107, pp. 113–124, Apr. 2018.
[8] Z. Yu, C.-Z. Gao, Z. Jing, B. B. Gupta, and Q. Cai, ''A practical public key encryption scheme based on learning parity with noise,'' IEEE Access, vol. 6, pp. 31918–31923, 2018.
[9] Q. Zhang, Y. Li, Q. Zhang, J. Yuan, R. Wang, Y. Gan, and Y. Tan, ''A self-certified cross-cluster asymmetric group key agreement for wireless sensor networks,'' Chin. J. Electron., vol. 28, no. 2, pp. 280–287, 2019.
[10] S. Halevi, ''Homomorphic encryption,'' in Tutorials on the Foundations of Cryptography. Cham, Switzerland: Springer, 2017, pp. 219–276.
[11] C. Gentry and D. Boneh, A Fully Homomorphic Encryption Scheme, vol. 20, no. 9. Stanford, CA, USA: Stanford Univ., 2009.
[12] D. Boneh, R. Gennaro, S. Goldfeder, A. Jain, S. Kim, P. M. R. Rasmussen, and A. Sahai, ''Threshold cryptosystems from threshold fully homomorphic encryption,'' in Proc. Annu. Int. Cryptol. Conf. Cham, Switzerland: Springer, 2018, pp. 565–596.
[13] P. Paillier, ''Public-key cryptosystems based on composite degree residuosity classes,'' in Proc. Int. Conf. Theory Appl. Cryptograph. Techn. Berlin, Germany: Springer, 1999, pp. 223–238.
[14] W. Ding, Z. Yan, and R. H. Deng, ''Encrypted data processing with homomorphic re-encryption,'' Inf. Sci., vol. 409, pp. 35–55, Oct. 2017.
[15] Y. Ishai and A. Paskin, ''Evaluating branching programs on encrypted data,'' in Proc. Theory Cryptogr. Conf. Berlin, Germany: Springer, 2007, pp. 575–594.
[16] J. H. Cheon and J. Kim, ''A hybrid scheme of public-key encryption and somewhat homomorphic encryption,'' IEEE Trans. Inf. Forensics Security, vol. 10, no. 5, pp. 1052–1063, May 2015.
[17] C. Dwork, ''Differential privacy,'' in Encyclopedia of Cryptography and Security. Boston, MA, USA: Springer, 2011, pp. 338–340.
[18] Y. Xue, Y.-A. Tan, C. Liang, Y. Li, J. Zheng, and Q. Zhang, ''RootAgency: A digital signature-based root privilege management agency for cloud terminal devices,'' Inf. Sci., vol. 444, pp. 36–50, May 2018.
[19] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing, ''CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy,'' in Proc. Int. Conf. Mach. Learn., 2016, pp. 201–210.
[20] Y. Tan, Y. Xue, C. Liang, J. Zheng, Q. Zhang, J. Zheng, and Y. Li, ''A root privilege management scheme with revocable authorization for Android devices,'' J. Netw. Comput. Appl., vol. 107, no. 4, pp. 69–82, Apr. 2018.
[21] R. Hamza, Z. Yan, K. Muhammad, P. Bellavista, and F. Titouna, ''A privacy-preserving cryptosystem for IoT e-healthcare,'' Inf. Sci., to be published.
[22] A. Hassan, N. Eltayieb, R. Elhabob, and F. Li, ''An efficient certificateless user authentication and key exchange protocol for client-server environment,'' J. Ambient Intell. Humanized Comput., vol. 9, no. 6, pp. 1713–1727, 2018.
[23] P. Li, T. Li, H. Ye, J. Li, X. Chen, and Y. Xiang, ''Privacy-preserving machine learning with multiple data providers,'' Future Gener. Comput. Syst., vol. 87, pp. 341–350, Oct. 2018.
[24] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein, ''Distributed GraphLab: A framework for machine learning and data mining in the cloud,'' Proc. VLDB Endowment, vol. 5, no. 8, pp. 716–727, 2012.


[25] Q. Liu, G. Wang, X. Liu, T. Peng, and J. Wu, ''Achieving reliable and secure services in cloud computing environments,'' Comput. Elect. Eng., vol. 59, pp. 153–164, Apr. 2017.
[26] J. W. Bos, K. Lauter, J. Loftus, and M. Naehrig, ''Improved security for a ring-based fully homomorphic encryption scheme,'' in Proc. IMA Int. Conf. Cryptogr. Coding. Berlin, Germany: Springer, 2013, pp. 45–64.
[27] E. Hesamifard, H. Takabi, and M. Ghasemi, ''CryptoDL: Deep neural networks over encrypted data,'' 2017, arXiv:1711.05189. [Online]. Available: https://arxiv.org/abs/1711.05189
[28] P. Li and J. Li, ''Multi-key privacy-preserving deep learning in cloud computing,'' Future Gener. Comput. Syst., vol. 74, pp. 76–85, Sep. 2017.
[29] T. Chen and S. Zhong, ''Privacy-preserving backpropagation neural network learning,'' IEEE Trans. Neural Netw., vol. 20, no. 10, pp. 1554–1564, Oct. 2009.
[30] A. Bansal, T. Chen, and S. Zhong, ''Privacy preserving back-propagation neural network learning over arbitrarily partitioned data,'' Neural Comput. Appl., vol. 20, no. 1, pp. 143–150, 2011.
[31] S. Samet and A. Miri, ''Privacy-preserving back-propagation and extreme learning machine algorithms,'' Data Knowl. Eng., vols. 79–80, pp. 40–61, Sep./Oct. 2012.
[32] T. Graepel, K. Lauter, and M. Naehrig, ''ML confidential: Machine learning on encrypted data,'' in Proc. Int. Conf. Inf. Secur. Cryptol. Berlin, Germany: Springer, 2012, pp. 1–21.
[33] Q. Liu, G. Wang, F. Li, S. Yang, and J. Wu, ''Preserving privacy with probabilistic indistinguishability in weighted social networks,'' IEEE Trans. Parallel Distrib. Syst., vol. 28, no. 5, pp. 1417–1429, May 2017.
[34] Z. Wei, J. Li, X. Wang, and C.-Z. Gao, ''A lightweight privacy-preserving protocol for VANETs based on secure outsourcing computing,'' IEEE Access, vol. 7, pp. 62785–62793, 2019.
[35] C.-Z. Gao, Q. Cheng, X. Li, and S.-B. Xia, ''Cloud-assisted privacy-preserving profile-matching scheme under multiple keys in mobile social network,'' Cluster Comput., vol. 22, no. 1, pp. 1655–1663, 2018.
[36] I. Mironov, O. Pandey, O. Reingold, and S. Vadhan, ''Computational differential privacy,'' in Proc. Annu. Int. Cryptol. Conf. Berlin, Germany: Springer, 2009, pp. 126–142.
[37] C. Dwork and A. Roth, ''The algorithmic foundations of differential privacy,'' Found. Trends Theor. Comput. Sci., vol. 9, nos. 3–4, pp. 211–407, 2014.
[38] Z. Huang, S. Mitra, and G. Dullerud, ''Differentially private iterative synchronous consensus,'' in Proc. ACM Workshop Privacy Electron. Soc., 2012, pp. 81–90.
[39] B. Chevallier-Mames, P. Paillier, and D. Pointcheval, ''Encoding-free ElGamal encryption without random oracles,'' in Proc. Int. Workshop Public Key Cryptography. Berlin, Germany: Springer, 2006, pp. 91–104.
[40] F. Bao, R. H. Deng, and H. Zhu, ''Variations of Diffie-Hellman problem,'' in Proc. Int. Conf. Inf. Commun. Secur. Berlin, Germany: Springer, 2003, pp. 301–312.
[41] R. Canetti and S. Hohenberger, ''Chosen-ciphertext secure proxy re-encryption,'' in Proc. 14th ACM Conf. Comput. Commun. Secur., 2007, pp. 185–194.
[42] M. Shen, X. Tang, L. Zhu, X. Du, and M. Guizani, ''Privacy-preserving support vector machine training over blockchain-based encrypted IoT data in smart cities,'' IEEE Internet Things J., to be published.
[43] M. Joye and B. Libert, ''Encoding-free ElGamal-type encryption schemes on elliptic curves,'' in Proc. Cryptographers' Track RSA Conf. Springer, 2017, pp. 19–35.
[44] A. De Caro and V. Iovino, ''jPBC: Java pairing based cryptography,'' in Proc. 16th IEEE Symp. Comput. Commun. (ISCC), Corfu, Greece, Jun./Jul. 2011, pp. 850–855.
[45] P. Li, J. Li, Z. Huang, C.-Z. Gao, W.-B. Chen, and K. Chen, ''Privacy-preserving outsourced classification in cloud computing,'' Cluster Comput., vol. 21, no. 1, pp. 277–286, Mar. 2018.
[46] F.-J. González-Serrano, A. Amor-Martín, and J. Casamayón-Antón, ''Supervised machine learning using encrypted training data,'' Int. J. Inf. Secur., vol. 17, no. 4, pp. 365–377, 2018.
[47] K.-A. Shim, Y.-R. Lee, and C.-M. Park, ''EIBAS: An efficient identity-based broadcast authentication scheme in wireless sensor networks,'' Ad Hoc Netw., vol. 11, no. 1, pp. 182–189, Jan. 2013.

ALZUBAIR HASSAN received the B.Sc. degree in computer science from the University of Kassala, in 2010, the M.Sc. degree in mathematical science from the University of Khartoum, in 2013, and the Ph.D. degree in computer science and technology from the University of Electronic Science and Technology of China, in 2018. He is currently a Postdoctoral Researcher with the School of Computer Science and Cyber Engineering, Guangzhou University. His current research interests include cryptography, network security, and privacy preservation.

RAFIK HAMZA received the M.Sc. and Ph.D. degrees in computer science from the University of Batna 2, in 2014 and 2017, respectively. He was a Principal Engineer with R&D Sonatrach, Boumerdès, in 2018. He is currently a Researcher with Guangzhou University, working on machine learning and cryptography techniques. He has published several articles in top international scientific journals and conferences. His research interests include machine learning, information security, access control, image and video processing, chaos theory, and lightweight cryptography applications. He serves as a Reviewer for well-reputed international journals and conferences.

HONGYANG YAN received the M.S. degree from the School of Mathematics and Information Science, Guangzhou University, in 2016. She is currently pursuing the Ph.D. degree with Nankai University. Her research interests include secure access control, such as attribute-based cryptography and identity-based cryptography, and IoT security.

PING LI received the M.S. and Ph.D. degrees in applied mathematics from Sun Yat-sen University, in 2011 and 2016, respectively. She held a postdoctoral position at Guangzhou University, from 2016 to 2018. She is currently a Researcher with the School of Computer Science, South China Normal University. Her current research interests include cryptography, privacy preservation, and cloud computing.
