Improving Database Security in Cloud Computing by Fragmentation of Data
Abstract—Cloud computing is a technology that facilitates numerous configurable resources in which the data is stored and managed in a decentralized manner. However, since the data is out of the owner's control, concerns have arisen regarding data confidentiality. Encryption techniques have previously been proposed to provide users with confidentiality for outsourced storage; however, many of these encryption algorithms are weak, enabling data security to be breached simply by compromising an algorithm. We propose a combination of encryption algorithms and a distribution system to improve database confidentiality. This scheme distributes the database across the clouds based on the level of security that is provided by the encryption algorithms utilized. We analyzed our scheme by designing and conducting experiments and by comparing our scheme with existing solutions. The results demonstrate that our scheme offers a highly secure approach that provides users with data confidentiality and provides acceptable overhead performance.

Index Terms—Cloud Computing, Encryption Algorithms, Security, Distributed System.

I. INTRODUCTION

Cloud computing is defined by NIST [1] as “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction”. Cloud computing involves storing data using a third-party or non-central storage mechanism and requires the ability to access this data from anywhere at any time.

It offers a number of services and includes a variety of models to meet users' needs at affordable prices. Scalability is a cloud computing characteristic, whereby data is scaled around the cloud provider's servers. The powerful devices cloud providers rely on to operate the cloud give users high performance processing, fast network speeds, and a huge amount of storage space.

Despite the perceived benefits of cloud computing, there are still major security concerns surrounding storing data to the cloud. One of these concerns is adequately protecting the confidentiality of sensitive data. This ongoing and very important concern has motivated us to seek a way to improve security in a cloud computing environment and to create a viable real-life application. Cloud computing has many attractive advantages that encourage potential users to consider moving to a modern style of computing. However, these advantages, as Hacg et al. [2] claim, come at a high price, with users facing greater privacy risks and increased vulnerabilities when they move their data, which is often private or sensitive, to a cloud. More importantly, this data is often not under the direct control of the owner, so concerns arise regarding data confidentiality. Consequently, providers are able to compromise and access sensitive data, which constitutes an invasion of the database owner's privacy. Finding trusted providers to store sensitive data is not an easy task, as many cloud computing providers may not even trust their own employees [3]. Additionally, in some cases, providers also reserve the right to change their terms and conditions, which means that data confidentiality and privacy risks are even more critical to consider [4].

In light of these ongoing security-related issues, encryption emerges as the simplest solution, as encrypting data before sending it to the cloud can prevent providers from obtaining sensitive information. Unfortunately, however, encrypted data cannot be easily queried, making it difficult for users to access their data. Some proposed solutions to this dilemma involve using asymmetric cryptography, with the private key being shared with the providers. However, the provider can still infer sensitive information with this method when they perform the decryption in response to the client's query.

Similarly, symmetric key cryptography involves decryption on the provider's side, allowing providers to infer sensitive information in this case as well. Hence, neither symmetric nor asymmetric cryptography offers a suitable solution. Since there is no single encryption algorithm that is able to support all Structured Query Language (SQL) queries without decryption, there is a need for a system that provides users with security while also supporting a variety of query types. This leads us to the following questions: How can we guarantee data confidentiality while using untrusted cloud computing provider resources, and what encryption algorithms can be utilized to support a variety of queries?

II. RELATED WORK

Work regarding the confidentiality of data stored through outsourcing is categorized into four groups: (a) fragmentation schemes (a column-based partition), (b) hardware-software solutions, (c) sensitive data vs. non-sensitive data, and (d) a combination of encryption algorithms.
2017 International Conference on Computer and Applications (ICCA)
[...] a highly secure encryption algorithm and stored in the master cloud while not disclosing the encryption key to the master cloud provider. In fact, none of the keys are disclosed to any of the cloud providers. Furthermore, when the relation for the master cloud is created, we ensure that the relation has one column that is used as an index to the tuples of the relation. Thus, in addition to the primary key of the original relation, we create another attribute (table column) containing a unique index to the tuples of the relation.

The master cloud is the core part of our scheme, as it maintains the entire relation in one place. Hence, the encryption algorithm to be used in the master cloud has to be secure enough to hold sensitive data. A number of studies have shown that AES-CBC is a secure and reliable algorithm for the outsourced storage of sensitive data [20] [21] and, consequently, several studies have used AES-CBC to store sensitive data [22] [23] [24]. Therefore, in our scheme, we also use the AES-CBC encryption algorithm to encrypt the master cloud relation. The index column is the only column kept in plain text, so that the user can only query the relation through that index to fetch the desired tuple(s).

The index serves as a candidate key to the relation and is replicated in each fragment, which contains an extended column of the relation. The idea of storing an entire relation in the master cloud (and not in the client cloud) is to obtain the greatest benefit from cloud computing by preventing any kind of data storage on the client side. In most cases, the query has to travel to the master cloud to obtain encrypted data; however, this is not always the case. Furthermore, only two kinds of queries can be submitted to the master cloud database because any data encrypted by the AES-CBC algorithm cannot be queried directly without decryption.

The configuration continues with the vertical fragmentation to create a variable number of replicas of the columns that are stored in the slave clouds. Furthermore, before a column is stored in a slave cloud, it is encrypted while not disclosing the encryption key to the cloud provider. Whether a table column is replicated and which encryption algorithm is used is determined by the observed or anticipated access pattern to the relation. We shall simply state that if the access pattern to the relation R is such that column A is frequently used by the queries with a predicate (R.A Θ k), where Θ is an arithmetic relational operator and k is a constant, then the column is replicated and encrypted using an algorithm applicable for querying using Θ. Furthermore, each encrypted column is also augmented with an index column to create an extended column. Each replicated and encrypted column is stored in a slave cloud.

If a column of the relation is accessed by distinct queries with different predicates, a column of the relation may appear more than once in a fragment, but is encrypted using a different encryption algorithm. For instance, if a column is accessed by two different queries with two respective predicates (R.A = k1) and (R.A > k2), then the column R.A would appear in a fragment more than once, once encrypted using the DET algorithm and once encrypted using the OPE algorithm.

C. Proxy's Functionality

The proxy is an important part of our scheme, as it performs most of the processing. Processes performed by the proxy server include creation, insertion, encryption, decryption, query parsing and the retrieval of results. The proxy server has to be inside the private cloud and has to communicate with the outside world through a highly secure channel, such as SSL.

When a proxy receives a client query, it parses it and transforms it into a set of sub-queries on extended columns stored in the slave clouds. Only the master cloud storing the whole encrypted relation is accessed. Each slave returns to the proxy a set of indices forming the answer of the sub-query it has received. When the proxy receives the results of sub-queries, it performs either the union or intersection algorithm on the indices if there is more than one condition (predicate) in the query's Where clause. If these conditions are separated by OR, the union algorithm is used; otherwise, the intersection algorithm is used. Once the indices of the final result are formed, the proxy issues a query to the master relation to fetch the tuples that match them. The proxy performs all of the encryption and decryption. In the encryption process, the proxy encrypts any inserted values that come from the user before storing them at the clouds. The selection query value also has to be encrypted by the proxy before querying any slave clouds.

IV. IMPLEMENTATION AND EVALUATION

A. Cloud Computing Tools and Configuration

As a first step in proving our concept, we created a public cloud computing account at Rackspace, which offers open source cloud computing. We then created a number of servers equal to the number of fragments that we need, plus one more for the master cloud. These servers have the following features: Linux OS, XAMPP Server, CPU: 2 vCPUs, RAM: 4 GB, System Disk: 160 GB, and Network: 400 Mb/s. The proxy, on the other hand, runs OS X with Tomcat server 7.0.53 and has the following features: CPU: 2.4 GHz Intel Core i5, RAM: 10 GB, System Disk: 500 GB, and Network: 150 Mbps.

B. Evaluation Method

We evaluate our method by determining the total delay using an analytical model and through emulation. In the evaluation, we concentrate on the total delay (in ms) per user SQL request. As the user will receive the whole query result in one response, the total delay in our experiments is also latency. Our method is compared to two base methods:

(a) The first is an unsecured method in which the DB is stored in one cloud and there are no efforts to provide confidentiality. Thus, data is stored and communicated in plain text. We refer to this method as the Unsecured approach.

(b) The second method is the approach of [14], in which the data is encrypted and stored in one cloud. In that cloud, each column is encrypted using the encryption algorithm that supports the desired operation(s). We refer to this approach as the Secure Centralized approach.
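Returning to the merge step described under Proxy's Functionality, the following is a minimal Python sketch of how a proxy could combine the index sets returned by the slave clouds and then look up the matching tuples in the master relation. The function name `merge_indices`, the sample index sets, and the in-memory `master` dictionary are hypothetical stand-ins for illustration only; the sketch does not perform any encryption or network communication and is not the authors' implementation.

```python
def merge_indices(index_sets, connective):
    """Combine per-slave index sets: union for OR, intersection otherwise."""
    if not index_sets:
        return set()
    if connective.upper() == "OR":
        return set().union(*index_sets)
    return set.intersection(*index_sets)


# Hypothetical answers to two sub-queries, one per slave cloud.
slave_a = {1, 4, 7, 9}  # indices matching the predicate on the first column
slave_b = {4, 5, 9}     # indices matching the predicate on the second column

and_result = merge_indices([slave_a, slave_b], "AND")
or_result = merge_indices([slave_a, slave_b], "OR")

# Stand-in for the master relation: plain-text index -> encrypted tuple.
# The strings are placeholders, not real AES-CBC ciphertext.
master = {i: f"ciphertext-{i}" for i in range(1, 11)}

# The proxy fetches only the tuples whose index survived the merge.
fetched = [master[i] for i in sorted(and_result)]
print(sorted(and_result), sorted(or_result), fetched)
```

In this sketch the slaves never see the full tuples and the master is only queried by index, which mirrors the division of work described in the text.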
[...] total delay. We use analytical evaluation as well as emulation. When emulation is used, each experiment is repeated a number [...]

1) Select statements in which the Where clause contains [...]

C. Delays Derived Using the Analytical Model

Based on the analytical model, we compare four approaches: Unsecured centralized, Secure centralized, Secure distributed serial, and Secure distributed parallel. We calculate the total delays of each of these approaches. We calculate the delays using the analytical model while varying the selectivity to vary the number of retrieved tuples and also the number of predicates. Table I shows the delays in milliseconds for query 1, which has a Where clause with one predicate. The [...]

TABLE I: The total delays in milliseconds of the four schemes for query 1

S.F. | Unsecure centralized     | Secure centralized               | Secure distributed serial               | Secure distributed parallel
     | Query pro.  Comm.  Total | Query pro.  Comm.  Crypto  Total | Query pro.  Comm.  Crypto  Proxy  Total | Query pro.  Comm.  Crypto  Proxy  Total
0.2  | 84          50     134   | 144         50     144     339   | 211         77     144     2      445   | 211         77     144     2      445
[...]
0.6  | 124         62     [...] | 304         62     432     [...] | 378         88     432     3      [...] | 378         88     432     3      [...]
[...]

TABLE II: The total delays in milliseconds of the four schemes for query 2

S.F. | Unsecure centralized     | Secure centralized               | Secure distributed serial               | Secure distributed parallel
     | Query pro.  Comm.  Total | Query pro.  Comm.  Crypto  Total | Query pro.  Comm.  Crypto  Proxy  Total | Query pro.  Comm.  Crypto  Proxy  Total
0.2  | 84          50     134   | 144         50     144     338   | 288         103    144     1      536   | 220         103    144     2      469
0.4  | 104         56     160   | 224         56     288     568   | 445         109    288     3      845   | 302         109    288     3      740
0.6  | 124         62     186   | 304         62     432     798   | 457         114    432     5      1008  | 381         114    432     4      931
0.8  | 144         67     211   | 384         67     576     1027  | 541         120    576     7      1244  | 462         120    576     5      1163
1    | 164         73     237   | 464         73     720     1257  | 624         126    720     9      1479  | 544         126    720     6      1396
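As a worked check of how the analytical totals are composed, the sketch below sums the component delays of the S.F. = 0.2 row of the complete four-scheme table above (component values 84/50, 144/50/144, 288/103/144/1, and 220/103/144/2). It assumes the components appear in the order query processing, communication, crypto, proxy; the dictionary keys and the helper name `total_delay` are illustrative labels, not identifiers from the paper.

```python
# Component delays in ms for the S.F. = 0.2 row, copied from the table.
# Key names are illustrative; the column order is an assumption.
components = {
    "Unsecure centralized": {"query": 84, "comm": 50},
    "Secure centralized": {"query": 144, "comm": 50, "crypto": 144},
    "Secure distributed serial": {"query": 288, "comm": 103, "crypto": 144, "proxy": 1},
    "Secure distributed parallel": {"query": 220, "comm": 103, "crypto": 144, "proxy": 2},
}


def total_delay(parts):
    """Total delay of one scheme as the sum of its component delays."""
    return sum(parts.values())


totals = {scheme: total_delay(parts) for scheme, parts in components.items()}
for scheme, total in totals.items():
    print(f"{scheme}: {total} ms")
```

The computed totals (134, 338, 536, and 469 ms) match the totals listed for that row, so the sum does not depend on which column carries which label.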
TABLE III: The total delays in milliseconds of the four schemes for query 3

S.F. | Unsecure centralized     | Secure centralized               | Secure distributed serial               | Secure distributed parallel
     | Query pro.  Comm.  Total | Query pro.  Comm.  Crypto  Total | Query pro.  Comm.  Crypto  Proxy  Total | Query pro.  Comm.  Crypto  Proxy  Total
0.2  | 84          50     134   | 144         50     144     339   | 357         130    144     3      634   | 220         130    144     3      497
0.4  | 104         56     160   | 224         56     288     569   | 452         135    288     4      879   | 302         135    288     4      729
0.6  | 124         62     186   | 304         62     432     798   | 429         141    432     6      1108  | 381         141    432     6      960
0.8  | 144         67     211   | 384         67     576     1028  | 620         147    576     7      1350  | 463         147    576     7      1111
1    | 164         73     237   | 464         73     720     1258  | 704         152    720     9      1585  | 544         152    720     9      1425

TABLE IV: The total delays in milliseconds of the three schemes for query 1

S.F. | Unsecure centralized     | Secure centralized               | Secure distributed serial
     | Query pro.  Comm.  Total | Query pro.  Comm.  Crypto  Total | Query pro.  Comm.  Crypto  Proxy  Total
0.2  | 106.2       23.8   130   | 253         30.5   168     451   | 566         41     169     1      777
0.4  | 84.8        30.2   115   | 314         49     222     585   | 881         74     191     1      1147
0.6  | 102.9       40.1   143   | 495         62.8   247     804   | 1168        103    221     2      1492
0.8  | 102.9       49.1   152   | 593         83     307     983   | 1328        142    285     2      1757
1    | 115.2       55.8   171   | 790         98.7   390     1287  | 1580        148    343     2      2073

TABLE V: The total delays in milliseconds of the three schemes for query 2

S.F. | Unsecure centralized     | Secure centralized               | Secure distributed serial
     | Query pro.  Comm.  Total | Query pro.  Comm.  Crypto  Total | Query pro.  Comm.  Crypto  Proxy  Total
0.2  | 128         25     153   | 283         42     168     493   | 733         79     163     1      976
0.4  | 107         35     160   | 396         93     221     710   | 1162        104    191     2      1459
0.6  | 143         42     185   | 536         109    239     884   | 1347        139    219     2      1707
0.8  | 149         51     200   | 549         113    310     972   | 1694        179    290     2      2165
1    | 151         59     210   | 1036        139    407     1582  | 1687        187    346     3      2223

TABLE VI: The total delays in milliseconds of the three schemes for query 3

S.F. | Unsecure centralized     | Secure centralized               | Secure distributed serial
     | Query pro.  Comm.  Total | Query pro.  Comm.  Crypto  Total | Query pro.  Comm.  Crypto  Proxy  Total
0.2  | 129         26     155   | 277         62     165     504   | 939         91     165     1      1196
0.4  | 138         37     175   | 336         101    223     660   | 1417        122    188     2      1729
0.6  | 164         43     207   | 429         109    237     775   | 1580        159    209     2      1950
0.8  | 220         53     273   | 777         115    302     1194  | 1805        203    281     2      2291
1    | 159         61     220   | 933         137    371     1441  | 2196        213    350     4      2763
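To see how the query 3 totals of Table III and Table VI relate, the sketch below compares them per scheme and selectivity factor. It rests on an assumption that is not stated in the captions: that Table III holds the analytical results (it appears under the analytical-model heading and lists four schemes) while Table VI holds the emulation results for the three schemes they share. The variable names are illustrative.

```python
schemes = ("Unsecure centralized", "Secure centralized", "Secure distributed serial")

# Total delays in ms for query 3, keyed by selectivity factor (S.F.).
# Assumption: Table III is analytical, Table VI is emulation.
analytical = {
    0.2: (134, 339, 634),
    0.4: (160, 569, 879),
    0.6: (186, 798, 1108),
    0.8: (211, 1028, 1350),
    1.0: (237, 1258, 1585),
}
emulated = {
    0.2: (155, 504, 1196),
    0.4: (175, 660, 1729),
    0.6: (207, 775, 1950),
    0.8: (273, 1194, 2291),
    1.0: (220, 1441, 2763),
}

# One (S.F., scheme, analytical, emulated) record per table cell.
pairs = [
    (sf, scheme, a, e)
    for sf in analytical
    for scheme, a, e in zip(schemes, analytical[sf], emulated[sf])
]

underestimated = [p for p in pairs if p[2] < p[3]]
exceptions = [p for p in pairs if p[2] >= p[3]]

print(f"analytical < emulated in {len(underestimated)} of {len(pairs)} cases")
for sf, scheme, a, e in exceptions:
    print(f"exception at S.F. {sf}, {scheme}: {a} ms vs {e} ms")
```

Under that assumption, the analytical total is lower in 13 of the 15 cells, with the two exceptions being the Secure centralized scheme at S.F. 0.6 and the Unsecure centralized scheme at S.F. 1.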
Fig. 4: Query processing delays for the Secure Distributed approach

Fig. 5: Crypto delays for the Secure Distributed approach

Fig. 6: Proxy delays for the Secure Distributed approach

[...] the Secure Centralized and Secure Distributed approaches. The same applies for delays derived through emulation.

4) Proxy delay: Fig. 6 shows the proxy processing delays for all queries in both models for the Secure Distributed approach. Recall that the proxy delays include delays to perform the intersection of results returned from slave clouds and the overhead associated with transforming the user query into queries on clouds and with processing the results received from the clouds. In general, the proxy delays in both models are much smaller than the delays of the other components. The differences in the delays between the analytical and emulation models are much smaller than the differences in the other components.

F. Discussion

We observe that the delays derived through analytical modeling, with the exception of crypto delays, are smaller than those derived through emulation. In the analytical model, we determine the component delays individually. Thus, the total delay of the analytical model is the sum of the component delays. The analytical model is driven by first calibrating the model's parameters through measurements of average delays in communication, query processing, and decryption, and then using such parameters in analytically predicting the delays of the methods under consideration. When making the measurements, we use communication over the Internet and also measure decryption and query processing delays that are performed on infrastructure provided by the cloud.

Delays due to emulation are also obtained by using the Internet for communication and cloud infrastructure for decryption and query processing, and thus, we can expect much variability in both the analytical and emulation approaches as both depend on when measurements are done. We all accept and experience much variability in delays when communicating over the Internet.

Delays in cloud infrastructure are also variable as they depend on the load, the resources allocated by the cloud provider, and the quality of the cloud's software that measures the load and balances the allocation of resources to handle the load. Thus, although analytical modeling does not predict the actual delays with accuracy in comparison to the actual delays observed in emulation, it does provide guidelines on the general trend when variables, such as selectivity or type of issued queries, are varied.

V. CONCLUSION

Cloud computing technology is attractive for storing and managing data. However, concerns over confidentiality with regard to storing sensitive data prevent many public and commercial organizations from moving to the cloud. A number of researchers have attempted to provide solutions to the security concerns associated with outsourcing storage. The aim of this research was to prevent untrustworthy providers from obtaining sensitive data. We proposed a combination of approaches by using the encryption algorithms in [14] and obfuscation by distributing data amongst different clouds. The encryption algorithms provide the user with confidentiality and also support the query processing of encrypted data, while the distributed technique provides greater security and prevents cloud providers from procuring meaningful information. The prototype of the proposal was implemented, with the results showing that our scheme provides greater security with acceptable delays.

We used two modeling techniques to determine the delays of our proposed method and also of the methods used for comparison. One method was based on an analytical model while the other method used emulation using a prototype. We calibrated our analytical model's parameters by measurements. When we compared delays derived through analytical modeling to those derived through emulation, we observed that
in most cases the analytical model underestimated the delays. Thus, the analytical model can be used to analyze the behavior of the system in terms of the trends on delays but it cannot be used to accurately predict the delays. Of course, the same applies to any method that predicts delays without taking the load on the system into account; in our case, the load affects delays due to communication over the Internet or due to processing on a cloud infrastructure.

ACKNOWLEDGEMENTS

This work is partially supported by Aljouf University represented by the Saudi Arabian Cultural Bureau in Canada. The authors would like to thank the anonymous reviewers for their constructive comments.

REFERENCES

[19] S. D. Tetali and T. Millstein, “MrCrypt: Static Analysis for Secure Cloud Computations,” pp. 271–286, 2013.
[20] J. Daemen, The design of Rijndael: AES - the advanced encryption standard with 17 tables. Berlin [u.a.]: Springer, 2002.
[21] J. Blomer, “Fault Based Cryptanalysis of the Advanced Encryption Standard (AES),” Lecture notes in computer science, no. 2742, pp. 162–181, 2003.
[22] E. M. Mohamed, “Enhanced Data Security Model for Cloud Computing,” pp. 12–17, 2012.
[23] A. Arasu, S. Blanas, K. Eguro, M. Joglekar, R. Kaushik, D. Kossmann, R. Ramamurthy, P. Upadhyaya, and R. Venkatesan, “Secure database-as-a-service with Cipherbase,” Proceedings of the 2013 international conference on Management of data - SIGMOD '13, p. 1033, 2013. [Online]. Available: http://dl.acm.org/citation.cfm?doid=2463676.2467797
[24] G. Nalinipriya and R. Aswin Kumar, “Extensive medical data storage with prominent symmetric algorithms on cloud - A protected framework,” International Conference on Smart Structures and Systems - Icsss'13, pp. 171–177, Mar. 2013. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6623021