Efficient keyword search on encrypted dynamic cloud data

  • *Corresponding author: Laltu Sardar

  • Sensitive information is increasingly being outsourced to the cloud. To protect the privacy of such data, cloud users (clients) encrypt it before outsourcing. However, encryption makes it difficult to search the data afterwards. Searchable encryption schemes enable a client to search and retrieve encrypted cloud data based on the keywords it contains. Dynamic searchable encryption schemes additionally allow the client to keep searching as documents are added to or deleted from the encrypted collection. Such schemes trade off security (measured in terms of the information leaked to the cloud) against efficiency: stronger security guarantees often come at the cost of efficiency.

    In this work, we propose a new dynamic searchable encryption scheme for cloud data that achieves better security guarantees and improved efficiency compared to popular dynamic searchable encryption schemes. Our scheme uses an efficient data structure that reduces storage, lookup (search) time, and database modification time. We build a prototype of our scheme and evaluate it on large real-life datasets. The experiments show that our scheme outperforms existing schemes that provide similar (or weaker) security.

  • Figure 1.  Internal data structure of $ TDL $

    Figure 2.  Addition of a node in $ TDL $

    Figure 3.  Deletion of a node in $ TDL $

    Figure 4.  Example of the $ TDL $ built with $ W $ and $ {\textbf{f}} $

    Figure 5.  Example of an addition of a file

    Figure 6.  Example of a deletion of a file

    Figure 7.  Number of keyword-file pairs vs. Search query time per search query

    Figure 8.  Number of files vs. Search query time per search query

    Figure 9.  Number of pairs vs. Build time

    Figure 10.  Size of the files vs. Build time

    Figure 11.  Number of files vs. Build time

    Figure 12.  Build time comparison between KPR [18] and our scheme

    Figure 13.  Search time comparison between KPR and our scheme

    Figure 14.  Add time comparison between KPR and our scheme

    Figure 15.  Delete time comparison between KPR and our scheme

    Table 1.  Notations

    \begin{tabular}{ll}
    \hline
    Symbol & Meaning \\
    \hline
    $ F $ & PRP $ \{0, 1\}^\lambda \times \{0, 1\}^{\log |W|} \rightarrow \{0, 1\}^{\log |W|} $ \\
    $ F^\prime $ & PRP $ \{0, 1\}^\lambda \times \{0, 1\}^{\log l_{max}} \rightarrow \{0, 1\}^{\log l_{max}} $ \\
    $ G $ & PRG $ \{0, 1\}^\lambda \times \{0, 1\}^* \rightarrow \{0, 1\}^{\log |W|} $ \\
    $ G^\prime $ & PRG $ \{0, 1\}^\lambda \times \{0, 1\}^* \rightarrow \{0, 1\}^{\log l_{max}} $ \\
    $ N $ & A cell of the array $ A $ \\
    $ L $, $ R $, $ D $ & Pointers to the left/right/down node in $ TDL $ \\
    $ Rs $ & Pointer to the right node in $ TDL $ for search \\
    $ \overline{L} $, $ \overline{R} $, $ \overline{D} $, $ \overline{Rs} $ & Encryptions of $ L $, $ R $, $ D $, $ Rs $ respectively \\
    $ W $ & Set of all possible keywords (the dictionary) \\
    $ id(w) $, $ id^\prime(f) $ & Identifier of the keyword $ w $ / file $ f $ \\
    $ P(\cdot, \cdot) $ & A PRG $ \{0, 1\}^\lambda \times \{0, 1\}^* \rightarrow \{0, 1\}^\lambda $ \\
    $ t_a, t_s, t_d $ & Add, search and delete tokens respectively \\
    $ \lambda $ & The security parameter \\
    $ {\textbf{f}} $ & A collection of files \\
    $ f $ & A file, $ f \in {\textbf{f}} $ \\
    $ \hat{f} $ & Set of distinct keywords in the file $ f $ \\
    $ {sf} $ & Sorted identifiers of the keywords in $ \hat{f} $ \\
    $ H_i $ & Standard hash functions such as SHA-256 ($ i = 1, \ldots, 5 $) \\
    $ k_w $, $ k_f $ & Key corresponding to the keyword $ w $ / file $ f $ \\
    $ l_{max} $ & Maximum number of files that can be stored by the cloud \\
    $ l_w $ & The average frequency of a keyword \\
    $ [n] $ & The set of integers $ \{1, 2, \ldots, n \} $ \\
    $ A $ & The main array of the inverted-index structure \\
    $ L_{free} $ & A list of free cells of $ A $ \\
    $ T $ & A dictionary corresponding to the set of keywords \\
    $ T' $ & A dictionary corresponding to the set of files \\
    $ \gamma $ & The encrypted inverted index, consisting of $ T $, $ T' $ and $ A $ \\
    $ c_f $ & Encryption of $ f $ \\
    $ {\textbf{c}} $ & $ \{c_f: f \in {\textbf{f}} \} $, the collection of encrypted files \\
    \hline
    \end{tabular}

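To make the roles of the search token $ t_s $ and the keyword dictionary $ T $ from Table 1 more concrete, here is a toy sketch. It is not the scheme proposed in this work. It assumes HMAC-SHA256 as the keyed function, a plain Python dict standing in for $ T $, and unencrypted file identifiers. The names `make_token`, `build_index`, and `search` are hypothetical. The sketch omits $ A $, $ T' $, and the $ TDL $ pointers entirely and offers none of the scheme's security guarantees.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Dict, List, Set


def make_token(key: bytes, keyword: str) -> bytes:
    """Derive a deterministic token for a keyword.

    HMAC-SHA256 is an assumption made for this illustration only.
    """
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


def build_index(key: bytes, files: Dict[str, Set[str]]) -> Dict[bytes, List[str]]:
    """Client side: map each keyword token to the sorted ids of files containing it."""
    index = defaultdict(list)
    for file_id, keywords in files.items():
        for w in keywords:
            index[make_token(key, w)].append(file_id)
    return {tok: sorted(ids) for tok, ids in index.items()}


def search(index: Dict[bytes, List[str]], token: bytes) -> List[str]:
    """Server side: look up a token without ever seeing the keyword itself."""
    return index.get(token, [])


# Hypothetical demo data: a fixed key and two tiny "files".
key = b"\x01" * 32
files = {"f1": {"cloud", "search"}, "f2": {"cloud"}}
index = build_index(key, files)
hits = search(index, make_token(key, "cloud"))
misses = search(index, make_token(key, "absent"))
```

The server-side lookup takes only the token, so the keyword string never appears in the index keys. A real dynamic scheme must additionally hide result sizes and access patterns to the extent its security definition requires, which this sketch does not attempt.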
    Table 2.  Comparison among DSSE schemes based on client-side costs

    \begin{tabular}{llllllll}
    \hline
    Scheme & \multicolumn{3}{c}{Communication bandwidth$ ^{\dagger} $} & \multicolumn{3}{c}{Computation} & Client storage \\
     & Search & Add & Delete & Search & Add & Delete & \\
    \hline
    HK [15] & $ O(1) $ & $ O(|\hat{f}|) $ & $ O(1) $ & $ O(1) $ & $ O(|\hat{f}|) $ & $ O(1) $ & $ O(N) $ \\
    SPS [27] & $ O(|f_w| + \log{N}) $ rounds$ ^* $ & $ O(\log{N}) $ rounds$ ^* $ & $ O(\log{N}) $ rounds$ ^* $ & $ O(4 \cdot \min\{\alpha + \log N, |f_w| \log ^3 N \}) $ & $ O(|\hat{f}| \cdot \log ^2 {N}) $ & $ O(|\hat{f}| \log ^2 {N}) $ & $ O(N^\beta) $ \\
    Bost [2] & $ O(1) $ & $ O(f_w) $ rounds$ ^* $ & - & $ O(1) $ & $ O(4|\hat{f}|) $ & - & $ O(|W| (\log {|\textbf{f}|} + \lambda)) $ \\
    KPR [18] & $ O(1) $ & $ O(|\hat{f}|) $ & $ O(1) $ & $ O(1) $ & $ O(|\hat{f}|) $ & $ O(1) $ & $ O(1) $ \\
    Our scheme & $ O(1) $ & $ O(|\hat{f}|) $ & $ O(1) $ & $ O(1) $ & $ O(|\hat{f}|) $ & $ O(1) $ & $ O(1) $ \\
    \hline
    \end{tabular}

    $N\equiv$ number of keyword-file pairs, $W\equiv$ set of all possible keywords, $f_w\equiv$ set of files containing a keyword $w$, $\hat{f}\equiv$ set of distinct keywords in $f$, ${{\bf{f}}} \equiv $ set of files in the database, $|.|\equiv $ cardinality, $0< \beta <1$, $a_w\equiv$ number of times the queried keyword $w$ was historically added to the database, $r\equiv$ one ORAM read in TWORAM, $d_w \equiv$ number of times the searched-for keyword has been added or deleted, $p\equiv$ number of processors, $\bar{O} \equiv$ order ignoring $\log \log N$ factors (computation includes ORAM access), $a\equiv$ length of keyword-file storage, $b \equiv $ maximum supported length of a file id, $\lambda \equiv$ security parameter.
    $^{\dagger}$ This amount of bandwidth (measured per keyword-file storage) is required by the client to request the cloud server to perform different tasks.
    $^*$ Communication bandwidth is very high due to the large number of communication rounds needed.
    Table 3.  Comparison among DSSE schemes based on server-side costs

    \begin{tabular}{lllll}
    \hline
    Scheme & \multicolumn{3}{c}{Computation} & Storage \\
     & Search & Add & Delete & Index size \\
    \hline
    HK [15] & $ O(N) $ for first search, $ O(1) $ for subsequent searches & $ O(|\hat{f}|) $ & $ O(|W|) $ & $ O(N \cdot (a+2b)) $ \\
    SPS [27] & - & - & - & $ O(N \cdot (3a+b)) $ \\
    Bost [2] & $ O(2|f_w|) $ & $ O(|\hat{f}|) $ & - & $ O(N) $ \\
    KPR [18] & $ O(|f_w|) $ & $ O(|\hat{f}|) $ & $ O(|\hat{f}|) $ & $ O(N \cdot (7a+2b+2\lambda)) $ \\
    Our scheme & $ O(|f_w|) $ & $ O(|\hat{f}|) $ & $ O(|\hat{f}|) $ & $ O((N+|W|) \cdot (4a+b+\lambda)) $ \\
    \hline
    \end{tabular}

    Table 4.  Time taken per keyword-file pair

    \begin{tabular}{llllllll}
    \hline
    Functions & $ Build $ & $ AddTkn $ & $ Add $ & $ SearchTkn $ & $ Search $ & $ DeletTkn $ & $ Delete $ \\
    \hline
    Time ($ \mu $s/pair) & 7.9368 & 7.1995 & 0.1190 & 0.0139 & 1.2162 & 0.1050 & 3.4868 \\
    Std. deviation & 0.2852 & 0.1875 & 0.0017 & 0.0021 & 0.0028 & 0.0134 & 0.0427 \\
    \hline
    \end{tabular}

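As a worked reading of Table 4, the per-pair costs can be scaled to a workload size. The sketch below assumes cost grows linearly with the number of keyword-file pairs, which the per-pair normalization suggests but the table alone does not establish. The workload sizes and the helper name `estimate_seconds` are hypothetical.

```python
# Per-pair costs in microseconds, copied from Table 4.
PER_PAIR_US = {
    "Build": 7.9368,
    "AddTkn": 7.1995,
    "Add": 0.1190,
    "SearchTkn": 0.0139,
    "Search": 1.2162,
    "DeletTkn": 0.1050,
    "Delete": 3.4868,
}


def estimate_seconds(function: str, num_pairs: int) -> float:
    """Linear extrapolation: per-pair cost times number of pairs, in seconds."""
    return PER_PAIR_US[function] * num_pairs / 1_000_000


# Hypothetical workloads: one million pairs to index, one thousand pairs to search.
build_1m = estimate_seconds("Build", 1_000_000)
search_1k = estimate_seconds("Search", 1_000)
print(f"Build, 10^6 pairs:  {build_1m:.4f} s")
print(f"Search, 10^3 pairs: {search_1k * 1000:.4f} ms")
```

Under that linearity assumption, indexing one million pairs would take roughly 7.9 seconds and a search touching one thousand pairs roughly 1.2 milliseconds.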
    Table 5.  Time taken per query

    \begin{tabular}{llllll}
    \hline
    Functions & $ AddTkn $ & $ Add $ & $ SearchTkn $ & $ DeletTkn $ & $ Delete $ \\
    \hline
    Time ($ \mu $s/query) & 625.0 & 10.33 & 2.685 & 5.876 & 197.9 \\
    Std. deviation & 16.28 & 0.146 & 0.013 & 0.117 & 24.53 \\
    \hline
    \end{tabular}
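One way to relate the per-query times in Table 5 to the per-pair times in Table 4 is to divide one by the other, which gives an implied average number of keyword-file pairs handled per query. This assumes both tables were measured on the same workload, which is not stated here, so the ratios are only a rough consistency check.

```python
# Mean times copied from Table 4 (microseconds per pair) and Table 5 (microseconds per query).
PER_PAIR_US = {"AddTkn": 7.1995, "Add": 0.1190, "SearchTkn": 0.0139,
               "DeletTkn": 0.1050, "Delete": 3.4868}
PER_QUERY_US = {"AddTkn": 625.0, "Add": 10.33, "SearchTkn": 2.685,
                "DeletTkn": 5.876, "Delete": 197.9}

# Implied pairs per query = (time per query) / (time per pair).
implied_pairs = {name: PER_QUERY_US[name] / PER_PAIR_US[name] for name in PER_QUERY_US}

for name, pairs in implied_pairs.items():
    print(f"{name:10s} about {pairs:6.1f} pairs per query")
```

The two add-related ratios come out close to each other (about 87 pairs per query), as do the two delete-related ratios (about 56 to 57), which is what one would expect if each operation type was timed on a common set of queries.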
