Abstract
Most differentially private algorithms assume a central model, in which a trusted third party adds noise to the responses of queries made on datasets, or a local model, where the data owners directly perturb their data. However, the central model suffers from a single point of failure, and in the local model the utility of the data deteriorates significantly. The recently proposed shuffle model is an intermediate framework between the central and local paradigms. In the shuffle model, data owners send their locally privatized data to a server where messages are shuffled randomly, making it impossible to trace the link between a privatized message and the corresponding sender. In this paper, we theoretically derive the tightest known differential privacy guarantee for the shuffle model with k-Randomized Response (k-RR) local randomizers under histogram queries, and we denoise the histogram produced by the shuffle model using the matrix inversion method to evaluate the utility of the privacy mechanism. We perform experiments on both synthetic and real data to compare the privacy-utility trade-off of the shuffle model with that of the central model, privatized by adding state-of-the-art Gaussian noise to each bin. The results show that the statistical utilities of the central and shuffle models are almost comparable under the same level of differential privacy protection.
References
Agrawal, R., Srikant, R., Thomas, D.: Privacy preserving OLAP. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 251–262 (2005)
Balcer, V., Cheu, A.: Separating local & shuffled differential privacy via histograms. arXiv preprint arXiv:1911.06879 (2019)
Balle, B., Bell, J., Gascón, A., Nissim, K.: The privacy blanket of the shuffle model. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019. LNCS, vol. 11693, pp. 638–667. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26951-7_22
Balle, B., Wang, Y.X.: Improving the Gaussian mechanism for differential privacy: analytical calibration and optimal denoising. In: International Conference on Machine Learning, pp. 394–403. PMLR (2018)
Bittau, A., et al.: Prochlo: strong privacy for analytics in the crowd. In: Proceedings of the 26th Symposium on Operating Systems Principles, pp. 441–459 (2017)
Cheu, A.: Differential privacy in the shuffle model: a survey of separations. arXiv preprint arXiv:2107.11839 (2021)
Cheu, A., Smith, A., Ullman, J., Zeber, D., Zhilyaev, M.: Distributed differential privacy via shuffling. In: Ishai, Y., Rijmen, V. (eds.) EUROCRYPT 2019. LNCS, vol. 11476, pp. 375–403. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17653-2_13
Duchi, J.C., Jordan, M.I., Wainwright, M.J.: Local privacy and statistical minimax rates. In: 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pp. 429–438. IEEE (2013)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
Erlingsson, Ú., Feldman, V., Mironov, I., Raghunathan, A., Talwar, K., Thakurta, A.: Amplification by shuffling: from local to central differential privacy via anonymity. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 2468–2479. SIAM (2019)
Feldman, V., McMillan, A., Talwar, K.: Hiding among the clones: a simple and nearly optimal analysis of privacy amplification by shuffling. arXiv preprint arXiv:2012.12803 (2020)
The Gowalla dataset (2011). https://snap.stanford.edu/data/loc-gowalla.html. Accessed 10 Aug 2021
Kairouz, P., Bonawitz, K., Ramage, D.: Discrete distribution estimation under local privacy. In: International Conference on Machine Learning, pp. 2436–2444. PMLR (2016)
Koskela, A., Heikkilä, M.A., Honkela, A.: Tight accounting in the shuffle model of differential privacy. arXiv preprint arXiv:2106.00477 (2021)
Sommer, D.M., Meiser, S., Mohammadi, E.: Privacy loss classes: the central limit theorem in differential privacy. Proc. Priv. Enhancing Technol. 2019(2), 245–269 (2019)
Acknowledgment
This work was supported by the European Research Council (ERC) project HYPATIA under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 835294), and by ELSA - European Lighthouse on Secure and Safe AI, funded by the European Union under grant agreement no. 101070617.
Appendices
A Proof of Theorem 1
Setting \(p=\mathbb {P}[x_0|x_0]\) and \(\overline{p}=\mathbb {P}[x_0|y\ne x_0]\) in \(\mathcal {R}_{{\text {kRR}}}\), for every \(s\in \{0,\ldots ,n\}\) we obtain \(\mathbb {P}[\mathcal {M}_{x_0}(x_0)=s]\) as given in (8).
By similar arguments, for every \(s\in \{0,\ldots ,n\}\) we obtain \(\mathbb {P}[\mathcal {M}_{x_0}(x_1)=s]\) as given in (9).
Using Result 1, for every \(k>2\) and \(s\in \{0,\ldots ,n\}\), \(\mathcal {M}\) induces a tight \((\epsilon ,\,\delta )\)-ADP guarantee with respect to \(x_0,\,x_1\in \mathcal {X}\) for any \(\epsilon >0\) if and only if \(\delta \) is as given in (10).
Using the expressions derived for \(\mathbb {P}[\mathcal {M}_{x_0}(x_0)=s]\) and \(\mathbb {P}[\mathcal {M}_{x_0}(x_1)=s]\) in (8) and (9), respectively, we get \(v_s\) as in (11).
Combining (10) and (11),
$$\begin{aligned} \delta (\epsilon )=\sum \limits _{\begin{array}{c} s\in \{0,\ldots ,n\}:\\ v_s>\epsilon \end{array}}\left( 1-e^{\epsilon -v_s}\right) \mathbb {P}[\mathcal {M}_{x_0}(x_0)=s],\quad \text {where } v_s=\ln {\frac{\mathbb {P}[\mathcal {M}_{x_0}(x_0)=s]}{\mathbb {P}[\mathcal {M}_{x_0}(x_1)=s]}}. \end{aligned}$$
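As a numerical sanity check, the tight \(\delta (\epsilon )\) above can be computed directly once the two output distributions \(\mathbb {P}[\mathcal {M}_{x_0}(x_0)=s]\) and \(\mathbb {P}[\mathcal {M}_{x_0}(x_1)=s]\) are available as arrays. The following is a minimal Python sketch under that assumption; the binomial-mixture arrays at the end are purely illustrative stand-ins, not equations (8) and (9) themselves.

```python
import numpy as np
from scipy.stats import binom

def tight_delta(eps, p0, p1):
    """Tight delta(eps) = sum over outcomes s with v_s > eps of
    (1 - e^{eps - v_s}) * p0[s], where v_s = ln(p0[s] / p1[s])."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    inf_mask = (p1 == 0) & (p0 > 0)   # outcomes with infinite privacy loss
    both = (p0 > 0) & (p1 > 0)
    v = np.log(p0[both] / p1[both])
    over = v > eps
    return p0[inf_mask].sum() + np.sum((1 - np.exp(eps - v[over])) * p0[both][over])

# Illustrative stand-ins for (8) and (9): mixtures of shifted binomials.
n, p, p_bar = 1000, 0.6, 0.2
s = np.arange(n + 1)
p0 = p * binom.pmf(s - 1, n - 1, p_bar) + (1 - p) * binom.pmf(s, n - 1, p_bar)
p1 = p_bar * binom.pmf(s - 1, n - 1, p_bar) + (1 - p_bar) * binom.pmf(s, n - 1, p_bar)
print(tight_delta(0.5, p0, p1))
```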
B Theoretical outline
In \(\mathcal {M}\), we extend the idea of ADP to a general DP guarantee by taking the highest value of \(\delta \) across the primary inputs of every member of \(\mathfrak {U}\), for a fixed \(\epsilon \). This ensures the worst-case tight differential privacy guarantee for the shuffle model. We then focus on estimating the distribution of the original dataset.
Let \(\mathcal {R}_{\text {kRR}}^{-1}\) denote the inverse of the probabilistic mechanism \(\mathcal {R}_{\text {kRR}}\), which is used as the local randomizer for \(\mathcal {M}\). Note that \(\mathcal {R}_{\text {kRR}}\) and \(\mathcal {R}_{\text {kRR}}^{-1}\) are both \(k\times k\) stochastic channels, as \(|\mathcal {X}|=k\). Consistent with the notation developed so far, we additionally introduce \(H_{\mathfrak {N}}\), which gives the frequencies of the elements of \(\mathcal {X}\) after they have been sanitized with \(\mathfrak {N}\). In other words, \(H_{\mathfrak {N}}=\mathfrak {N}_{\epsilon ,\delta }(D_{\mathcal {X}})=(H_{x_0},\ldots ,H_{x_{k-1}})\), where \(H_{x_i}\) is the random variable giving the frequency of \(x_i\) after \(D_{\mathcal {X}}\) has been obfuscated with \(\mathfrak {N}_{\epsilon ,\delta }\).
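For concreteness, a minimal sketch of the two channels just introduced, assuming the standard k-RR probabilities (keep the true value with probability \(e^{\epsilon _0}/(e^{\epsilon _0}+k-1)\), report any other fixed value with probability \(1/(e^{\epsilon _0}+k-1)\)); the function names are ours, not the paper's.

```python
import numpy as np

def krr_channel(k, eps0):
    """k x k stochastic matrix of R_kRR: entry (i, j) = P[output x_j | input x_i]."""
    e = np.exp(eps0)
    C = np.full((k, k), 1.0 / (e + k - 1))   # off-diagonal: report a wrong value
    np.fill_diagonal(C, e / (e + k - 1))     # diagonal: report the true value
    return C

R = krr_channel(k=5, eps0=1.0)
R_inv = np.linalg.inv(R)                     # the inverse channel R_kRR^{-1}
assert np.allclose(R @ R_inv, np.eye(5))
```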
Since both \(\mathcal {M}\) and \(\mathfrak {N}\) are probabilistic mechanisms, to evaluate their utilities we study how accurately we can estimate the true distribution from which \(D_{\mathcal {X}}\) is sampled after observing the response to the histogram query in each scenario.
Let \(\pi =(\pi _{x_0},\ldots ,\pi _{x_{k-1}})\) be the distribution of the original messages in \(D(x_0)\). Our best estimate of the original distribution, upon observing the noisy histogram produced by the Gaussian mechanism, is the noisy histogram itself, since \(\mathbb {E}(H_{x_i})=n\pi _{x_i}\) for every \(i\in \{0,\ldots ,k-1\}\).
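A hedged sketch of this central-model baseline: the paper's experiments use the state-of-the-art analytic Gaussian mechanism [4], while the classical calibration below, valid for \(\epsilon <1\), is shown only as a simple stand-in; the \(\sqrt{2}\) sensitivity assumes one user's change affects two histogram bins.

```python
import numpy as np

def gaussian_histogram(true_hist, eps, delta, sensitivity=np.sqrt(2)):
    """Central model: add i.i.d. Gaussian noise to every bin of the histogram.
    Classical calibration (requires eps < 1); [4] derives a tighter sigma."""
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    rng = np.random.default_rng()
    return np.asarray(true_hist, float) + rng.normal(0.0, sigma, size=len(true_hist))
```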
However, when \(D(x_0)\) is locally obfuscated using \(\mathcal {R}_{\text {kRR}}\) and the frequency of each element is broadcast by the shuffle model \(\mathcal {M}\), we can use the matrix inversion method [1, 13] to estimate the distribution of the original messages in \(D(x_0)\). Thus \(\mathcal {M}(D(x_0))\mathcal {R}_{\text {kRR}}^{-1}\) (referred to as shuffle+INV in the experiments) gives us \(\hat{\pi }=(\hat{\pi }_{x_0},\ldots ,\hat{\pi }_{x_{k-1}})\), the most likely estimate of the distribution of each user's message in \(D(x_0)\) sampled from \(\mathcal {X}\), where \(\hat{\pi }_{x_i}\) denotes the random variable estimating the normalised frequency of \(x_i\) in \(D(x_0)\).
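Continuing the sketch, the shuffle+INV estimate multiplies the normalised histogram observed at the shuffler's output by \(\mathcal {R}_{\text {kRR}}^{-1}\); this reuses the hypothetical krr_channel helper above.

```python
import numpy as np

def shuffle_inv_estimate(obs_hist, R_inv):
    """Matrix-inversion estimate [1, 13]: if q is the normalised observed
    output histogram, then q @ R_inv estimates the input distribution pi
    (negative entries may need clipping and renormalisation)."""
    q = np.asarray(obs_hist, float)
    return (q / q.sum()) @ R_inv

# e.g.: pi_hat = shuffle_inv_estimate(obs_hist, np.linalg.inv(krr_channel(k, eps0)))
```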
We recall that \(\mathcal {M}\) provides tight \((\epsilon ,\,\delta )\)-ADP for \(x_0,\,x_1\), where \(\delta \) is a function of \(\epsilon _0,\,\epsilon ,\,\text {and }x_0\); essentially, \(\mathcal {M}\) privatizes the true query response for \(x_0\) so that it can be identified as that for any \(x_1\ne x_0\). On the other hand, \(\mathfrak {N}_{\epsilon ,\delta }\) ensures \((\epsilon ,\,\delta )\)-DP, which means it guarantees \((\epsilon ,\,\delta )\)-ADP for every \(x_i\in \mathcal {X}\). Therefore, to facilitate a fair comparison of utility between the central and shuffle models of differential privacy under the same privacy level for the histogram query, we introduce the following concepts:
i) Individual specific utility: Suppose the primary input of \(u\) is \(x_0\). Individual specific utility refers to the utility of the specific message \(x_0\) in the dataset \(D(x_0)\) under a given privacy mechanism. In particular, the individual specific utility of \(x_0\) in \(D(x_0)\) for \(\mathcal {M}\) is
$$\begin{aligned} \overline{\mathcal {W}}(\mathcal {M},x_0)=|n\hat{\pi }_{x_0}-n\pi _{x_0}|, \end{aligned}$$
and that for \(\mathfrak {N}_{\epsilon ,\delta }\) is
$$\begin{aligned} \overline{\mathcal {W}}(\mathfrak {N}_{\epsilon ,\delta },x_0)=|n\pi _{x_0}-H_{x_0}|. \end{aligned}$$
ii) Community level utility: Here we consider the utility of the privacy mechanisms over the entire community, i.e., all the values of the original dataset, by measuring the distance between the original distribution of the source messages and its estimate obtained from the observed noisy histogram. In particular, fixing any \(\epsilon _0>0\) and \(\epsilon >0\), the community level utility for \(\mathcal {M}\) is
$$\begin{aligned} \mathcal {W}(\mathcal {M})=d(n\hat{\pi },\,n\pi ), \end{aligned}$$ (13)
and that for \(\mathfrak {N}_{\epsilon ,\delta }\) is
$$\begin{aligned} \mathcal {W}(\mathfrak {N}_{\epsilon ,\delta })=d(H_{\mathfrak {N}_{\epsilon ,\delta }},\,n\pi ), \end{aligned}$$ (14)
where \(d(\cdot ,\cdot )\) is any standard metric measuring the distance between probability distributions over a finite space. For an equitable comparison between \(\mathcal {M}\) and \(\mathfrak {N}\), we take the worst tight ADP guarantee over every user's primary input and call it the community level tight DP guarantee for \(\mathcal {M}\). That is, for fixed \(\epsilon _0,\,\epsilon >0\), \(\mathcal {M}\) satisfies \((\epsilon ,\,\hat{\delta })\)-DP as the community level tight DP guarantee if
$$\begin{aligned} \hat{\delta }=\max \limits _{x\in \mathcal {X}}\{\delta (x):\mathcal {M}\text { is tightly }(\epsilon ,\delta (x))\text {-ADP for } x\in D_{\mathcal {X}}\}. \end{aligned}$$ (15)
Therefore, imposing the worst tight ADP guarantee on \(\mathcal {M}\) over all the original messages with \(\epsilon \) and \(\hat{\delta }\) implies that \(\mathcal {M}\) now gives an \((\epsilon ,\,\hat{\delta })\)-DP guarantee by Remark 1, placing us in a position to compare the community level utilities of the shuffle and the central models of DP under the histogram query at a fixed level of privacy. In particular, we juxtapose \(\mathcal {W}(\mathcal {M})\) with \(\mathcal {W}(\mathfrak {N}_{\epsilon ,\hat{\delta }})\), as seen in the experimental results with location data from San Francisco and Paris in Fig. 3.
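To close the outline, a minimal sketch of the community level comparison, taking total variation distance as one admissible choice of the metric \(d\) (the paper leaves \(d\) open); the helper names are assumptions of this sketch.

```python
import numpy as np

def community_utility(est_hist, true_hist):
    """Community level utility W as the (count-scaled) total variation
    distance between the estimated and the true histograms."""
    return 0.5 * np.abs(np.asarray(est_hist, float) - np.asarray(true_hist, float)).sum()

# W(M) = community_utility(n * pi_hat, n * pi)                              # shuffle+INV
# W(N) = community_utility(gaussian_histogram(n * pi, eps, delta_hat), n * pi)  # central
```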