research-article

Information Leakage Measures for Imperfect Statistical Information: Application to Non-Bayesian Framework

Authors:

Shahnewaz Karim Sakib,

George T. Amariucai,

Yong Guan

IEEE Transactions on Information Forensics and Security, Volume 20

Pages 1065 - 1080

https://doi.org/10.1109/TIFS.2024.3516585

Published: 01 January 2025

Abstract

This paper analyzes the problem of estimating information leakage when the complete statistics of the privacy mechanism are not known and the only available information consists of several input-output pairs obtained through interaction with the system or through some side channel. Several metrics, such as subjective leakage, objective leakage, and confidence boost, were previously introduced for this purpose, but by design they work only in a Bayesian framework. However, Bayesian inference can quickly become intractable when the domains of the involved variables are large. We focus on this problem and propose a novel machine-learning-based approach for estimating the leakage measures in a non-Bayesian framework when true knowledge of the privacy mechanism is beyond the reach of the user. We first adapt the definitions of the leakage metrics to a non-Bayesian framework and derive their statistical bounds. We then evaluate the performance of those metrics through experiments using neural networks, random forest classifiers, and support vector machines. We also evaluate them on an image dataset to demonstrate the versatility of the metrics. Finally, we provide a comparative analysis between our proposed metrics and the metrics of the Bayesian framework.
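
The general idea of estimating leakage from input-output pairs with a learned classifier can be illustrated with a minimal sketch. It does not reproduce the paper's definitions of subjective leakage, objective leakage, or confidence boost, which the abstract does not state. It assumes a generic notion of leakage as the base-2 logarithm of the ratio between the adversary's guessing probability after and before seeing the output. The toy mechanism (a binary secret plus Gaussian noise), the nearest-centroid classifier, and all function names are hypothetical choices made only for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def sample_pairs(n, noise, rng):
    """Draw n (secret, output) pairs from a toy mechanism (an assumption):
    the output is a binary secret plus zero-mean Gaussian noise."""
    pairs = []
    for _ in range(n):
        secret = rng.choice([0, 1])
        pairs.append((secret, secret + rng.gauss(0.0, noise)))
    return pairs


def fit_centroids(pairs):
    """Fit a nearest-centroid classifier: the mean output seen per secret."""
    sums, counts = defaultdict(float), Counter()
    for secret, output in pairs:
        sums[secret] += output
        counts[secret] += 1
    return {s: sums[s] / counts[s] for s in counts}


def predict(centroids, output):
    """Guess the secret whose centroid lies closest to the observed output."""
    return min(centroids, key=lambda s: abs(output - centroids[s]))


def estimate_leakage(train, test):
    """Estimate log2(posterior guessing probability / prior guessing probability).

    The prior is the empirical frequency of the most common secret in the
    training pairs; the posterior is the classifier's held-out accuracy.
    """
    centroids = fit_centroids(train)
    prior_guess = max(Counter(s for s, _ in train).values()) / len(train)
    hits = sum(predict(centroids, y) == s for s, y in test)
    posterior_guess = hits / len(test)
    if posterior_guess == 0:
        return float("-inf")
    return math.log2(posterior_guess / prior_guess)


if __name__ == "__main__":
    rng = random.Random(0)
    for noise in (0.1, 5.0):
        train = sample_pairs(2000, noise, rng)
        test = sample_pairs(2000, noise, rng)
        print(f"noise={noise}: estimated leakage = {estimate_leakage(train, test):.3f} bits")
```

In this sketch the estimate depends only on sampled pairs and never on the mechanism's true conditional distribution. A mechanism with little noise should yield an estimate near one bit for a roughly uniform binary secret, while a very noisy one should yield an estimate near zero.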

Index Terms

1. Security and privacy
2. Social and professional topics
  1. Computing / technology policy
    1. Computer crime
    2. Privacy policies

Index terms have been assigned to the content through auto-classification.

Published In

IEEE Transactions on Information Forensics and Security Volume 20, Issue

2025

1935 pages

1556-6021 © 2024 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Publisher

IEEE Press
