Robust Distribution Learning with Local and Global Adversarial Corruptions

Authors: Sloan Nietert, Ziv Goldfeld, Soroosh Shafiee

Abstract: We consider learning in an adversarial environment, where an $\varepsilon$-fraction of samples from a distribution $P$ are arbitrarily modified (global corruptions) and the remaining perturbations have average magnitude bounded by $\rho$ (local corruptions). Given access to $n$ such corrupted samples, we seek a computationally efficient estimator $\hat{P}_n$ that minimizes the Wasserstein distance $\mathsf{W}_1(\hat{P}_n,P)$. In fact, we attack the fine-grained task of minimizing $\mathsf{W}_1(\Pi_\# \hat{P}_n, \Pi_\# P)$ for all orthogonal projections $\Pi\in \mathbb{R}^{d \times d}$, with performance scaling with $\mathrm{rank}(\Pi) = k$. This allows us to account simultaneously for mean estimation ($k=1$), distribution estimation ($k=d$), as well as the settings interpolating between these two extremes. We characterize the optimal population-limit risk for this task and then develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}n^{-1/(k \lor 2)})$ when $P$ has bounded covariance. This guarantee holds uniformly in $k$ and is minimax optimal up to the sub-optimality of the plug-in estimator when $\rho = \varepsilon = 0$. Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization, and, in the process, uncover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.

Submitted 24 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2024

arXiv:2404.03176 [pdf, other]

Information-Theoretic Generalization Bounds for Deep Neural Networks

Authors: Haiyun He, Christina Lee Yu, Ziv Goldfeld

Abstract: Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications. This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds. We first derive two hierarchical bounds on the generalization error in terms of the Kullback-Leibler (KL) divergence or the 1-Wasserstein distance between the train and test distributions of the network internal representations. The KL divergence bound shrinks as the layer index increases, while the Wasserstein bound implies the existence of a layer that serves as a generalization funnel, which attains a minimal 1-Wasserstein distance. Analytic expressions for both bounds are derived under the setting of binary Gaussian classification with linear DNNs. To quantify the contraction of the relevant information measures when moving deeper into the network, we analyze the strong data processing inequality (SDPI) coefficient between consecutive layers of three regularized DNN models: Dropout, DropConnect, and Gaussian noise injection. This enables refining our generalization bounds to capture the contraction as a function of the network architecture parameters. Specializing our results to DNNs with a finite parameter space and the Gibbs algorithm reveals that deeper yet narrower network architectures generalize better in those examples, although how broadly this statement applies remains a question.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 25 pages, 5 figures

arXiv:2311.05573 [pdf, other]

Outlier-Robust Wasserstein DRO

Authors: Sloan Nietert, Ziv Goldfeld, Soroosh Shafiee

Abstract: Distributionally robust optimization (DRO) is an effective approach for data-driven decision-making in the presence of uncertainty. Geometric uncertainty due to sampling or localized perturbations of data points is captured by Wasserstein DRO (WDRO), which seeks to learn a model that performs uniformly well over a Wasserstein ball centered around the observed data distribution. However, WDRO fails to account for non-geometric perturbations such as adversarial outliers, which can greatly distort the Wasserstein distance measurement and impede the learned model. We address this gap by proposing a novel outlier-robust WDRO framework for decision-making under both geometric (Wasserstein) perturbations and non-geometric (total variation (TV)) contamination that allows an $\varepsilon$-fraction of data to be arbitrarily corrupted. We design an uncertainty set using a certain robust Wasserstein ball that accounts for both perturbation types and derive minimax optimal excess risk bounds for this procedure that explicitly capture the Wasserstein and TV risks. We prove a strong duality result that enables tractable convex reformulations and efficient computation of our outlier-robust WDRO problem. When the loss function depends only on low-dimensional features of the data, we eliminate certain dimension dependencies from the risk bounds that are unavoidable in the general setting. Finally, we present experiments validating our theory on standard regression and classification tasks.

Submitted 9 November, 2023; originally announced November 2023.

Comments: Appearing at NeurIPS 2023

arXiv:2309.16200 [pdf, other]

Max-Sliced Mutual Information

Authors: Dor Tsur, Ziv Goldfeld, Kristjan Greenewald

Abstract: Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference. Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure that also captures high-order dependencies. However, CCA only accounts for linear dependence, which may be insufficient for certain applications, while mutual information is often infeasible to compute/estimate in high dimensions. This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI). mSMI equals the maximal mutual information between low-dimensional projections of the high-dimensional variables, which reduces back to CCA in the Gaussian case. It enjoys the best of both worlds: capturing intricate dependencies in the data while being amenable to fast computation and scalable estimation from samples. We show that mSMI retains favorable structural properties of Shannon's mutual information, like variational forms and identification of independence. We then study statistical estimation of mSMI, propose an efficiently computable neural estimator, and couple it with formal non-asymptotic error bounds. We present experiments that demonstrate the utility of mSMI for several tasks, encompassing independence testing, multi-view representation learning, algorithmic fairness, and generative modeling. We observe that mSMI consistently outperforms competing methods with little-to-no computational overhead.

Submitted 28 September, 2023; originally announced September 2023.

Comments: Accepted at NeurIPS 2023

arXiv:2307.01171 [pdf, other]

doi 10.1103/PhysRevA.109.032431

Quantum Neural Estimation of Entropies

Authors: Ziv Goldfeld, Dhrumil Patel, Sreejith Sreekumar, Mark M. Wilde

Abstract: Entropy measures quantify the amount of information and correlation present in a quantum system. In practice, when the quantum state is unknown and only copies thereof are available, one must resort to the estimation of such entropy measures. Here we propose a variational quantum algorithm for estimating the von Neumann and Rényi entropies, as well as the measured relative entropy and measured Rényi relative entropy. Our approach first parameterizes a variational formula for the measure of interest by a quantum circuit and a classical neural network, and then optimizes the resulting objective over parameter space. Numerical simulations of our quantum algorithm are provided, using a noiseless quantum simulator. The algorithm provides accurate estimates of the various entropy measures for the examples tested, which renders it a promising approach for use in downstream tasks.

Submitted 5 February, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: 14 pages, 2 figures; see also independent works of Shin, Lee, and Jeong at arXiv:2306.14566v1 and Lee, Kwon, and Lee at arXiv:2307.13511v2

Journal ref: Physical Review A, vol. 109, no. 3, page 032431, March 2024

arXiv:2306.13054 [pdf, other]

doi 10.1109/TIT.2024.3404927

Quantum Pufferfish Privacy: A Flexible Privacy Framework for Quantum Systems

Authors: Theshani Nuradha, Ziv Goldfeld, Mark M. Wilde

Abstract: We propose a versatile privacy framework for quantum systems, termed quantum pufferfish privacy (QPP). Inspired by classical pufferfish privacy, our formulation generalizes and addresses limitations of quantum differential privacy by offering flexibility in specifying private information, feasible measurements, and domain knowledge. We show that QPP can be equivalently formulated in terms of the Datta-Leditzky information spectrum divergence, thus providing the first operational interpretation thereof. We reformulate this divergence as a semi-definite program and derive several properties of it, which are then used to prove convexity, composability, and post-processing of QPP mechanisms. Parameters that guarantee QPP of the depolarization mechanism are also derived. We analyze the privacy-utility tradeoff of general QPP mechanisms and, again, study the depolarization mechanism as an explicit instance. The QPP framework is then applied to privacy auditing for identifying privacy violations via a hypothesis testing pipeline that leverages quantum algorithms. Connections to quantum fairness and other quantum divergences are also explored and several variants of QPP are examined.

Submitted 28 May, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: v2: 33 pages, 9 figures, accepted to IEEE Transactions on Information Theory

Journal ref: IEEE Transactions on Information Theory, vol. 70, no. 8, pp. 5731-5762, Aug. 2024

arXiv:2302.01237 [pdf, other]

Robust Estimation under the Wasserstein Distance

Authors: Sloan Nietert, Rachel Cummings, Ziv Goldfeld

Abstract: We study the problem of robust distribution estimation under the Wasserstein metric, a popular discrepancy measure between probability distributions rooted in optimal transport (OT) theory. We introduce a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from its input distributions, and show that minimum distance estimation under $\mathsf{W}_p^\varepsilon$ achieves minimax optimal robust estimation risk. Our analysis is rooted in several new results for partial OT, including an approximate triangle inequality, which may be of independent interest. To address computational tractability, we derive a dual formulation for $\mathsf{W}_p^\varepsilon$ that adds a simple penalty term to the classic Kantorovich dual objective. As such, $\mathsf{W}_p^\varepsilon$ can be implemented via an elementary modification to standard, duality-based OT solvers. Our results are extended to sliced OT, where distributions are projected onto low-dimensional subspaces, and applications to homogeneity and independence testing are explored. We illustrate the virtues of our framework via applications to generative modeling with contaminated datasets.

Submitted 2 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2111.01361

arXiv:2301.00621 [pdf, ps, other]

Data-Driven Optimization of Directed Information over Discrete Alphabets

Authors: Dor Tsur, Ziv Aharoni, Ziv Goldfeld, Haim Permuter

Abstract: Directed information (DI) is a fundamental measure for the study and analysis of sequential stochastic models. In particular, when optimized over input distributions it characterizes the capacity of general communication channels. However, analytic computation of DI is typically intractable and existing optimization techniques over discrete input alphabets require knowledge of the channel model, which renders them inapplicable when only samples are available. To overcome these limitations, we propose a novel estimation-optimization framework for DI over discrete input spaces. We formulate DI optimization as a Markov decision process and leverage reinforcement learning techniques to optimize a deep generative model of the input process probability mass function (PMF). Combining this optimizer with the recently developed DI neural estimator, we obtain an end-to-end estimation-optimization algorithm which is applied to estimating the (feedforward and feedback) capacity of various discrete channels with memory. Furthermore, we demonstrate how to use the optimized PMF model to (i) obtain theoretical bounds on the feedback capacity of unifilar finite-state channels; and (ii) perform probabilistic shaping of constellations in the peak power-constrained additive white Gaussian noise channel.

Submitted 2 January, 2023; originally announced January 2023.

arXiv:2211.11184 [pdf, ps, other]

Limit distribution theory for $f$-Divergences

Authors: Sreejith Sreekumar, Ziv Goldfeld, Kengo Kato

Abstract: $f$-divergences, which quantify discrepancy between probability distributions, are ubiquitous in information theory, machine learning, and statistics. While there are numerous methods for estimating $f$-divergences from data, a limit distribution theory, which quantifies fluctuations of the estimation error, is largely obscure. As limit theorems are pivotal for valid statistical inference, to close this gap, we develop a general methodology for deriving distributional limits for $f$-divergences based on the functional delta method and Hadamard directional differentiability. Focusing on four prominent $f$-divergences -- Kullback-Leibler divergence, $\chi^2$ divergence, squared Hellinger distance, and total variation distance -- we identify sufficient conditions on the population distributions for the existence of distributional limits and characterize the limiting variables. These results are used to derive one- and two-sample limit theorems for Gaussian-smoothed $f$-divergences, both under the null and the alternative. Finally, an application of the limit distribution theory to auditing differential privacy is proposed and analyzed for significance level and power against local alternatives.
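
For orientation, the four divergences named in this abstract can be written in the standard textbook form below. This is a sketch under common conventions, not a quotation from the paper: the symbols $P$, $Q$, and the generator $f$ are assumed notation, and normalizations (for example the factor $1/2$ in total variation, or an optional $1/2$ in the squared Hellinger distance) vary between references.

```latex
% Generic f-divergence for P absolutely continuous w.r.t. Q,
% with f convex and f(1) = 0 (assumed standard convention).
\[
  D_f(P \,\|\, Q) = \int f\!\left(\frac{dP}{dQ}\right) dQ .
\]
% Generators recovering the four divergences listed in the abstract:
\begin{align*}
  \text{Kullback-Leibler:}           \quad & f(t) = t \log t, \\
  \chi^2 \text{ divergence:}         \quad & f(t) = (t-1)^2, \\
  \text{squared Hellinger distance:} \quad & f(t) = \bigl(\sqrt{t}-1\bigr)^2, \\
  \text{total variation distance:}   \quad & f(t) = \tfrac{1}{2}\,|t-1| .
\end{align*}
```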

Submitted 12 October, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

arXiv:2210.12612 [pdf, ps, other]

doi 10.1109/TIT.2023.3296288

Pufferfish Privacy: An Information-Theoretic Study

Authors: Theshani Nuradha, Ziv Goldfeld

Abstract: Pufferfish privacy (PP) is a generalization of differential privacy (DP) that offers flexibility in specifying sensitive information and integrates domain knowledge into the privacy definition. Inspired by the illuminating formulation of DP in terms of mutual information due to Cuff and Yu, this work explores PP through the lens of information theory. We provide an information-theoretic formulation of PP, termed mutual information PP (MI PP), in terms of the conditional mutual information between the mechanism and the secret, given the public information. We show that MI PP is implied by the regular PP and characterize conditions under which the reverse implication is also true, recovering the relationship between DP and its information-theoretic variant as a special case. We establish convexity, composability, and post-processing properties for MI PP mechanisms and derive noise levels for the Gaussian and Laplace mechanisms. The obtained mechanisms are applicable under relaxed assumptions and provide improved noise levels in some regimes. Lastly, applications to auditing privacy frameworks, statistical inference tasks, and algorithm stability are explored.

Submitted 3 May, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

Journal ref: IEEE Transactions on Information Theory, vol. 69, no. 11, pp. 7336-7356, Nov. 2023

arXiv:2210.09160 [pdf, other]

Statistical, Robustness, and Computational Guarantees for Sliced Wasserstein Distances

Authors: Sloan Nietert, Ritwik Sadhu, Ziv Goldfeld, Kengo Kato

Abstract: Sliced Wasserstein distances preserve properties of classic Wasserstein distances while being more scalable for computation and estimation in high dimensions. The goal of this work is to quantify this scalability from three key aspects: (i) empirical convergence rates; (ii) robustness to data contamination; and (iii) efficient computational methods. For empirical convergence, we derive fast rates with explicit dependence of constants on dimension, subject to log-concavity of the population distributions. For robustness, we characterize minimax optimal, dimension-free robust estimation risks, and show an equivalence between robust sliced 1-Wasserstein estimation and robust mean estimation. This enables lifting statistical and algorithmic guarantees available for the latter to the sliced 1-Wasserstein setting. Moving on to computational aspects, we analyze the Monte Carlo estimator for the average-sliced distance, demonstrating that larger dimension can result in faster convergence of the numerical integration error. For the max-sliced distance, we focus on a subgradient-based local optimization algorithm that is frequently used in practice, albeit without formal guarantees, and establish an $O(\epsilon^{-4})$ computational complexity bound for it. Our theory is validated by numerical experiments, which altogether provide a comprehensive quantitative account of the scalability question.
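
The Monte Carlo estimator for the average-sliced distance mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sliced_w1`, its parameters, and the restriction to two samples of equal size are assumptions made here for brevity. It relies on the fact that the 1-Wasserstein distance between two equal-size empirical measures on the real line is the mean absolute difference of their sorted values.

```python
import numpy as np


def sliced_w1(x, y, n_projections=200, seed=0):
    """Monte Carlo estimate of the average-sliced 1-Wasserstein distance.

    Hypothetical helper: assumes x and y are (n, d) arrays with the same
    number of points, so each 1D distance reduces to sorting.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes and dimensions")

    rng = np.random.default_rng(seed)
    d = x.shape[1]

    # Directions drawn uniformly from the unit sphere (normalized Gaussians).
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both samples onto every direction and sort along the sample axis.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)

    # Averaging over points gives the 1D W1 per direction; averaging over
    # directions gives the Monte Carlo estimate of the sliced distance.
    return float(np.mean(np.abs(px - py)))
```

Increasing `n_projections` reduces the numerical integration error of the estimate at a cost that grows linearly in the number of directions.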

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2206.08526 [pdf, other]

k-Sliced Mutual Information: A Quantitative Study of Scalability with Dimension

Authors: Ziv Goldfeld, Kristjan Greenewald, Theshani Nuradha, Galen Reeves

Abstract: Sliced mutual information (SMI) is defined as an average of mutual information (MI) terms between one-dimensional random projections of the random variables. It serves as a surrogate measure of dependence to classic MI that preserves many of its properties but is more scalable to high dimensions. However, a quantitative characterization of how SMI itself and estimation rates thereof depend on the ambient dimension, which is crucial to the understanding of scalability, remains obscure. This work provides a multifaceted account of the dependence of SMI on dimension, under a broader framework termed $k$-SMI, which considers projections to $k$-dimensional subspaces. Using a new result on the continuity of differential entropy in the 2-Wasserstein metric, we derive sharp bounds on the error of Monte Carlo (MC)-based estimates of $k$-SMI, with explicit dependence on $k$ and the ambient dimension, revealing their interplay with the number of samples. We then combine the MC integrator with the neural estimation framework to provide an end-to-end $k$-SMI estimator, for which optimal convergence rates are established. We also explore asymptotics of the population $k$-SMI as dimension grows, providing Gaussian approximation results with a residual that decays under appropriate moment bounds. Our theory is validated with numerical experiments and is applied to sliced InfoGAN, which altogether provide a comprehensive quantitative account of the scalability question of $k$-SMI, including SMI as a special case when $k=1$.

Submitted 14 October, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2203.14743 [pdf, ps, other]

Neural Estimation and Optimization of Directed Information over Continuous Spaces

Authors: Dor Tsur, Ziv Aharoni, Ziv Goldfeld, Haim Permuter

Abstract: This work develops a new method for estimating and optimizing the directed information rate between two jointly stationary and ergodic stochastic processes. Building upon recent advances in machine learning, we propose a recurrent neural network (RNN)-based estimator which is optimized via gradient ascent over the RNN parameters. The estimator does not require prior knowledge of the underlying joint and marginal distributions. The estimator is also readily optimized over continuous input processes realized by a deep generative model. We prove consistency of the proposed estimation and optimization methods and combine them to obtain end-to-end performance guarantees. Applications for channel capacity estimation of continuous channels with memory are explored, and empirical results demonstrating the scalability and accuracy of our method are provided. When the channel is memoryless, we investigate the mapping learned by the optimized input generator.

Submitted 28 March, 2022; originally announced March 2022.

Comments: 38 pages, 6 figures

arXiv:2111.11328 [pdf, other]

Cycle Consistent Probability Divergences Across Different Spaces

Authors: Zhengxin Zhang, Youssef Mroueh, Ziv Goldfeld, Bharath K. Sriperumbudur

Abstract: Discrepancy measures between probability distributions are at the core of statistical inference and machine learning. In many applications, distributions of interest are supported on different spaces, and yet a meaningful correspondence between data points is desired. Motivated to explicitly encode consistent bidirectional maps into the discrepancy measure, this work proposes a novel unbalanced Monge optimal transport formulation for matching, up to isometries, distributions on different spaces. Our formulation arises as a principled relaxation of the Gromov-Hausdorff distance between metric spaces, and employs two cycle-consistent maps that push forward each distribution onto the other. We study structural properties of the proposed discrepancy and, in particular, show that it captures the popular cycle-consistent generative adversarial network (GAN) framework as a special case, thereby providing the theory to explain it. Motivated by computational efficiency, we then kernelize the discrepancy and restrict the mappings to parametric function classes. The resulting kernelized version is coined the generalized maximum mean discrepancy (GMMD). Convergence rates for empirical estimation of GMMD are studied and experiments to support our theory are provided.

Submitted 22 November, 2021; originally announced November 2021.

Comments: 35 pages

arXiv:2111.01361 [pdf, other]

Outlier-Robust Optimal Transport: Duality, Structure, and Statistical Analysis

Authors: Sloan Nietert, Rachel Cummings, Ziv Goldfeld

Abstract: The Wasserstein distance, rooted in optimal transport (OT) theory, is a popular discrepancy measure between probability distributions with various applications to statistics and machine learning. Despite their rich structure and demonstrated utility, Wasserstein distances are sensitive to outliers in the considered distributions, which hinders applicability in practice. We propose a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from each contaminated distribution. Under standard moment assumptions, $\mathsf{W}_p^\varepsilon$ is shown to achieve strong robust estimation guarantees under the Huber $\varepsilon$-contamination model. Our formulation of this robust distance amounts to a highly regular optimization problem that lends itself better for analysis compared to previously considered frameworks. Leveraging this, we conduct a thorough theoretical study of $\mathsf{W}_p^\varepsilon$, encompassing robustness guarantees, characterization of optimal perturbations, regularity, duality, and statistical estimation. In particular, by decoupling the optimization variables, we arrive at a simple dual form for $\mathsf{W}_p^\varepsilon$ that can be implemented via an elementary modification to standard, duality-based OT solvers. We illustrate the virtues of our framework via applications to generative modeling with contaminated datasets.

Submitted 28 February, 2023; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: updated to match AISTATS publication

arXiv:2110.05279 [pdf, ps, other]

Sliced Mutual Information: A Scalable Measure of Statistical Dependence

Authors: Ziv Goldfeld, Kristjan Greenewald

Abstract: Mutual information (MI) is a fundamental measure of statistical dependence, with a myriad of applications to information theory, statistics, and machine learning. While it possesses many desirable structural properties, the estimation of high-dimensional MI from samples suffers from the curse of dimensionality. Motivated by statistical scalability to high dimensions, this paper proposes sliced MI (SMI) as a surrogate measure of dependence. SMI is defined as an average of MI terms between one-dimensional random projections. We show that it preserves many of the structural properties of classic MI, while gaining scalable computation and efficient estimation from samples. Furthermore, and in contrast to classic MI, SMI can grow as a result of deterministic transformations. This enables leveraging SMI for feature extraction by optimizing it over processing functions of raw data to identify useful representations thereof. Our theory is supported by numerical studies of independence testing and feature extraction, which demonstrate the potential gains SMI offers over classic MI for high-dimensional inference.
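
The verbal definition in this abstract ("an average of MI terms between one-dimensional random projections") can be sketched as a formula. The notation is assumed here rather than taken from the listing: $X \in \mathbb{R}^{d_x}$, $Y \in \mathbb{R}^{d_y}$, and projection directions $\Theta$, $\Phi$ drawn independently and uniformly from the respective unit spheres, which is one natural reading of "random projections".

```latex
% Sketch of the SMI definition under the assumptions stated above:
% Theta ~ Unif(S^{d_x - 1}), Phi ~ Unif(S^{d_y - 1}), independent of (X, Y).
\[
  \mathsf{SMI}(X;Y)
  = \mathbb{E}_{\Theta,\Phi}\!\left[
      \mathsf{I}\bigl(\theta^{\top} X ;\, \phi^{\top} Y\bigr)
      \Big|_{\theta=\Theta,\ \phi=\Phi}
    \right]
  = \mathsf{I}\bigl(\Theta^{\top} X ;\, \Phi^{\top} Y \,\big|\, \Theta, \Phi\bigr).
\]
```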

Submitted 18 October, 2021; v1 submitted 11 October, 2021; originally announced October 2021.
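
The definition in the abstract above (an average of MI terms between one-dimensional random projections) can be illustrated with a small Monte Carlo sketch. Everything below is an assumption made for illustration and is not the estimator studied in the paper: the function name `sliced_mi_gaussian` is hypothetical, and each one-dimensional MI term is replaced by the closed form $-\tfrac{1}{2}\log(1-\rho^2)$, which is exact only when the projected pair is jointly Gaussian with correlation $\rho$.

```python
import math
import random


def _unit_vector(dim, rng):
    # A normalized standard Gaussian vector is uniform on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def _project(points, direction):
    # Inner product of every sample with the given direction.
    return [sum(p * d for p, d in zip(row, direction)) for row in points]


def _corr(a, b):
    # Sample Pearson correlation of two equal-length sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def sliced_mi_gaussian(x, y, num_slices=100, seed=0):
    """Average a Gaussian plug-in MI over random 1-D projections of x and y.

    ASSUMPTION: each projected pair is treated as jointly Gaussian, so its
    MI is -0.5 * log(1 - rho^2). This is an illustrative stand-in only.
    """
    rng = random.Random(seed)
    dx, dy = len(x[0]), len(y[0])
    total = 0.0
    for _ in range(num_slices):
        theta = _unit_vector(dx, rng)
        phi = _unit_vector(dy, rng)
        rho = _corr(_project(x, theta), _project(y, phi))
        rho2 = min(rho * rho, 1.0 - 1e-12)  # keep the logarithm finite
        total += -0.5 * math.log1p(-rho2)
    return total / num_slices
```

On independent samples the value should be close to zero, while strongly dependent samples should give a clearly positive value; any one-dimensional MI estimator could be swapped in for the Gaussian closed form.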

arXiv:2004.14941 [pdf, other]

The Information Bottleneck Problem and Its Applications in Machine Learning

Authors: Ziv Goldfeld, Yury Polyanskiy

Abstract: Inference capabilities of machine learning (ML) systems skyrocketed in recent years, now playing a pivotal role in various aspects of society. The goal in statistical learning is to use data to obtain simple algorithms for predicting a random variable $Y$ from a correlated observation $X$. Since the dimension of $X$ is typically huge, computationally feasible solutions should summarize it into a lower-dimensional feature vector $T$, from which $Y$ is predicted. The algorithm will successfully make the prediction if $T$ is a good proxy of $Y$, despite the said dimensionality-reduction. A myriad of ML algorithms (mostly employing deep learning (DL)) for finding such representations $T$ based on real-world data are now available. While these methods are often effective in practice, their success is hindered by the lack of a comprehensive theory to explain it. The information bottleneck (IB) theory recently emerged as a bold information-theoretic paradigm for analyzing DL systems. Adopting mutual information as the figure of merit, it suggests that the best representation $T$ should be maximally informative about $Y$ while minimizing the mutual information with $X$. In this tutorial we survey the information-theoretic origins of this abstract principle, and its recent impact on DL. For the latter, we cover implications of the IB problem on DL theory, as well as practical algorithms inspired by it. Our goal is to provide a unified and cohesive description. A clear view of current knowledge is particularly important for further leveraging IB and other information-theoretic ideas to study DL models.

Submitted 1 May, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

arXiv:2004.04330 [pdf, other]

The Secrecy Capacity of Cost-Constrained Wiretap Channels

Authors: Sreejith Sreekumar, Alexander Bunin, Ziv Goldfeld, Haim H. Permuter, Shlomo Shamai

Abstract: In many information-theoretic channel coding problems, adding an input cost constraint to the operational setup amounts to restricting the optimization domain in the capacity formula. This paper shows that, in contrast to common belief, such a simple modification does not hold for the cost-constrained (CC) wiretap channel (WTC). The secrecy-capacity of the discrete memoryless (DM) WTC without cost constraints is described by a single auxiliary random variable. For the CC DM-WTC, however, we show that two auxiliaries are necessary to achieve capacity. Specifically, we first derive the secrecy-capacity formula, proving the direct part via superposition coding. Then, we provide an example of a CC DM-WTC whose secrecy-capacity cannot be achieved using a single auxiliary. This establishes the fundamental role of superposition coding over CC WTCs.

Submitted 26 December, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

arXiv:2003.04179 [pdf, ps, other]

Capacity of Continuous Channels with Memory via Directed Information Neural Estimator

Authors: Ziv Aharoni, Dor Tsur, Ziv Goldfeld, Haim Henry Permuter

Abstract: Calculating the capacity (with or without feedback) of channels with memory and continuous alphabets is a challenging task. It requires optimizing the directed information (DI) rate over all channel input distributions. The objective is a multi-letter expression, whose analytic solution is only known for a few specific cases. When no analytic solution is present or the channel model is unknown, there is no unified framework for calculating or even approximating capacity. This work proposes a novel capacity estimation algorithm that treats the channel as a `black-box', whether or not feedback is present. The algorithm has two main ingredients: (i) a neural distribution transformer (NDT) model that shapes a noise variable into the channel input distribution, which we are able to sample, and (ii) the DI neural estimator (DINE) that estimates the communication rate of the current NDT model. These models are trained by an alternating maximization procedure to both estimate the channel capacity and obtain an NDT for the optimal input distribution. The method is demonstrated on the moving average additive Gaussian noise channel, where it is shown that both the capacity and feedback capacity are estimated without knowledge of the channel transition kernel. The proposed estimation framework opens the door to a myriad of capacity approximation results for continuous alphabet channels that were inaccessible until now.

Submitted 16 May, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

arXiv:1905.13576 [pdf, other]

Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation

Authors: Ziv Goldfeld, Kristjan Greenewald, Yury Polyanskiy, Jonathan Weed

Abstract: This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating $P\ast\mathcal{N}_σ$, for $\mathcal{N}_σ\triangleq\mathcal{N}(0,σ^2 \mathrm{I}_d)$, by $\hat{P}_n\ast\mathcal{N}_σ$, where $\hat{P}_n$ is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and $χ^2$-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance ($\mathsf{W}_1$) converges at rate $e^{O(d)}n^{-\frac{1}{2}}$ in remarkable contrast to a typical $n^{-\frac{1}{d}}$ rate for unsmoothed $\mathsf{W}_1$ (and $d\ge 3$). For the KL divergence, squared 2-Wasserstein distance ($\mathsf{W}_2^2$), and $χ^2$-divergence, the convergence rate is $e^{O(d)}n^{-1}$, but only if $P$ achieves finite input-output $χ^2$ mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to $ω(n^{-1})$ for the KL divergence and $\mathsf{W}_2^2$, while the $χ^2$-divergence becomes infinite - a curious dichotomy. As a main application we consider estimating the differential entropy $h(P\ast\mathcal{N}_σ)$ in the high-dimensional regime. The distribution $P$ is unknown but $n$ i.i.d. samples from it are available. We first show that any good estimator of $h(P\ast\mathcal{N}_σ)$ must have sample complexity that is exponential in $d$. Using the empirical approximation results we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate $e^{O(d)}n^{-\frac{1}{2}}$, thus establishing the minimax rate-optimality of the plug-in. Numerical results that demonstrate a significant empirical superiority of the plug-in approach to general-purpose differential entropy estimators are provided.

Submitted 1 May, 2020; v1 submitted 30 May, 2019; originally announced May 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1810.11589

arXiv:1810.05728 [pdf, other]

Estimating Information Flow in Deep Neural Networks

Authors: Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy

Abstract: We study the flow of information and the evolution of internal representations during deep neural network (DNN) training, aiming to demystify the compression aspect of the information bottleneck theory. The theory suggests that DNN training comprises a rapid fitting phase followed by a slower compression phase, in which the mutual information $I(X;T)$ between the input $X$ and internal representations $T$ decreases. Several papers observe compression of estimated mutual information on different DNN models, but the true $I(X;T)$ over these networks is provably either constant (discrete $X$) or infinite (continuous $X$). This work explains the discrepancy between theory and experiments, and clarifies what was actually measured by these past works. To this end, we introduce an auxiliary (noisy) DNN framework for which $I(X;T)$ is a meaningful quantity that depends on the network's parameters. This noisy framework is shown to be a good proxy for the original (deterministic) DNN both in terms of performance and the learned representations. We then develop a rigorous estimator for $I(X;T)$ in noisy DNNs and observe compression in various models. By relating $I(X;T)$ in the noisy DNN to an information-theoretic communication problem, we show that compression is driven by the progressive clustering of hidden representations of inputs from the same class. Several methods to directly monitor clustering of hidden representations, both in noisy and deterministic DNNs, are used to show that meaningful clusters form in the $T$ space. Finally, we return to the estimator of $I(X;T)$ employed in past works, and demonstrate that while it fails to capture the true (vacuous) mutual information, it does serve as a measure for clustering. This clarifies the past observations of compression and isolates the geometric clustering of hidden representations as the true phenomenon of interest.

Submitted 30 May, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

Comments: Main text accepted to ICML 2019. This preprint contains the full version of that paper (including omitted appendices)

arXiv:1805.03027 [pdf, ps, other]

Information Storage in the Stochastic Ising Model

Authors: Ziv Goldfeld, Guy Bresler, Yury Polyanskiy

Abstract: Most information storage devices write data by modifying the local state of matter, in the hope that sub-atomic local interactions stabilize the state for a sufficiently long time, thereby allowing later recovery. Motivated to explore how temporal evolution of physical states in magnetic storage media affects their capacity, this work initiates the study of information retention in locally-interacting particle systems. The system dynamics follow the stochastic Ising model (SIM) over a 2-dimensional $\sqrt{n}\times\sqrt{n}$ grid. The initial spin configuration $X_0$ serves as the user-controlled input. The output configuration $X_t$ is produced by running $t$ steps of Glauber dynamics. Our main goal is to evaluate the information capacity $I_n(t):=\max_{p_{X_0}}I(X_0;X_t)$ when time $t$ scales with the system's size $n$. While the positive (but low) temperature regime is our main interest, we start by exploring the simpler zero-temperature dynamics. We first show that at zero temperature, order of $\sqrt{n}$ bits can be stored in the system indefinitely by coding over stable, striped configurations. While $\sqrt{n}$ is order optimal for infinite time, backing off to $t<\infty$, higher orders of $I_n(t)$ are achievable. First, linear coding arguments imply that $I_n(t) = Θ(n)$ for $t=O(n)$. To go beyond the linear scale, we develop a droplet-based achievability scheme that reliably stores $Ω\left(n/\log n\right)$ for $t=O(n\log n)$ time ($\log n$ can be replaced with any $o(n)$ function). Moving to the positive but low temperature regime, two main results are provided. First, we show that an initial configuration drawn from the Gibbs measure cannot retain more than a single bit for $t\geq \exp(Cβn^{1/4+ε})$ time. On the other hand, when scaling time with the inverse temperature $β$, the stripe-based coding scheme is shown to retain its bits for $e^{cβ}$ time.

Submitted 23 December, 2020; v1 submitted 8 May, 2018; originally announced May 2018.

arXiv:1712.10299 [pdf, ps, other]

Wiretap and Gelfand-Pinsker Channels Analogy and its Applications

Authors: Ziv Goldfeld, Haim H. Permuter

Abstract: An analogy framework between wiretap channels (WTCs) and state-dependent point-to-point channels with non-causal encoder channel state information (referred to as Gelfand-Pinsker channels (GPCs)) is proposed. A good sequence of stealth-wiretap codes is shown to induce a good sequence of codes for a corresponding GPC. Consequently, the framework enables exploiting existing results for GPCs to produce converse proofs for their wiretap analogs. The analogy readily extends to multiuser broadcasting scenarios, encompassing broadcast channels (BCs) with deterministic components, degradation ordering between users, and BCs with cooperative receivers. Given a wiretap BC (WTBC) with two receivers and one eavesdropper, an analogous Gelfand-Pinsker BC (GPBC) is constructed by converting the eavesdropper's observation sequence into a state sequence with an appropriate product distribution (induced by the stealth-wiretap code for the WTBC), and non-causally revealing the states to the encoder. The transition matrix of the state-dependent GPBC is extracted from WTBC's transition law, with the eavesdropper's output playing the role of the channel state. Past capacity results for the semi-deterministic (SD) GPBC and the physically-degraded (PD) GPBC with an informed receiver are leveraged to furnish analogy-based converse proofs for the analogous WTBC setups. This characterizes the secrecy-capacity regions of the SD-WTBC and the PD-WTBC, in which the stronger receiver also observes the eavesdropper's channel output. These derivations exemplify how the wiretap-GP analogy enables translating results on one problem into advances in the study of the other.

Submitted 28 May, 2019; v1 submitted 29 December, 2017; originally announced December 2017.

arXiv:1708.04283 [pdf, ps, other]

Key and Message Semantic-Security over State-Dependent Channels

Authors: Alexander Bunin, Ziv Goldfeld, Haim H. Permuter, Shlomo Shamai, Paul Cuff, Pablo Piantanida

Abstract: We study the trade-off between secret message (SM) and secret key (SK) rates, simultaneously achievable over a state-dependent (SD) wiretap channel (WTC) with non-causal channel state information (CSI) at the encoder. This model subsumes other instances of CSI availability as special cases, and calls for efficient utilization of the state sequence for both reliability and security purposes. An inner bound on the semantic-security (SS) SM-SK capacity region is derived based on a superposition coding scheme inspired by a past work of the authors. The region is shown to attain capacity for a certain class of SD-WTCs. SS is established by virtue of two versions of the strong soft-covering lemma. The derived region yields an improvement upon the previously best known SM-SK trade-off result reported by Prabhakaran et al., and, to the best of our knowledge, upon all other existing lower bounds for either SM or SK for this setup, even if the semantic security requirement is relaxed to weak secrecy. It is demonstrated that our region can be strictly larger than those reported in the preceding works.

Submitted 7 June, 2019; v1 submitted 14 August, 2017; originally announced August 2017.

arXiv:1610.03990 [pdf, ps, other]

Fourier-Motzkin Elimination Software for Information Theoretic Inequalities

Authors: Ido B. Gattegno, Ziv Goldfeld, Haim H. Permuter

Abstract: We provide open-source software implemented in MATLAB that performs Fourier-Motzkin elimination (FME) and removes constraints that are redundant due to Shannon-type inequalities (STIs). The FME is often used in information theoretic contexts to simplify rate regions, e.g., by eliminating auxiliary rates. Occasionally, however, the procedure becomes cumbersome, which makes an error-free hand-written derivation an elusive task. Some software packages have circumvented this difficulty by exploiting an automated FME process. However, the outputs of such software often include constraints that are inactive due to information theoretic properties. By incorporating the notion of STIs (a class of information inequalities provable via a computer program), our algorithm removes such redundant constraints based on non-negativity properties, chain-rules and probability mass function factorization. This newsletter first illustrates the program's abilities, and then reviews the contribution of STIs to the identification of redundant constraints.

Submitted 13 October, 2016; originally announced October 2016.
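
For readers unfamiliar with the elimination step mentioned in the abstract above, the following is a minimal sketch of one plain Fourier-Motzkin step for a system of inequalities $a^\top x \le b$. It is not the MATLAB software described in the paper and does not perform the STI-based redundancy removal; the function name `fm_eliminate` and the row representation `(coefficients, bound)` are assumptions made for illustration.

```python
from fractions import Fraction


def fm_eliminate(rows, j):
    """Eliminate variable j from inequalities sum_i a[i] * x[i] <= b.

    Each row is a pair (coeffs, bound). The returned rows are over the
    remaining variables (the entry for x[j] is dropped from each row).
    No redundancy removal is attempted.
    """
    pos, neg, zero = [], [], []
    for a, b in rows:
        a = [Fraction(v) for v in a]  # exact arithmetic, no rounding
        b = Fraction(b)
        if a[j] > 0:
            pos.append((a, b))   # gives an upper bound on x[j]
        elif a[j] < 0:
            neg.append((a, b))   # gives a lower bound on x[j]
        else:
            zero.append((a, b))  # does not involve x[j]

    out = [(a[:j] + a[j + 1:], b) for a, b in zero]
    for ap, bp in pos:
        for an, bn in neg:
            # Scale so the coefficients of x[j] become +1 and -1, then add
            # the two rows so that x[j] cancels.
            sp, sn = 1 / ap[j], 1 / -an[j]
            a = [sp * u + sn * v for u, v in zip(ap, an)]
            b = sp * bp + sn * bn
            out.append((a[:j] + a[j + 1:], b))
    return out


# Toy rate region in variables (R, r): R - r <= 1, r <= 2, -r <= 0.
# Eliminating the auxiliary rate r (index 1) leaves R <= 3 and 0 <= 2.
region = [([1, -1], 1), ([0, 1], 2), ([0, -1], 0)]
projected = fm_eliminate(region, 1)
```

In the toy example the second output row, $0 \le 2$, is trivially true; discarding rows like this, and rows implied by information-theoretic identities, is the kind of redundancy removal the abstract attributes to the STI step.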

arXiv:1608.06057 [pdf, ps, other]

MIMO Gaussian Broadcast Channels with Common, Private and Confidential Messages

Authors: Ziv Goldfeld, Haim H. Permuter

Abstract: The two-user multiple-input multiple-output (MIMO) Gaussian broadcast channel (BC) with common, private and confidential messages is considered. The transmitter sends a common message to both users, a confidential message to User 1 and a private (non-confidential) message to User 2. The secrecy-capacity region is characterized by showing that certain inner and outer bounds coincide and that the boundary points are achieved by Gaussian inputs, which enables the development of a tight converse. The proof relies on factorization of upper concave envelopes and a variant of dirty-paper coding (DPC). It is shown that the entire region is exhausted by using DPC to cancel out the signal of the non-confidential message at Receiver 1, thus making DPC against the signal of the confidential message unnecessary. A numerical example illustrates the secrecy-capacity results.

Submitted 28 May, 2019; v1 submitted 22 August, 2016; originally announced August 2016.

arXiv:1608.00743 [pdf, ps, other]

Wiretap Channels with Random States Non-Causally Available at the Encoder

Authors: Ziv Goldfeld, Paul Cuff, Haim H. Permuter

Abstract: We study the state-dependent (SD) wiretap channel (WTC) with non-causal channel state information (CSI) at the encoder. This model subsumes all other instances of CSI availability as special cases, and calls for an efficient utilization of the state sequence for both reliability and security purposes. A lower bound on the secrecy-capacity, that improves upon the previously best known result published by Prabhakaran et al., is derived based on a novel superposition coding scheme. Our achievability gives rise to the exact secrecy-capacity characterization of a class of SD-WTCs that decompose into a product of two WTCs, where one is independent of the state and the other one depends only on the state. The results are derived under the strict semantic-security metric that requires negligible information leakage for all message distributions.

Submitted 28 May, 2019; v1 submitted 2 August, 2016; originally announced August 2016.

arXiv:1601.03660 [pdf, ps, other]

Arbitrarily Varying Wiretap Channels with Type Constrained States

Authors: Ziv Goldfeld, Paul Cuff, Haim H. Permuter

Abstract: An arbitrarily varying wiretap channel (AVWTC) with a type constraint on the allowed state sequences is considered, and a single-letter characterization of its correlated-random (CR) assisted semantic-security (SS) capacity is derived. The allowed state sequences are the ones in a typical set around a single constraining type. SS is established by showing that the mutual information between the message and the eavesdropper's observations is negligible even when maximized over all message distributions, choices of state sequences and realizations of the CR-code. Both the achievability and the converse proofs of the type constrained coding theorem rely on stronger claims than actually required. The direct part establishes a novel single-letter lower bound on the CR-assisted SS-capacity of an AVWTC with state sequences constrained by any convex and closed set of state probability mass functions. This bound achieves the best known single-letter secrecy rates for a corresponding compound wiretap channel over the same constraint set. In contrast to other single-letter results in the AVWTC literature, this work does not assume the existence of a best channel to the eavesdropper. Instead, SS follows by leveraging the heterogeneous version of the stronger soft-covering lemma and a CR-code reduction argument. Optimality is a consequence of a max-inf upper bound on the CR-assisted SS-capacity of an AVWTC with state sequences constrained to any collection of type-classes. When adjusted to the aforementioned compound WTC, the upper bound simplifies to a max-min structure, thus strengthening the previously best known single-letter upper bound by Liang et al. that has a min-max form. The proof of the upper bound uses a novel distribution coupling argument.

Submitted 18 October, 2016; v1 submitted 14 January, 2016; originally announced January 2016.

arXiv:1601.01286 [pdf, ps, other]

Strong Secrecy for Cooperative Broadcast Channels

Authors: Ziv Goldfeld, Gerhard Kramer, Haim H. Permuter, Paul Cuff

Abstract: A broadcast channel (BC) where the decoders cooperate via a one-sided link is considered. One common and two private messages are transmitted and the private message to the cooperative user should be kept secret from the cooperation-aided user. The secrecy level is measured in terms of strong secrecy, i.e., a vanishing information leakage. An inner bound on the capacity region is derived by using a channel-resolvability-based code that double-bins the codebook of the secret message, and by using a likelihood encoder to choose the transmitted codeword. The inner bound is shown to be tight for semi-deterministic and physically degraded BCs and the results are compared to those of the corresponding BCs without a secrecy constraint. Blackwell and Gaussian BC examples illustrate the impact of secrecy on the rate regions. Unlike the case without secrecy, where sharing information about both private messages via the cooperative link is optimal, our protocol conveys parts of the common and non-confidential messages only. This restriction reduces the transmission rates more than the usual rate loss due to secrecy requirements. An example that illustrates this loss is provided.

Submitted 28 May, 2019; v1 submitted 6 January, 2016; originally announced January 2016.

arXiv:1509.03619 [pdf, ps, other]

Semantic-Security Capacity for Wiretap Channels of Type II

Authors: Ziv Goldfeld, Paul Cuff, Haim H. Permuter

Abstract: The secrecy capacity of the type II wiretap channel (WTC II) with a noisy main channel is currently an open problem. Herein its secrecy-capacity is derived and shown to be equal to its semantic-security (SS) capacity. In this setting, the legitimate users communicate via a discrete-memoryless (DM) channel in the presence of an eavesdropper that has perfect access to a subset of its choosing of the transmitted symbols, constrained to a fixed fraction of the blocklength. The secrecy criterion is achieved simultaneously for all possible eavesdropper subset choices. The SS criterion demands negligible mutual information between the message and the eavesdropper's observations even when maximized over all message distributions. A key tool for the achievability proof is a novel and stronger version of Wyner's soft covering lemma. Specifically, a random codebook is shown to achieve the soft-covering phenomenon with high probability. The probability of failure is doubly-exponentially small in the blocklength. Since the combined number of messages and subsets grows only exponentially with the blocklength, SS for the WTC II is established by using the union bound and invoking the stronger soft-covering lemma. The direct proof shows that rates up to the weak-secrecy capacity of the classic WTC with a DM erasure channel (EC) to the eavesdropper are achievable. The converse follows by establishing the capacity of this DM wiretap EC as an upper bound for the WTC II. From a broader perspective, the stronger soft-covering lemma constitutes a tool for showing the existence of codebooks that satisfy exponentially many constraints, a beneficial ability for many other applications in information theoretic security.

Submitted 17 August, 2016; v1 submitted 11 September, 2015; originally announced September 2015.

Journal ref: IEEE Transactions on Information Theory, Vol. 62, No. 7, July 2016

arXiv:1504.06136 [pdf, ps, other]

doi 10.1109/TIT.2017.2708086

Broadcast Channels with Privacy Leakage Constraints

Authors: Ziv Goldfeld, Gerhard Kramer, Haim H. Permuter

Abstract: The broadcast channel (BC) with one common and two private messages with leakage constraints is studied, where leakage rate refers to the normalized mutual information between a message and a channel symbol string. Each private message is destined for a different user and the leakage rate to the other receiver must satisfy a constraint. This model captures several scenarios concerning secrecy, i.e., when both, either or neither of the private messages are secret. Inner and outer bounds on the leakage-capacity region are derived when the eavesdropper knows the codebook. The inner bound relies on a Marton-like code construction and the likelihood encoder. A Uniform Approximation Lemma is established that states that the marginal distribution induced by the encoder on each of the bins in the Marton codebook is approximately uniform. Without leakage constraints the inner bound recovers Marton's region and the outer bound reduces to the UVW-outer bound. The bounds match for semi-deterministic (SD) and physically degraded (PD) BCs, as well as for BCs with a degraded message set. The leakage-capacity regions of the SD-BC and the BC with a degraded message set recover past results for different secrecy scenarios. A Blackwell BC example illustrates the results and shows how its leakage-capacity region changes from the capacity region without secrecy to the secrecy-capacity regions for different secrecy scenarios.

Submitted 28 May, 2017; v1 submitted 23 April, 2015; originally announced April 2015.

arXiv:1405.7812 [pdf, ps, other]

doi 10.1109/TIT.2016.2533479

Duality of a Source Coding Problem and the Semi-Deterministic Broadcast Channel with Rate-Limited Cooperation

Authors: Ziv Goldfeld, Haim H. Permuter, Gerhard Kramer

Abstract: The Wyner-Ahlswede-Körner (WAK) empirical-coordination problem where the encoders cooperate via a finite-capacity one-sided link is considered. The coordination-capacity region is derived by combining several source coding techniques, such as Wyner-Ziv (WZ) coding, binning and superposition coding. Furthermore, a semi-deterministic (SD) broadcast channel (BC) with one-sided decoder cooperation is considered. Duality principles relating the two problems are presented, and the capacity region for the SD-BC setting is derived. The direct part follows from an achievable region for a general BC that is tight for the SD scenario. A converse is established by using telescoping identities. The SD-BC is shown to be operationally equivalent to a class of relay-BCs (RBCs) and the correspondence between their capacity regions is established. The capacity region of the SD-BC is transformed into an equivalent region that is shown to be dual to the admissible region of the WAK problem in the sense that the information measures defining the corner points of both regions coincide. Achievability and converse proofs for the equivalent region are provided. For the converse, we use a probabilistic construction of auxiliary random variables that depends on the distribution induced by the codebook. Several examples illustrate the results.

Submitted 17 August, 2016; v1 submitted 30 May, 2014; originally announced May 2014.

Journal ref: IEEE Transactions on Information Theory, Vol. 62, No. 5, May 2016

arXiv:1303.7083 [pdf, ps, other]

doi 10.1109/TIT.2014.2346494

The Finite State MAC with Cooperative Encoders and Delayed CSI

Authors: Ziv Goldfeld, Haim H. Permuter, Benjamin M. Zaidel

Abstract: In this paper, we consider the finite-state multiple access channel (MAC) with partially cooperative encoders and delayed channel state information (CSI). Here partial cooperation refers to the communication between the encoders via finite-capacity links. The channel states are assumed to be governed by a Markov process. Full CSI is assumed at the receiver, while at the transmitters, only delayed CSI is available. The capacity region of this channel model is derived by first solving the case of the finite-state MAC with a common message. Achievability for the latter case is established using the notion of strategies; however, we show that optimal codes can be constructed directly over the input alphabet. This results in a single codebook construction that is then leveraged to apply simultaneous joint decoding. Simultaneous decoding is crucial here because it circumvents the need to rely on the capacity region's corner points, a task that becomes increasingly cumbersome with the growth in the number of messages to be sent. The common message result is then used to derive the capacity region for the case with partially cooperating encoders. Next, we apply this general result to the special case of the Gaussian vector MAC with diagonal channel transfer matrices, which is suitable for modeling, e.g., orthogonal frequency division multiplexing (OFDM)-based communication systems. The capacity region of the Gaussian channel is presented in terms of a convex optimization problem that can be solved efficiently using numerical tools. The region is derived by first presenting an outer bound on the general capacity region and then suggesting a specific input distribution that achieves this bound. Finally, numerical results are provided that give valuable insight into the practical implications of optimally using conferencing to maximize the transmission rates.

Submitted 29 January, 2015; v1 submitted 28 March, 2013; originally announced March 2013.

Journal ref: IEEE Transactions on Information Theory, Vol. 60, No. 10, October 2014

Showing 1–33 of 33 results for author: Goldfeld, Z