Information Theory
Showing new listings for Thursday, 10 October 2024
- [1] arXiv:2410.05412
Title: Robust Matrix Completion with Deterministic Sampling via Convex Optimization
Subjects: Information Theory (cs.IT)
This paper deals with the problem of robust matrix completion: recovering a low-rank matrix and a sparse matrix from a compressed version of their superposition. Although this may seem to be a solved problem, we point out that the compressed matrix in our case is sampled with a deterministic pattern rather than the random patterns on which existing studies depend. In fact, deterministic sampling is much more hardware-friendly than random sampling. On many platforms, limited resources leave deterministic sampling as the only option for sensing a matrix, which makes robust matrix completion with deterministic patterns worth investigating. In this spirit, this paper proposes the \textit{restricted approximate $\infty$-isometry property} and proves that if a \textit{low-rank} and \textit{incoherent} square matrix and a deterministic sampling pattern satisfy this property, together with two existing conditions called \textit{isomerism} and \textit{relative well-conditionedness}, then convex optimization exactly recovers the matrix, with very high probability, from its sampled counterpart even when it is grossly corrupted by a small fraction of outliers.
- [2] arXiv:2410.05501
Title: Timeliness in NextG Spectrum Sharing under Jamming Attacks with Deep Learning
Comments: 7 figures, original 5 pages, in proceedings on IEEE VTC-Fall 2024 Conference
Journal-ref: In proceedings on IEEE VTC-Fall 2024 Conference
Subjects: Information Theory (cs.IT)
We consider the communication of time-sensitive information in NextG spectrum sharing, where a deep learning-based classifier is used to identify transmission attempts. While the transmitter seeks opportunities to use the spectrum without causing interference to an incumbent user, an adversary uses another deep learning classifier to detect and jam the signals, subject to an average power budget. We consider the timeliness objectives of NextG communications and study the Age of Information (AoI) under different spectrum sharing and jamming scenarios, analyzing the effects of transmit control, transmit probability, and channel utilization under wireless channel and jamming impairments. The resulting signal-to-interference-plus-noise ratio (SINR) determines the success of spectrum sharing, but it also affects the accuracy of the adversary's detection, making it more likely that the jammer successfully identifies and jams the communication. Our results illustrate the benefits of spectrum sharing for anti-jamming by showing how a limited-power adversary is motivated to decrease its jamming power as channel occupancy rises in NextG spectrum sharing with timeliness objectives.
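The AoI metric can be illustrated with a deliberately simplified model. The sketch below assumes (this is not the paper's model) slotted time, a fresh update generated in every slot, and successful delivery with probability `p_success`, which stands in for the combined effect of the channel, SINR, and jamming. The age resets to one slot on a delivery and otherwise grows by one, so the stationary age is geometric with mean $1/p$.

```python
import random


def simulate_average_aoi(p_success: float, num_slots: int, seed: int = 0) -> float:
    """Time-average Age of Information (in slots) for a toy slotted model.

    Assumed model (hypothetical, not the paper's): a fresh update is sent
    in every slot and is delivered with probability p_success. On delivery
    the age drops to 1; otherwise it grows by one slot.
    """
    rng = random.Random(seed)
    age = 1
    total = 0
    for _ in range(num_slots):
        if rng.random() < p_success:
            age = 1  # fresh update received in this slot
        else:
            age += 1  # nothing received, information is one slot older
        total += age
    return total / num_slots


def analytical_average_aoi(p_success: float) -> float:
    """Stationary age is geometric on {1, 2, ...}, so E[age] = 1 / p."""
    return 1.0 / p_success


print(simulate_average_aoi(0.5, 100_000), analytical_average_aoi(0.5))
print(simulate_average_aoi(0.25, 100_000), analytical_average_aoi(0.25))
```

In this toy model, halving the delivery probability from 0.5 to 0.25 doubles the average age from 2 to 4 slots, which is one simple way to see why a jammer that lowers the delivery probability directly hurts timeliness.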
- [3] arXiv:2410.05540
Title: Game of Coding: Sybil Resistant Decentralized Machine Learning with Minimal Trust Assumption
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
Coding theory plays a crucial role in ensuring data integrity and reliability across various domains, from communication to computation and storage systems. However, its reliance on trust assumptions for data recovery poses significant challenges, particularly in emerging decentralized systems where trust is scarce. To address this, the game of coding framework was introduced, offering insights into strategies for data recovery in incentive-oriented environments. The original version of the game of coding was limited to scenarios involving only two nodes. This paper investigates the implications of increasing the number of nodes in the game of coding framework, focusing on scenarios with one honest node and multiple adversarial nodes. We demonstrate that, despite the increased flexibility the adversary gains from controlling more adversarial nodes, this additional power does not benefit the adversary and does not harm the data collector, making the scheme Sybil-resistant. Furthermore, we outline optimal strategies for the data collector in terms of accepting or rejecting inputs, and characterize the optimal noise distribution for the adversary.
- [4] arXiv:2410.05587
Title: Deep Learning-Based Decoding of Linear Block Codes for Spin-Torque Transfer Magnetic Random Access Memory (STT-MRAM)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Thanks to its fast read/write speed and low power consumption, spin-torque transfer magnetic random access memory (STT-MRAM) has become a promising non-volatile memory (NVM) technology suitable for many applications. However, the reliability of STT-MRAM is seriously affected by variations in the memory fabrication process and in the working temperature, and the latter leads to an unknown offset of the channel. Hence, there is a pressing need for more effective error correction coding techniques that tackle these imperfections and improve the reliability of STT-MRAM. In this work, we propose, for the first time, the application of deep learning (DL)-based algorithms and techniques to improve the decoding performance of linear block codes with short codeword lengths for STT-MRAM. We formulate the belief propagation (BP) decoding of linear block codes as a neural network (NN), and propose a novel neural normalized-offset reliability-based min-sum (NNORB-MS) decoding algorithm. We successfully apply our proposed decoding algorithm to the STT-MRAM channel through channel symmetrization, which overcomes the channel asymmetry. We also propose an NN-based soft information generation method (SIGM) that takes into account the unknown offset of the channel. Simulation results demonstrate that our proposed NNORB-MS decoding algorithm can achieve a significant performance gain over both hard-decision decoding (HDD) and the regular reliability-based min-sum (RB-MS) decoding algorithm, both without and with the unknown channel offset. Moreover, the decoder structure and time complexity of the NNORB-MS algorithm remain similar to those of the regular RB-MS algorithm.
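The "normalized" and "offset" corrections refer to two adjustments commonly applied to the min-sum check-node update. The sketch below implements only that generic normalized offset min-sum update for a single check node; it is not the paper's reliability-based NNORB-MS decoder, and the scalars `alpha` (normalization) and `beta` (offset) are illustrative placeholders for the parameters a neural decoder would learn.

```python
def nms_offset_check_update(incoming, alpha=0.8, beta=0.15):
    """Normalized offset min-sum update at one check node.

    incoming: variable-to-check LLRs on the check node's edges (degree >= 2).
    Returns the check-to-variable LLR for every edge, each computed from all
    *other* edges as alpha * (product of signs) * max(min |LLR| - beta, 0).
    alpha and beta are fixed illustrative values here; a neural decoder would
    instead learn them, e.g. per edge and per iteration.
    """
    out = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]  # exclude the target edge
        sign = 1.0
        for llr in others:
            if llr < 0:
                sign = -sign
        magnitude = min(abs(llr) for llr in others)
        out.append(alpha * sign * max(magnitude - beta, 0.0))
    return out


# Three-edge check node with one negative (less reliable) input.
print(nms_offset_check_update([2.0, -1.0, 3.0], alpha=1.0, beta=0.0))
```

Setting `alpha=1.0` and `beta=0.0` recovers plain min-sum; the normalization and offset shrink the otherwise overconfident min-sum messages toward the BP values.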
- [5] arXiv:2410.05618
Title: Deep Transfer Learning-based Detection for Flash Memory Channels
Comments: This paper has been accepted for publication in IEEE Transactions on Communications
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
The NAND flash memory channel is corrupted by different types of noise, such as data retention noise and wear-out noise, which lead to an unknown channel offset and make the flash memory channel non-stationary. In the literature, machine learning-based methods have been proposed for data detection in flash memory channels. However, these methods require a large number of training samples and labels to achieve satisfactory performance, which is costly. Furthermore, with a large unknown channel offset, it may be impossible to obtain enough correct labels. In this paper, we reformulate data detection for the flash memory channel as a transfer learning (TL) problem. We then propose a model-based deep TL (DTL) algorithm for flash memory channel detection, which can effectively reduce the training data size from $10^6$ samples to fewer than $10^4$ samples. Moreover, we propose an unsupervised domain adaptation (UDA)-based DTL algorithm using moment alignment, which can detect data without any labels. Hence, it is suitable for scenarios where decoding of the error-correcting code fails and no labels can be obtained. Finally, a UDA-based threshold detector is proposed to eliminate the need for a neural network. Both the channel raw error rate analysis and simulation results demonstrate that the proposed DTL-based detection schemes can achieve near-optimal bit error rate (BER) performance with much less training data and/or without using any labels.
- [6] arXiv:2410.05644
Title: Sneak Path Interference-Aware Adaptive Detection and Decoding for Resistive Memory Arrays
Subjects: Information Theory (cs.IT)
Resistive random-access memory (ReRAM) is an emerging non-volatile memory technology for high-density and high-speed data storage. However, the sneak path interference (SPI) occurring in the ReRAM crossbar array seriously degrades its data recovery performance. In this letter, we first propose a quantized channel model of ReRAM, based on which we design both one-bit and multi-bit channel quantizers by maximizing the mutual information of the channel. A key channel parameter affecting the quantizer design is the sneak path occurrence probability (SPOP) of the memory cell. We first design the quantizer using the average SPOP calculated statistically, which leads to the same channel detector for different memory arrays. We then design the quantizer using an SPOP estimated separately for each memory array, obtained by an effective channel estimator and an iterative detection and decoding scheme for the ReRAM channel. This results in an array-level SPI-aware adaptive detection and decoding approach. Moreover, since the SPI affecting memory cells in the same row or column is more strongly correlated than the SPI affecting cells in different rows or columns, we further derive a column-level scheme that outperforms the array-level scheme. We also propose a channel decomposition method that enables effective theoretical analysis of the ReRAM channel. Simulation results show that the proposed SPI-aware adaptive detection and decoding schemes can approach the ideal performance with three quantization bits and only one decoding iteration.
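Designing a one-bit quantizer by maximizing mutual information amounts to a threshold search. The model below is an assumption for illustration only: equiprobable binary inputs read back as Gaussian levels $\mathcal{N}(\mu_0, \sigma_0^2)$ and $\mathcal{N}(\mu_1, \sigma_1^2)$, with no explicit sneak path or SPOP model (in the paper, the SPOP would shape these read-back distributions). The sketch evaluates $I(X;Q) = H(Q) - H(Q \mid X)$ for $Q = \mathbb{1}[Y > t]$ and keeps the best threshold on a grid.

```python
import math


def _q(x):
    """Gaussian tail probability P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def _h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mi_one_bit(t, mu0, s0, mu1, s1, prior1=0.5):
    """I(X; Q) in bits for Q = 1[Y > t] and Y | X=x ~ N(mu_x, s_x^2)."""
    p1_given0 = _q((t - mu0) / s0)  # P(Q = 1 | X = 0)
    p1_given1 = _q((t - mu1) / s1)  # P(Q = 1 | X = 1)
    p1 = (1 - prior1) * p1_given0 + prior1 * p1_given1
    # I(X; Q) = H(Q) - H(Q | X)
    return _h2(p1) - ((1 - prior1) * _h2(p1_given0) + prior1 * _h2(p1_given1))


def best_threshold(mu0, s0, mu1, s1, grid_points=2001):
    """Grid search for the MI-maximizing one-bit threshold."""
    lo = min(mu0 - 4 * s0, mu1 - 4 * s1)
    hi = max(mu0 + 4 * s0, mu1 + 4 * s1)
    best_t, best_i = lo, -1.0
    for k in range(grid_points):
        t = lo + (hi - lo) * k / (grid_points - 1)
        i = mi_one_bit(t, mu0, s0, mu1, s1)
        if i > best_i:
            best_t, best_i = t, i
    return best_t, best_i


print(best_threshold(0.0, 1.0, 2.0, 1.0))  # equal spreads
print(best_threshold(0.0, 1.0, 2.0, 2.0))  # level 1 broadened (hypothetical)
```

With equal spreads the best threshold sits at the midpoint between the two levels; broadening one level, as interference might, moves it away from the midpoint.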
- [7] arXiv:2410.05961
Title: Active and Passive Beamforming Designs for SER Minimization in RIS-Assisted MIMO Systems
Authors: Trinh Van Chien, Bui Trong Duc, Ho Viet Duc Luong, Huynh Thi Thanh Binh, Hien Quoc Ngo, Symeon Chatzinotas
Comments: 16 pages, 13 figures, and 1 table. Accepted by TWC
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
This research explores reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output (MIMO) systems, specifically addressing the enhancement of communication reliability with modulated signals. We first derive the analytical downlink symbol error rate (SER) of each user as a multivariate function of both the phase-shift and beamforming vectors. The analytical SER provides insights into the synergistic dynamics between the RIS and MIMO communication. We then introduce a novel average SER minimization problem subject to the practical constraints of the transmit power budget and the phase-shift coefficients; this problem is NP-hard. By using the differential evolution (DE) algorithm as a pivotal tool for optimizing the intricate active and passive beamforming variables in RIS-assisted communication systems, we can effectively handle the non-convexity of the considered SER optimization problem. Furthermore, an efficient local search is incorporated into the DE algorithm to escape local optima, and hence offer low SER and high communication reliability. Monte Carlo simulations validate the analytical results and the proposed optimization framework, indicating that the joint active and passive beamforming design outperforms the other benchmarks.
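Differential evolution itself is a short algorithm. The sketch below is the classic DE/rand/1/bin variant (mutation $v = x_a + F(x_b - x_c)$, binomial crossover, greedy selection) without the paper's added local search. The SER objective is not reproduced; as a hypothetical stand-in, the example maximizes a toy single-antenna received power $|h_d + \sum_n g_n e^{j\theta_n}|^2$ over RIS phase shifts, with made-up channel values `h_d` and `g`.

```python
import cmath
import math
import random


def differential_evolution(objective, bounds, pop_size=30, generations=300,
                           f_weight=0.5, crossover=0.9, seed=0):
    """Minimize `objective` over a box with classic DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation base and difference vector from three other individuals.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < crossover:
                    v = pop[a][j] + f_weight * (pop[b][j] - pop[c][j])
                else:
                    v = pop[i][j]
                lo, hi = bounds[j]
                trial.append(min(max(v, lo), hi))  # clip back into the box
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=lambda k: cost[k])
    return pop[best], cost[best]


# Toy stand-in objective (hypothetical channel values): maximize the received
# power |h_d + sum_n g_n exp(j theta_n)|^2 over RIS phase shifts theta_n.
h_d = 0.5 + 0.2j
g = [0.3 - 0.1j, 0.2 + 0.4j, -0.25 + 0.05j, 0.1 - 0.3j]


def neg_received_power(theta):
    total = h_d + sum(gn * cmath.exp(1j * t) for gn, t in zip(g, theta))
    return -abs(total) ** 2


best_theta, best_cost = differential_evolution(
    neg_received_power, [(0.0, 2 * math.pi)] * len(g))
optimum = (abs(h_d) + sum(abs(gn) for gn in g)) ** 2  # all paths phase-aligned
print(-best_cost, optimum)
```

The toy objective was chosen because its optimum is known in closed form, $(|h_d| + \sum_n |g_n|)^2$, reached when every reflected path is phase-aligned with the direct path, which gives an easy check on the optimizer.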
- [8] arXiv:2410.06059
Title: Maximum Achievable Rate of Resistive Random-Access Memory Channels by Mutual Information Spectrum Analysis
Subjects: Information Theory (cs.IT)
The maximum achievable rate is derived for the resistive random-access memory (ReRAM) channel with sneak path interference. Based on mutual information spectrum analysis, the maximum achievable rate of the ReRAM channel with independent and identically distributed (i.i.d.) binary inputs is derived as an explicit function of channel parameters such as the distribution of cell selector failures and the channel noise level. Due to the randomness of cell selector failures, the ReRAM channel exhibits a multi-status characteristic. For each status, it is shown that as the array size grows, the fraction of cells affected by sneak paths approaches a constant value. Therefore, the mutual information spectrum of the ReRAM channel is formulated as a mixture of multiple stationary channels. Maximum achievable rates of the ReRAM channel under different settings, such as single- and across-array coding, with and without data shaping, and optimal and treating-interference-as-noise (TIN) decoding, are compared. These results provide valuable insights into code design for ReRAM.
- [9] arXiv:2410.06115
Title: A physics-based perspective for understanding and utilizing spatial resources of wireless channels
Authors: Hui Xu, Jun Wei Wu, Zhen Jie Qi, Hao Tian Wu, Rui Wen Shao, Qiang Cheng, Jieao Zhu, Linglong Dai, Tie Jun Cui
Comments: 31 pages, 8 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
To satisfy the increasing demands for transmission rates in wireless communications, it is necessary to exploit the spatial resources of electromagnetic (EM) waves. In this context, EM information theory (EIT) has become a hot topic; it integrates the theoretical frameworks of deterministic mathematics and stochastic statistics to explore the transmission mechanisms of continuous EM waves. However, previous studies primarily focused on frame analysis, with limited exploration of practical applications and limited understanding of the essential physical characteristics of EIT. In this paper, we present a three-dimensional (3-D) line-of-sight channel capacity formula that captures vector EM physics and accommodates both near- and far-field scenarios. Based on rigorous mathematical equations and the physical mechanism of the fast multipole expansion, a channel model is established, and the finite angular spectral bandwidth of scattered waves is revealed. To adapt to this feature of the channel, an optimization problem is formulated for determining the mode currents at the transmitter, aiming to obtain the optimal precoder and combiner designs. We comprehensively analyze the relationship among the spatial degrees of freedom, noise, and transmit power, thereby establishing a rigorous upper bound on channel capacity. A series of simulations is conducted to validate the theoretical model and numerical method. This work offers a novel perspective and methodology for understanding and leveraging EIT, and provides a theoretical foundation for the design and optimization of future wireless communications.
- [10] arXiv:2410.06224
Title: The Fast Möbius Transform: An algebraic approach to information decomposition
Comments: 15 pages, 6 figures, 2 tables
Subjects: Information Theory (cs.IT)
The partial information decomposition (PID) and its extension, integrated information decomposition ($\Phi$ID), are promising frameworks for investigating information phenomena involving multiple variables. An important limitation of these approaches is the high computational cost of calculating them. Here we leverage fundamental algebraic properties of these decompositions to enable a computationally efficient method for estimating them, which we call the fast Möbius transform. Our approach is based on a novel formula for estimating the Möbius function that circumvents important computational bottlenecks. We showcase the capabilities of this approach with two analyses that would be infeasible without it: decomposing the information that neural activity in different frequency bands yields about the brain's macroscopic functional organisation, and identifying distinctive dynamical properties of the interactions between multiple voices in baroque music. Overall, our proposed approach illuminates the value of algebraic facets of information decomposition and opens the way to a wide range of future analyses.
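Möbius inversion on a lattice can be computed quickly when the lattice has product structure. The sketch below shows the textbook fast zeta and Möbius transforms on the Boolean lattice of subsets of $n$ variables (subsets encoded as bitmasks), which run in $O(n 2^n)$ instead of the $O(3^n)$ of direct summation over all subset pairs. It illustrates the general idea only: the PID and $\Phi$ID redundancy lattices are lattices of antichains rather than the Boolean lattice, and the paper's formula for them is not reproduced here.

```python
def zeta_transform(f, n):
    """Return g with g[S] = sum of f[T] over all subsets T of S.

    Subsets of {0, ..., n-1} are encoded as bitmasks, so f has length 2**n.
    """
    g = list(f)
    for i in range(n):
        bit = 1 << i
        for s in range(1 << n):
            if s & bit:
                g[s] += g[s ^ bit]  # add the value of S without element i
    return g


def mobius_transform(g, n):
    """Inverse of zeta_transform: f[S] = sum over T subset of S of (-1)^|S - T| g[T]."""
    f = list(g)
    for i in range(n):
        bit = 1 << i
        for s in range(1 << n):
            if s & bit:
                f[s] -= f[s ^ bit]
    return f


def mobius_brute_force(g, n):
    """O(3^n) reference implementation of the same inversion."""
    f = []
    for s in range(1 << n):
        total = 0
        t = s
        while True:  # enumerate all submasks t of s
            sign = -1 if bin(s ^ t).count("1") % 2 else 1
            total += sign * g[t]
            if t == 0:
                break
            t = (t - 1) & s
        f.append(total)
    return f


g = zeta_transform([3, 1, 4, 1, 5, 9, 2, 6], 3)
print(g, mobius_transform(g, 3))
```

One way to read it: `f` holds "atoms" attached to each subset and `g` holds the aggregate quantities built from them; the Möbius transform recovers the atoms from the aggregates, which is the role Möbius inversion plays in information decompositions.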
- [11] arXiv:2410.06365
Title: Network-level ISAC: Performance Analysis and Optimal Antenna-to-BS Allocation
Comments: 13 pages, 12 figures, submitted to IEEE for possible publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
A cooperative architecture is proposed for integrated sensing and communication (ISAC) networks, incorporating coordinated multi-point (CoMP) transmission along with multi-static sensing. We investigate how antenna-to-base station (BS) allocation affects cooperative sensing and cooperative communication performance. More explicitly, we balance the benefits of geographically concentrated antennas, which enhance beamforming and coherent processing, against those of geographically distributed antennas, which improve diversity and reduce service distances. Regarding sensing performance, we investigate three localization methods, namely angle-of-arrival (AOA)-based, time-of-flight (TOF)-based, and a hybrid approach combining AOA and TOF measurements, and critically appraise their effects on ISAC network performance. Our analysis shows that in networks of $N$ ISAC nodes following a Poisson point process, the localization accuracy of TOF-based methods follows a $\ln^2 N$ scaling law (explicitly, the Cramér-Rao lower bound (CRLB) decreases with $\ln^2 N$). The AOA-based methods follow a $\ln N$ scaling law, while the hybrid methods scale as $a\ln^2 N + b\ln N$, where $a$ and $b$ are parameters related to the TOF and AOA measurements, respectively. The difference between these scaling laws arises from the distinct ways in which measurement results are converted into the target location. In terms of communication performance, we derive a tractable expression for the communication data rate, considering various cooperative region sizes and antenna-to-BS allocation strategies. It is proved that higher path loss exponents favor distributed antenna allocation to reduce access distances, while lower exponents favor centralized antenna allocation to maximize beamforming gain.
- [12] arXiv:2410.06504
Title: Transformer-assisted Parametric CSI Feedback for mmWave Massive MIMO Systems
Comments: 14 pages, 13 figures, accepted to IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
As a key technology for meeting the ever-increasing data rate demand in beyond 5G and 6G communications, millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems have gained much attention recently. To make the most of mmWave massive MIMO systems, acquiring accurate channel state information (CSI) at the base station (BS) is crucial. However, this task is by no means easy due to the CSI feedback overhead induced by the large number of antennas. In this paper, we propose a parametric CSI feedback technique for mmWave massive MIMO systems. The key idea of the proposed technique is to compress the mmWave MIMO channel matrix into a few geometric channel parameters (e.g., angles, delays, and path gains). Due to the limited scattering of mmWave signals, the number of channel parameters is much smaller than the number of antennas, which significantly reduces the CSI feedback overhead. Moreover, by exploiting deep learning (DL) for channel parameter extraction and MIMO channel reconstruction, we can effectively suppress the channel quantization error. Numerical results demonstrate that the proposed technique outperforms conventional CSI feedback techniques in terms of normalized mean square error (NMSE) and bit error rate (BER).
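The overhead argument can be made concrete with a narrowband geometric channel. The sketch assumes a half-wavelength uniform linear array at the BS, ignores delays, and replaces the paper's DL-based extraction and reconstruction with perfect parameter knowledge plus uniform angle quantization; the path values, bit widths, and antenna count are illustrative. Each path costs three real numbers (one angle, one complex gain) instead of $2N$ reals for the full channel vector.

```python
import cmath
import math


def ula_response(theta, num_antennas):
    """Steering vector of an assumed half-wavelength uniform linear array."""
    return [cmath.exp(-1j * math.pi * n * math.sin(theta)) for n in range(num_antennas)]


def channel_from_paths(paths, num_antennas):
    """Narrowband geometric channel h = sum_l g_l * a(theta_l); delays are ignored."""
    h = [0j] * num_antennas
    for gain, theta in paths:
        for n, a_n in enumerate(ula_response(theta, num_antennas)):
            h[n] += gain * a_n
    return h


def quantize(x, lo, hi, bits):
    """Uniform scalar quantizer with 2**bits levels on [lo, hi]."""
    step = (hi - lo) / ((1 << bits) - 1)
    return lo + round((x - lo) / step) * step


def nmse(h_true, h_est):
    err = sum(abs(a - b) ** 2 for a, b in zip(h_true, h_est))
    return err / sum(abs(a) ** 2 for a in h_true)


N = 64
paths = [(0.9 + 0.2j, 0.3), (0.4 - 0.3j, -0.7), (0.2 + 0.1j, 1.1)]  # (gain, angle in rad)
h = channel_from_paths(paths, N)


def reconstruct(bits):
    # Feed back quantized angles (gains kept exact for simplicity) and rebuild h.
    q = [(gain, quantize(theta, -math.pi / 2, math.pi / 2, bits)) for gain, theta in paths]
    return channel_from_paths(q, N)


full_feedback_reals = 2 * N            # real + imaginary part per antenna
param_feedback_reals = 3 * len(paths)  # angle + complex gain per path
print(full_feedback_reals, param_feedback_reals)
print(nmse(h, reconstruct(4)), nmse(h, reconstruct(12)))
```

Because the array phase grows linearly with the antenna index, a small angle error becomes a large phase error at the far end of a 64-element array, so angle resolution dominates the reconstruction error in this toy setup.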
- [13] arXiv:2410.06506
Title: Cooperative Multi-Target Positioning for Cell-Free Massive MIMO with Multi-Agent Reinforcement Learning
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Cell-free massive multiple-input multiple-output (mMIMO) is a promising technology for empowering next-generation mobile communication networks. In this paper, to address the computational complexity of conventional fingerprint positioning, we consider a novel cooperative positioning architecture in which certain relevant access points (APs) establish positioning similarity coefficients. We then propose an innovative joint positioning and correction framework employing multi-agent reinforcement learning (MARL) to tackle the challenges of high-dimensional, sophisticated signal processing; it mainly leverages received signal strength information for preliminary positioning, supplemented by angle-of-arrival information to refine the initial position estimate. Moreover, to mitigate the bias introduced by remote APs, we design a cooperative weighted K-nearest neighbor (Co-WKNN)-based estimation scheme that selects APs with high correlation to participate in user positioning. Numerical comparisons of various user positioning schemes reveal that the proposed MARL-based positioning scheme with Co-WKNN can effectively improve positioning performance. Notably, the cooperative positioning architecture is a critical element in striking a balance between positioning performance and computational complexity.
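The weighted K-nearest neighbor step behind Co-WKNN can be sketched on its own. This generic version compares a measured RSS vector with a fingerprint database and averages the positions of the K closest reference points using inverse-distance weights. The cooperative AP selection, the similarity coefficients, and the MARL correction stage from the paper are not modeled, and the fingerprint database is invented for illustration.

```python
import math


def wknn_position(rss, fingerprints, k=3, eps=1e-9):
    """Weighted K-nearest-neighbour fingerprint positioning.

    rss: measured RSS vector, one entry per AP.
    fingerprints: list of (position, reference_rss_vector) pairs, len >= k.
    Weights are inverse RSS-space Euclidean distances (an assumed, common
    choice); eps avoids division by zero on an exact match.
    """
    scored = []
    for pos, ref in fingerprints:
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(rss, ref)))
        scored.append((d, pos))
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    dim = len(nearest[0][1])
    return tuple(
        sum(w * pos[i] for w, (_, pos) in zip(weights, nearest)) / total
        for i in range(dim)
    )


# Hypothetical 2-AP fingerprint database: ((x, y) in metres, [RSS in dBm]).
database = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((10.0, 0.0), [-70.0, -40.0]),
    ((5.0, 5.0), [-55.0, -55.0]),
    ((0.0, 10.0), [-60.0, -80.0]),
]
print(wknn_position([-50.0, -60.0], database, k=3))
```

Restricting which APs enter the distance computation, as Co-WKNN does, would amount to comparing only a subset of the RSS entries in this sketch.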
- [14] arXiv:2410.06512
Title: In-Band Full-Duplex MIMO Systems for Simultaneous Communications and Sensing: Challenges, Methods, and Future Perspectives
Comments: 12 pages, 5 figures, White Paper to appear at IEEE SPM
Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET); Signal Processing (eess.SP)
In-band Full-Duplex (FD) Multiple-Input Multiple-Output (MIMO) systems offer a significant opportunity for Integrated Sensing and Communications (ISAC) due to their capability to transmit and receive signals simultaneously. This feature has recently been exploited to devise spectrum-efficient simultaneous information transmission and monostatic sensing operations, a line of research typically referred to as MIMO FD-ISAC. In this article, capitalizing on a recent FD MIMO architecture with reduced-complexity analog cancellation, we present an FD-enabled framework for simultaneous communications and sensing using data signals. Unlike in communications applications, the framework's goal is not to mitigate self-interference, which here includes reflections of the downlink data transmissions from targets in the FD node's vicinity, but to optimize the system parameters for the intended dual functionality. The unique characteristics and challenges of a generic MIMO FD-ISAC system are discussed, along with a broad overview of state-of-the-art special cases, including numerical investigations. Several directions for future work on FD-enabled ISAC relevant to signal processing communities are also provided.
- [15] arXiv:2410.06589
Title: A Family of LZ78-based Universal Sequential Probability Assignments
Comments: 31 pages, 5 figures, submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
We propose and study a family of universal sequential probability assignments on individual sequences, based on the incremental parsing procedure of the Lempel-Ziv (LZ78) compression algorithm. We show that the normalized log loss under any of these models converges to the normalized LZ78 codelength, uniformly over all individual sequences. To establish the universality of these models, we consolidate a set of results from the literature relating finite-state compressibility to optimal log-loss under Markovian and finite-state models. We also consider some theoretical and computational properties of these models when viewed as probabilistic sources. Finally, we present experimental results showcasing the potential benefit of using this family -- as models and as sources -- for compression, generation, and classification.
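The incremental-parsing idea can be sketched in a few lines. The sketch assumes one simple member of such a family, which may differ from the paper's exact definition: each node of the LZ78 parse tree keeps counts of the symbols seen there, and the next symbol is predicted with a Dirichlet($\gamma$)-smoothed estimate $(n_x + \gamma)/(n + \gamma |\mathcal{A}|)$. Parsing restarts at the root whenever a new phrase is completed, as in LZ78.

```python
import math


def lz78_log_loss(seq, alphabet, gamma=0.5):
    """Cumulative log loss (bits) of an LZ78-tree sequential probability assignment.

    Assumed model: every node of the LZ78 parse tree counts the symbols that
    followed it, and the next symbol gets the Dirichlet(gamma)-smoothed
    estimate (count + gamma) / (total + gamma * |alphabet|).
    """
    children = [{}]  # children[node][symbol] -> child node id
    counts = [{}]    # counts[node][symbol] -> times symbol was seen at node
    node, loss, k = 0, 0.0, len(alphabet)
    for x in seq:
        c = counts[node]
        p = (c.get(x, 0) + gamma) / (sum(c.values()) + gamma * k)
        loss -= math.log2(p)
        c[x] = c.get(x, 0) + 1
        if x in children[node]:
            node = children[node][x]  # phrase continues down the tree
        else:
            children[node][x] = len(children)  # phrase ends: add a new leaf
            children.append({})
            counts.append({})
            node = 0  # next phrase starts at the root
    return loss


for s in ("ab" * 200, "abbabaabbbaaabab" * 25):
    print(len(s), round(lz78_log_loss(s, "ab") / len(s), 3), "bits/symbol")
```

Dividing the loss by the sequence length gives the normalized log loss that the abstract compares with the normalized LZ78 codelength.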
- [16] arXiv:2410.06647
Title: Achieving Interference-Free Degrees of Freedom in Cellular Networks via RIS
Subjects: Information Theory (cs.IT)
It is widely believed that Reconfigurable Intelligent Surfaces (RIS) cannot increase Degrees of Freedom (DoF) due to their relay nature. A notable exception is the work of Jiang \& Yu, who demonstrate via simulation that in an ideal $K$-user interference channel, a passive RIS can achieve the interference-free DoF. In this paper, we investigate the DoF gain of RIS in more realistic systems, namely cellular networks, and in more challenging scenarios with direct links. We prove that RIS can boost the DoF per cell to that of the interference-free scenario even \textit{with direct links}. Furthermore, we \textit{theoretically} quantify the number of RIS elements required to achieve this goal, namely $\max\{2L, (\sqrt{L} + c)\eta + L\}$, where $L = GM(GM-1)$, $c$ is a constant, and $\eta$ denotes the ratio of channel strengths, for $G$ cells with more single-antenna users $K$ than base station antennas $M$ per cell. The main challenge lies in establishing the feasibility of a system of algebraic equations, which is a difficult problem in algebraic geometry. We tackle this problem probabilistically, by exploiting the randomness of the involved coefficients and approaching the problem from the perspective of extreme value statistics and convex geometry. Moreover, numerical results confirm the tightness of our theoretical results.
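The element-count expression can be evaluated directly. The sketch below simply plugs numbers into $\max\{2L, (\sqrt{L}+c)\eta + L\}$ with $L = GM(GM-1)$; the constant $c$ is not specified in the abstract, so the values used for `c` and `eta` are placeholders.

```python
import math


def ris_elements_required(G, M, c, eta):
    """Evaluate max{2L, (sqrt(L) + c) * eta + L} with L = G*M*(G*M - 1).

    G: number of cells, M: BS antennas per cell, c: the (unspecified)
    constant from the bound, eta: ratio of channel strengths. The values
    used below for c and eta are placeholders.
    """
    L = G * M * (G * M - 1)
    return max(2 * L, (math.sqrt(L) + c) * eta + L)


for eta in (1.0, 5.0, 10.0):
    print(eta, ris_elements_required(G=3, M=2, c=1.0, eta=eta))
```

For example, with $G = 3$ cells and $M = 2$ antennas per cell, $L = 6 \cdot 5 = 30$, so at least $2L = 60$ elements are needed, and the second term only dominates once $(\sqrt{30} + c)\eta > 30$.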
- [17] arXiv:2410.06767
Title: On the Performance of Pilot-Aided Simultaneous Communication and Tracking
Comments: 13 pages, 9 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In this paper, a symbol error rate performance analysis is provided for a pilot-aided simultaneous communication and tracking (PASCAT) system. Specifically, multiple drones actively transmit signals to a BS, which is responsible for continuously monitoring the locations of the drones over time and decoding the symbols they transmit. It is found that the estimated location parameters at a given moment during tracking follow Gaussian distributions with means equal to the actual values and variances determined by the root mean square error (RMSE). The obtained location information is then used to infer the channel information, which in turn is used to preprocess the received signal with the maximum ratio combining (MRC) technique before decoding. The average symbol error rate (SER) is also evaluated over the distribution of the estimated location parameters, and an approximate value of the average SER is obtained using a Taylor approximation with fast convergence. The results indicate a cooperative relationship between the RMSE of the estimated location parameters and the average SER. In addition, the effect of the number of pilot signals is analysed: employing more pilots enhances both the communication and sensing functionalities. Furthermore, the SER performance of our PASCAT system is similar to that of maximum likelihood detection (MLD) when multiple pilot signals are employed, which demonstrates the efficiency of the PASCAT system. Finally, all results are validated by Monte Carlo simulations.
- [18] arXiv:2410.07051
Title: Exponents for Shared Randomness-Assisted Channel Simulation
Comments: 27+6 pages
Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)
We determine the exact error and strong converse exponents of shared randomness-assisted channel simulation in worst-case total variation distance. Namely, we find that these exponents can be written as simple optimizations over the Rényi channel mutual information. Strikingly, and in stark contrast to channel coding, there are no critical rates, which allows a tight characterization for arbitrary rates below and above the simulation capacity. We derive our results by asymptotically expanding the meta-converse for channel simulation [Cao {\it et al.}, IEEE Trans.~Inf.~Theory (2024)], which corresponds to non-signaling assisted codes. We prove this to be asymptotically tight by employing the approximation algorithms from [Berta {\it et al.}, Proc.~IEEE ISIT (2024)], which show how to round any non-signaling assisted strategy to a strategy that only uses shared randomness. Notably, this implies that additional quantum entanglement assistance does not change the error or the strong converse exponents.
- [19] arXiv:2410.07120
Title: Sequential Decoding of Multiple Traces Over the Syndrome Trellis for Synchronization Errors
Comments: Submitted to ICASSP
Subjects: Information Theory (cs.IT)
Standard decoding approaches for convolutional codes, such as the Viterbi and BCJR algorithms, entail significant complexity when correcting synchronization errors. The situation worsens when multiple received sequences must be decoded jointly, as in DNA storage. Previous work has attempted to address this via separate-BCJR decoding, i.e., combining the results of decoding each received sequence separately. Another attempt to reduce complexity adapted sequential decoders for use over channels with insertion and deletion errors. However, these decoding alternatives remain prohibitively expensive for high-rate convolutional codes. To address this, we adapt sequential decoders to decode multiple received sequences jointly over the syndrome trellis. In the short blocklength regime, this decoding strategy can outperform separate-BCJR decoding under certain channel conditions, in addition to reducing decoding complexity. To reduce the occurrence of decoding timeouts, formally called erasures, we also extend this approach to work bidirectionally, i.e., by deploying two independent stack decoders that operate simultaneously in the forward and backward directions.
- [20] arXiv:2410.07146
Title: Estimation and Confidence Intervals for Mutual Information: Issues in Convergence for Non-Normal Distributions
Comments: 8 figures
Subjects: Information Theory (cs.IT)
Using various empirical estimators of the Mutual Information (MI) measure, we calculate and compare the estimates and their confidence intervals for both normal and non-normal bivariate data samples. We find that certain nonlinear invertible transformations of the random variables can significantly affect both the estimated MI value and the precision and asymptotic behavior of its confidence intervals. Generally, for non-normal samples, the confidence intervals are wider than those for normal samples, and they shrink more slowly as the sample size increases. In some cases, due to strong biases, the estimated confidence interval may not contain the true value at all. We discuss various strategies to improve the precision of the estimated Mutual Information.
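A basic version of this workflow, estimating MI and attaching a confidence interval, can be sketched with a plug-in histogram estimator and a percentile bootstrap. Both choices are assumptions for illustration, since the abstract does not specify which estimators or interval constructions were used. Equal-width binning is exactly the kind of choice that makes the estimate sensitive to nonlinear invertible transformations, even though the true MI is invariant when such a transformation is applied to either variable.

```python
import math
import random


def _bin_index(v, lo, hi, bins):
    """Equal-width bin index of v on [lo, hi]; the maximum falls in the last bin."""
    if hi == lo:
        return 0
    return min(int((v - lo) / (hi - lo) * bins), bins - 1)


def plugin_mi(xs, ys, bins=8):
    """Plug-in (histogram) estimate of I(X; Y) in bits."""
    n = len(xs)
    lx, hx, ly, hy = min(xs), max(xs), min(ys), max(ys)
    joint = {}
    for x, y in zip(xs, ys):
        key = (_bin_index(x, lx, hx, bins), _bin_index(y, ly, hy, bins))
        joint[key] = joint.get(key, 0) + 1
    px, py = {}, {}
    for (i, j), c in joint.items():
        px[i] = px.get(i, 0) + c
        py[j] = py.get(j, 0) + c
    # Sum over occupied cells of p(i,j) * log2(p(i,j) / (p(i) p(j))).
    return sum((c / n) * math.log2(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


def bootstrap_ci(xs, ys, bins=8, reps=500, level=0.95, seed=0):
    """Percentile bootstrap interval for plugin_mi, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(plugin_mi([xs[i] for i in idx], [ys[i] for i in idx], bins))
    stats.sort()
    lo = stats[int((1 - level) / 2 * reps)]
    hi = stats[min(reps - 1, int((1 + level) / 2 * reps))]
    return lo, hi


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [xi + rng.gauss(0, 1) for xi in x]
# The true MI is unchanged by exp(), but the equal-width histogram estimate is not.
print(plugin_mi(x, y), plugin_mi([math.exp(v) for v in x], y))
print(bootstrap_ci(x, y, reps=200))
```

Because the plug-in estimator is biased, a percentile interval built around it can miss the true value, in line with the behavior the abstract reports.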
- [21] arXiv:2410.07159
Title: On the Spectral Efficiency of D-MIMO Networks under Rician Fading
Comments: 6 pages, 6 figures. Manuscript submitted to the IEEE Wireless Communications and Networking Conference (WCNC), Milan, Italy, 2025. arXiv admin note: text overlap with arXiv:2406.19078
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Contemporary wireless communication systems adopt the Multi-User Multiple-Input Multiple-Output (MU-MIMO) technique, in which a single base station or Access Point (AP) equipped with multiple antenna elements serves multiple active users simultaneously. Aiming to provide more uniform wireless coverage, industry and academia have been working towards the evolution from centralized MIMO to Distributed MIMO (D-MIMO). That is, instead of having all antenna elements co-located at a single AP, multiple APs, each equipped with a few antenna elements or a single one, cooperate to jointly serve the active users in the coverage area. In this work, we evaluate the performance of different D-MIMO setups under Rician fading, considering different receive combining schemes. Note that the Rician fading model is convenient for MU-MIMO performance assessment, as it encompasses a wide variety of scenarios. Our numerical results show that the correlation among the channel vectors of different users increases with the Rician factor, which leads to a reduction in the achievable Spectral Efficiency (SE). Moreover, given a total number of antenna elements, there is an optimal number of APs and antenna elements per AP that provides the best performance. This "sweet spot" depends on the Rician factor and on the adopted receive combining scheme.
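The link between the Rician factor and inter-user channel correlation can be checked with a small Monte Carlo experiment. The sketch assumes a single co-located half-wavelength uniform linear array (not a D-MIMO deployment), a LoS component given by the array steering vector, and i.i.d. $\mathcal{CN}(0,1)$ scattering; the user angles and antenna count are illustrative.

```python
import cmath
import math
import random


def steering(theta, num_antennas):
    """Half-wavelength ULA steering vector (assumed array geometry)."""
    return [cmath.exp(-1j * math.pi * n * math.sin(theta)) for n in range(num_antennas)]


def rician_channel(theta, num_antennas, k_factor, rng):
    """h = sqrt(K/(K+1)) a(theta) + sqrt(1/(K+1)) w, with w ~ CN(0, I)."""
    los = steering(theta, num_antennas)
    w = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(num_antennas)]
    a = math.sqrt(k_factor / (k_factor + 1))
    b = math.sqrt(1 / (k_factor + 1))
    return [a * los_n + b * w_n for los_n, w_n in zip(los, w)]


def channel_correlation(h1, h2):
    """|h1^H h2| / (||h1|| ||h2||), between 0 and 1."""
    inner = sum(x.conjugate() * y for x, y in zip(h1, h2))
    n1 = math.sqrt(sum(abs(x) ** 2 for x in h1))
    n2 = math.sqrt(sum(abs(y) ** 2 for y in h2))
    return abs(inner) / (n1 * n2)


def mean_correlation(theta1, theta2, num_antennas, k_factor, trials=1000, seed=0):
    """Average normalized correlation between two users' Rician channels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        h1 = rician_channel(theta1, num_antennas, k_factor, rng)
        h2 = rician_channel(theta2, num_antennas, k_factor, rng)
        total += channel_correlation(h1, h2)
    return total / trials


for k in (0.0, 1.0, 10.0, 100.0):
    print(k, round(mean_correlation(0.2, 0.25, num_antennas=8, k_factor=k), 3))
```

For two users at nearby angles, the average correlation rises from the order of $1/\sqrt{N}$ at $K = 0$ toward the LoS array-factor value as $K$ grows; higher correlation makes users harder to separate with receive combining, which is the mechanism the abstract connects to the loss in spectral efficiency.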
New submissions (showing 21 of 21 entries)
- [22] arXiv:2410.05493 (cross-list from cs.LG) [pdf, html, other]
-
Title: Transformers learn variable-order Markov chains in-context
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
Large language models have demonstrated impressive in-context learning (ICL) capability. However, it is still unclear how the underlying transformers accomplish it, especially in more complex scenarios. Toward this goal, several recent works studied how transformers learn fixed-order Markov chains (FOMC) in context, yet natural languages are more suitably modeled by variable-order Markov chains (VOMC), i.e., context trees (CTs). In this work, we study the ICL of VOMC by viewing language modeling as a form of data compression and focus on small alphabets and low-order VOMCs. This perspective allows us to leverage mature compression algorithms, such as context-tree weighting (CTW) and prediction by partial matching (PPM) algorithms as baselines, the former of which is Bayesian optimal for a class of CTW priors. We empirically observe a few phenomena: 1) Transformers can indeed learn to compress VOMC in-context, while PPM suffers significantly; 2) The performance of transformers is not very sensitive to the number of layers, and even a two-layer transformer can learn in-context quite well; and 3) Transformers trained and tested on non-CTW priors can significantly outperform the CTW algorithm. To explain these phenomena, we analyze the attention map of the transformers and extract two mechanisms, on which we provide two transformer constructions: 1) A construction with $D+2$ layers that can mimic the CTW algorithm accurately for CTs of maximum order $D$, 2) A 2-layer transformer that utilizes the feed-forward network for probability blending. One distinction from the FOMC setting is that a counting mechanism appears to play an important role. We implement these synthetic transformer layers and show that such hybrid transformers can match the ICL performance of transformers, and more interestingly, some of them can perform even better despite the much-reduced parameter sets.
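For readers unfamiliar with the CTW baseline, the following is a compact sketch of binary context-tree weighting (Willems, Shtarkov, and Tjalkens) with Krichevsky-Trofimov estimators at every node. It is restricted to a binary alphabet and pads the unseen past with zeros; both are simplifying assumptions, and it is neither the authors' implementation nor their transformer construction.

```python
import math
import random


def _log2_add(a, b):
    """log2(2**a + 2**b), computed stably."""
    m = max(a, b)
    return m + math.log2(2.0 ** (a - m) + 2.0 ** (b - m))


class BinaryCTW:
    """Context-tree weighting over a binary alphabet with maximum depth D."""

    def __init__(self, depth):
        self.depth = depth
        # context tuple (most recent symbol first) -> [n0, n1, log2 Pe, log2 Pw]
        self.nodes = {}

    def _log_pw(self, ctx):
        node = self.nodes.get(ctx)
        return 0.0 if node is None else node[3]  # unseen node: Pw = 1

    def update(self, context, bit):
        """Feed one bit; return log2 P(bit | past) under the CTW mixture."""
        before = self._log_pw(())
        for d in range(self.depth, -1, -1):  # deepest node first, root last
            ctx = tuple(context[:d])
            node = self.nodes.setdefault(ctx, [0, 0, 0.0, 0.0])
            # Krichevsky-Trofimov estimator: (n_bit + 1/2) / (n0 + n1 + 1)
            node[2] += math.log2((node[bit] + 0.5) / (node[0] + node[1] + 1))
            node[bit] += 1
            if d == self.depth:
                node[3] = node[2]  # leaves use the KT estimate directly
            else:
                # Pw = 1/2 Pe + 1/2 Pw(child 0) Pw(child 1)
                kids = self._log_pw(ctx + (0,)) + self._log_pw(ctx + (1,))
                node[3] = _log2_add(node[2] - 1.0, kids - 1.0)
        return self._log_pw(()) - before


def ctw_code_length(bits, depth=3):
    """Ideal code length in bits, i.e. -log2 of the CTW sequence probability."""
    model = BinaryCTW(depth)
    history = [0] * depth  # assumption: the unseen past is padded with zeros
    total = 0.0
    for bit in bits:
        context = history[len(history) - depth:][::-1]  # most recent first
        total -= model.update(context, bit)
        history.append(bit)
    return total
```

An alternating sequence (an order-1 source) compresses to a few dozen bits, while i.i.d. fair coin flips cost roughly one bit per symbol.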
- [23] arXiv:2410.05646 (cross-list from cs.LG) [pdf, html, other]
-
Title: Score-Based Variational Inference for Inverse Problems
Comments: 10 pages, 7 figures, conference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Existing diffusion-based methods for inverse problems sample from the posterior using score functions and accept the generated random samples as solutions. In applications where the posterior mean is preferred, multiple posterior samples must be generated, which is time-consuming. In this work, by analyzing the probability density evolution of the conditional reverse diffusion process, we prove that the posterior mean can be achieved by tracking the mean of each reverse diffusion step. Based on that, we establish a framework termed reverse mean propagation (RMP) that targets the posterior mean directly. We show that RMP can be implemented by solving a variational inference problem, which can be further decomposed into minimizing a reverse KL divergence at each reverse step. We further develop an algorithm that optimizes the reverse KL divergence with natural gradient descent using score functions and propagates the mean at each reverse step. Experiments demonstrate the validity of the theory of our framework and show that our algorithm outperforms state-of-the-art algorithms on reconstruction performance with lower computational complexity in various inverse problems.
- [24] arXiv:2410.05734 (cross-list from cs.LG) [pdf, html, other]
-
Title: Diminishing Exploration: A Minimalist Approach to Piecewise Stationary Multi-Armed Bandits
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
The piecewise-stationary bandit problem is an important variant of the multi-armed bandit problem that further considers abrupt changes in the reward distributions. The main theme of the problem is the trade-off between exploration for detecting environment changes and exploitation of traditional bandit algorithms. While this problem has been extensively investigated, existing works either assume knowledge about the number of change points $M$ or require extremely high computational complexity. In this work, we revisit the piecewise-stationary bandit problem from a minimalist perspective. We propose a novel and generic exploration mechanism, called diminishing exploration, which eliminates the need for knowledge about $M$ and can be used in conjunction with an existing change detection-based algorithm to achieve near-optimal regret scaling. Simulation results show that despite being oblivious to $M$, equipping existing algorithms with the proposed diminishing exploration generally achieves better empirical regret than the traditional uniform exploration.
- [25] arXiv:2410.05844 (cross-list from eess.SY) [pdf, html, other]
-
Title: Spectrally Efficient LDPC Codes For IRIG-106 Waveforms via Random Puncturing
Comments: Accepted for inclusion in the 2024 International Telemetry Conference
Subjects: Systems and Control (eess.SY); Information Theory (cs.IT); Signal Processing (eess.SP)
Low-density parity-check (LDPC) codes form part of the IRIG-106 standard and have been successfully deployed for the Telemetry Group version of shaped-offset quadrature phase shift keying (SOQPSK-TG) modulation. Recently, LDPC code solutions have been proposed and optimized for continuous phase modulations (CPMs), including pulse code modulation/frequency modulation (PCM/FM) and the multi-h CPM developed by the Advanced Range TeleMetry program (ARTM CPM). These codes were shown to perform within about one dB of the respective channel capacities of these modulations. In this paper, we consider the effect of random puncturing of these LDPC codes to further improve spectral efficiency. We present numerical simulation results that affirm the robust decoding performance promised by LDPC codes designed for ARTM CPM.
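Random puncturing as described above can be sketched as follows: the transmitter drops a pseudo-randomly chosen set of coded bits, and the receiver reinserts them as erasures (zero log-likelihood ratios) before LDPC decoding. The shared seed, the helper names, and the (k, n) values used below are hypothetical and are not the IRIG-106 code parameters.

```python
import random


def puncture(codeword, n_punct, seed):
    """Drop n_punct randomly chosen positions; the seed is assumed shared with the receiver."""
    rng = random.Random(seed)
    dropped = set(rng.sample(range(len(codeword)), n_punct))
    kept = [c for i, c in enumerate(codeword) if i not in dropped]
    return kept, dropped


def depuncture(received_llrs, n, dropped):
    """Re-insert punctured positions as erasures (LLR = 0) before LDPC decoding."""
    it = iter(received_llrs)
    return [0.0 if i in dropped else next(it) for i in range(n)]


def punctured_rate(k, n, n_punct):
    """Code rate after puncturing: k information bits over the n - n_punct bits actually sent."""
    return k / (n - n_punct)
```

For example, puncturing 1024 bits of a hypothetical rate-1/2 code with k = 4096 and n = 8192 raises the rate to 4096/7168, about 0.571.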
- [26] arXiv:2410.05911 (cross-list from cs.LG) [pdf, html, other]
-
Title: Accelerating Error Correction Code Transformers
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Error correction codes (ECC) are crucial for ensuring reliable information transmission in communication systems. Choukroun & Wolf (2022b) recently introduced the Error Correction Code Transformer (ECCT), which has demonstrated promising performance across various transmission channels and families of codes. However, its high computational and memory demands limit its practical applications compared to traditional decoding algorithms. Achieving effective quantization of the ECCT presents significant challenges due to its inherently small architecture, since existing very low-precision quantization techniques often degrade performance in compact neural networks. In this paper, we introduce a novel acceleration method for transformer-based decoders. We first propose a ternary weight quantization method specifically designed for the ECCT, inducing a decoder with multiplication-free linear layers. We then present an optimized self-attention mechanism that reduces computational complexity via code-aware multi-head processing. Finally, we provide positional encoding via the Tanner graph eigendecomposition, enabling a richer representation of the graph connectivity. The approach not only matches or surpasses ECCT's performance but also significantly reduces energy consumption, memory footprint, and computational complexity. Our method brings transformer-based error correction closer to practical implementation in resource-constrained environments, achieving a 90% compression ratio and reducing arithmetic operation energy consumption by a factor of at least 224 on modern hardware.
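The abstract does not specify the ternary quantizer. As an assumption, the sketch below uses the Ternary Weight Networks rule (threshold 0.7 times the mean absolute weight, one shared scale per layer) to show why a ternary linear layer needs only additions and subtractions plus a single final scaling.

```python
def ternarize(weights, delta_scale=0.7):
    """Ternary quantization in the style of Ternary Weight Networks (assumed, not the paper's).

    Threshold delta = delta_scale * mean|w|; weights above it in magnitude map to
    +1/-1, the rest to 0, with one shared scale alpha = mean |w| over the kept weights.
    """
    delta = delta_scale * sum(abs(w) for w in weights) / len(weights)
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha


def ternary_dot(codes, alpha, x):
    """Multiplication-free inner product: add where code=+1, subtract where code=-1,
    then apply the single scale alpha once."""
    acc = 0.0
    for c, xi in zip(codes, x):
        if c == 1:
            acc += xi
        elif c == -1:
            acc -= xi
    return alpha * acc
```

For example, ternarize([0.9, -0.05, -1.1, 0.02, 0.4]) yields codes [1, 0, -1, 0, 1] with alpha = 0.8, and the dot product with [1, 2, 3, 4, 5] is 0.8 * (1 - 3 + 5) = 2.4.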
- [27] arXiv:2410.06339 (cross-list from cs.LG) [pdf, html, other]
-
Title: Filtered Randomized Smoothing: A New Defense for Robust Modulation Classification
Comments: IEEE Milcom 2024
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Deep Neural Network (DNN) based classifiers have recently been used for the modulation classification of RF signals. These classifiers have shown impressive performance gains relative to conventional methods; however, they are vulnerable to imperceptible (low-power) adversarial attacks. Some of the prominent defense approaches include adversarial training (AT) and randomized smoothing (RS). While AT increases robustness in general, it fails to provide resilience against previously unseen adaptive attacks. Other approaches, such as RS, which injects noise into the input, address this shortcoming by providing provable, certified guarantees against arbitrary attacks; however, they tend to sacrifice accuracy.
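The randomized-smoothing idea referred to above can be sketched as a Monte Carlo majority vote with the certified L2 radius of Cohen et al. (2019), sigma * Phi^{-1}(p_A), using a Hoeffding lower bound on p_A. As an assumption, a low-pass DFT filter is applied after the noise, in the spirit of combining spectral filtering with smoothing; because the filter then simply becomes part of the base classifier, the standard certificate still applies. The filter placement, the real-valued (rather than I/Q) signal, and the toy classifier are illustrative choices, not the paper's FRS design. For brevity the same draws select and bound the top class, whereas Cohen et al. use separate batches.

```python
import cmath
import math
import random
from statistics import NormalDist


def lowpass(signal, keep):
    """Keep only DFT bins with |frequency index| <= keep (naive O(n^2) DFT, real input)."""
    n = len(signal)
    spec = [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    spec = [s if min(k, n - k) <= keep else 0 for k, s in enumerate(spec)]
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def smoothed_predict(x, base_classifier, sigma, keep, n_samples=200, alpha=0.001, seed=0):
    """Majority vote of base_classifier(lowpass(x + noise)) over Gaussian noise draws.

    Returns (label, certified L2 radius), or (None, 0.0) when abstaining.
    """
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_samples):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        label = base_classifier(lowpass(noisy, keep))
        votes[label] = votes.get(label, 0) + 1
    top, count = max(votes.items(), key=lambda kv: kv[1])
    # Hoeffding lower confidence bound (level 1 - alpha) on the top-class probability.
    p_lower = count / n_samples - math.sqrt(math.log(1 / alpha) / (2 * n_samples))
    if p_lower <= 0.5:
        return None, 0.0
    return top, sigma * NormalDist().inv_cdf(p_lower)


def toy_classifier(signal):
    """Hypothetical base classifier: class 1 if the mean is positive, else class 0."""
    return 1 if sum(signal) > 0 else 0


label, radius = smoothed_predict([1.0] * 32, toy_classifier, sigma=0.5, keep=4)
```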
In this paper, we study the problem of designing robust DNN-based modulation classifiers that can provide provable defense against arbitrary attacks without significantly sacrificing accuracy. To this end, we first analyze the spectral content of commonly studied attacks on modulation classifiers for the benchmark RadioML dataset. We observe that spectral signatures of unperturbed RF signals are highly localized, whereas attack signals tend to be spread out in frequency. To exploit this spectral heterogeneity, we propose Filtered Randomized Smoothing (FRS), a novel defense that combines spectral filtering with randomized smoothing. FRS can be viewed as a strengthening of RS that leverages the specificity (spectral heterogeneity) inherent to the modulation classification problem. In addition to providing an approach to compute the certified accuracy of FRS, we also provide a comprehensive set of simulations on the RadioML dataset to show the effectiveness of FRS and show that it significantly outperforms existing defenses, including AT and RS, in terms of accuracy on both attacked and benign signals.
- [28] arXiv:2410.06378 (cross-list from stat.ML) [pdf, html, other]
-
Title: Covering Numbers for Deep ReLU Networks with Applications to Function Approximation and Nonparametric Regression
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Covering numbers of families of (deep) ReLU networks have been used to characterize their approximation-theoretic performance, upper-bound the prediction error they incur in nonparametric regression, and quantify their classification capacity. These results are based on covering number upper bounds obtained through the explicit construction of coverings. Lower bounds on covering numbers do not seem to be available in the literature. The present paper fills this gap by deriving tight (up to a multiplicative constant) lower and upper bounds on the covering numbers of fully-connected networks with bounded weights, sparse networks with bounded weights, and fully-connected networks with quantized weights. Thanks to the tightness of the bounds, a fundamental understanding of the impact of sparsity, quantization, bounded vs. unbounded weights, and network output truncation can be developed. Furthermore, the bounds allow us to characterize the fundamental limits of neural network transformation, including network compression, and lead to sharp upper bounds on the prediction error in nonparametric regression through deep networks. Specifically, we can remove a $\log^6(n)$-factor in the best-known sample complexity rate in the estimation of Lipschitz functions through deep networks, thereby establishing optimality. Finally, we identify a systematic relation between optimal nonparametric regression and optimal approximation through deep networks, unifying numerous results in the literature and uncovering general underlying principles.
- [29] arXiv:2410.06843 (cross-list from eess.SP) [pdf, html, other]
-
Title: Point-to-Point MIMO Channel Estimation by Exploiting Array Geometry and Clustered Multipath Propagation
Comments: 6 pages, 2 figures, accepted to be presented at IEEE GLOBECOM 2024
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
A large-scale MIMO (multiple-input multiple-output) system offers significant advantages in wireless communication, including potential spatial multiplexing and beamforming capabilities. However, channel estimation becomes challenging with multiple antennas at both the transmitter and receiver ends. The minimum mean-squared error (MMSE) estimator, for instance, requires the spatial correlation matrix of the vectorized channel, whose number of entries scales with the square of the product of the numbers of transmit and receive antennas. This scaling presents a substantial challenge, particularly as antenna counts increase in line with current technological trends. Traditional MIMO literature offers alternative channel estimators that mitigate the need to fully acquire the spatial correlation matrix. In this paper, we revisit point-to-point MIMO channel estimation and introduce a reduced-subspace least squares (RS-LS) channel estimator designed to eliminate physically impossible channel dimensions inherent in uniform planar arrays. Additionally, we propose a cluster-aware RS-LS estimator that leverages both reduced and cluster-specific subspace properties, significantly enhancing performance over the conventional RS-LS approach. Notably, neither proposed method requires full or partial knowledge of the spatial correlation matrix.
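To make the scaling concrete: for an $N_t$-antenna transmitter and an $N_r$-antenna receiver, the vectorized channel has $N_tN_r$ entries, so its spatial correlation matrix has $(N_tN_r)^2$ entries. The antenna counts below are illustrative, not taken from the paper.

```python
def mmse_corr_entries(n_tx, n_rx):
    """Entries of the correlation matrix of vec(H), which is (n_tx*n_rx) x (n_tx*n_rx)."""
    dim = n_tx * n_rx
    return dim * dim


for n_tx, n_rx in [(4, 4), (16, 16), (64, 64)]:
    print(f"{n_tx}x{n_rx}: {mmse_corr_entries(n_tx, n_rx):,} entries")
# 4x4: 256 entries
# 16x16: 65,536 entries
# 64x64: 16,777,216 entries
```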
- [30] arXiv:2410.06884 (cross-list from cs.LG) [pdf, html, other]
-
Title: Adaptive Refinement Protocols for Distributed Distribution Estimation under $\ell^p$-Losses
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST)
Consider the communication-constrained estimation of discrete distributions under $\ell^p$ losses, where each distributed terminal holds multiple independent samples and uses a limited number of bits to describe them. We obtain the minimax optimal rates of the problem in most parameter regimes. An elbow effect of the optimal rates at $p=2$ is clearly identified. To show the optimal rates, we first design estimation protocols that achieve them. The key ingredient of these protocols is an adaptive refinement mechanism, which first generates a rough estimate from partial information and then establishes a refined estimate in subsequent steps, guided by the rough estimate. The protocols leverage successive refinement, sample compression, and thresholding methods to achieve the optimal rates in different parameter regimes. The optimality of the protocols is shown by deriving compatible minimax lower bounds.
- [31] arXiv:2410.06993 (cross-list from cs.LG) [pdf, html, other]
-
Title: Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
Ivan Butakov, Alexander Sememenko, Alexander Tolmachev, Andrey Gladkov, Marina Munkhoeva, Alexey Frolov
Comments: 22 pages, 3 figures
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
Deep InfoMax (DIM) is a well-established method for self-supervised representation learning (SSRL) based on maximization of the mutual information between the input and the output of a deep neural network encoder. Although DIM, and contrastive SSRL in general, are well explored, the task of learning representations conforming to a specific distribution (i.e., distribution matching, DM) is still under-addressed. Motivated by the importance of DM to several downstream tasks (including generative modeling, disentanglement, outlier detection, and others), we enhance DIM to enable automatic matching of learned representations to a selected prior distribution. To achieve this, we propose injecting independent noise into the normalized outputs of the encoder, while keeping the same InfoMax training objective. We show that this modification allows for learning uniformly and normally distributed representations, as well as representations of other absolutely continuous distributions. Our approach is tested on various downstream tasks. The results indicate a moderate trade-off between the performance on the downstream tasks and the quality of DM.
Cross submissions (showing 10 of 10 entries)
- [32] arXiv:2209.11405 (replaced) [pdf, html, other]
-
Title: Quantum Locally Testable Code with Constant Soundness
Comments: Updated manuscript
Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)
In this paper, we present two constructions of quantum locally testable codes (QLTC) with constant soundness. In the first approach, we introduce an operation called check product, and show how this operation gives rise to QLTCs of constant soundness, constant rate, and distance scaling with locality. In the second approach, we consider hypergraph product of a quantum code and a classical repetition code, and observe a special case in which the soundness of component codes is preserved. This insight leads us to construct QLTCs of constant soundness, scalable rate and distance, and constant average locality. Our work marks a step towards constructing QLTCs of high soundness and distance, which would give a different construction to the No Low-Energy Trivial States (NLTS) theorem.
- [33] arXiv:2210.00778 (replaced) [pdf, html, other]
-
Title: Lattice-Code Multiple Access: Architecture and Efficient Algorithms
Comments: 13 pages, 10 figures, 3 tables, partially presented in IEEE Globecom 2023
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
This paper studies a $K$-user lattice-code-based multiple-access (LCMA) scheme. Each user equipment (UE) encodes its message with a practical lattice code, for which we suggest a $2^m$-ary \emph{ring code} with a symbol-wise bijective mapping to $2^m$-PAM. The coded-modulated signal is spread with its designated signature sequence, and all $K$ UEs transmit simultaneously. The LCMA receiver chooses some integer coefficients, computes the associated $K$ streams of \emph{integer linear combinations} (ILCs) of the UEs' messages, and then reconstructs all UEs' messages from these ILC streams. To execute this, we put forth new efficient LCMA \emph{soft detection} algorithms, which calculate the a posteriori probability of the ILC over the lattice. The complexity is at most of order $O(K)$, which suits massive access with a large $K$. The soft detection outputs are forwarded to $K$ ring-code decoders, which employ $2^m$-ary belief propagation to recover the ILC streams.
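The last receiver step above, recovering the messages from $K$ ILC streams, can be sketched over the ring $\mathbb{Z}_{2^m}$: it works exactly when the integer coefficient matrix is invertible modulo $2^m$, i.e. when its determinant is odd. The sketch assumes noise-free, correctly decoded ILC streams and a hand-picked coefficient matrix; it does not implement the paper's soft detection, BIVP formulation, or sphere decoder.

```python
def ilc_streams(A, messages, q):
    """Row i is the stream sum_k A[i][k] * messages[k] (mod q), computed symbol-wise."""
    K, n = len(A), len(messages[0])
    return [[sum(A[i][k] * messages[k][t] for k in range(K)) % q for t in range(n)]
            for i in range(K)]


def invert_mod(A, q):
    """Gauss-Jordan inverse over Z_q with q = 2^m; a pivot is usable iff it is odd (a unit)."""
    K = len(A)
    M = [list(row) + [int(i == j) for j in range(K)] for i, row in enumerate(A)]
    for col in range(K):
        pivot = next((r for r in range(col, K) if M[r][col] % 2 == 1), None)
        if pivot is None:
            raise ValueError("coefficient matrix is not invertible over Z_q")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, q)  # modular inverse of an odd number (Python 3.8+)
        M[col] = [(v * inv) % q for v in M[col]]
        for r in range(K):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * b) % q for a, b in zip(M[r], M[col])]
    return [row[K:] for row in M]


# K = 3 users over Z_4 (m = 2); det(A) = 1 is odd, so A is invertible mod 4.
q = 4
A = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
msgs = [[0, 1, 2, 3], [3, 3, 1, 0], [2, 0, 1, 1]]
recovered = ilc_streams(invert_mod(A, q), ilc_streams(A, msgs, q), q)
```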
To identify the optimal integer coefficients of the ILCs, a new \emph{bounded independent vectors problem} (BIVP) is established. We then solve this BIVP by developing a new \emph{rate-constraint sphere decoding} algorithm, significantly outperforming existing LLL and HKZ lattice reduction methods. Then, we develop optimized signature sequences for LCMA using a new target-switching steepest descent algorithm. With these algorithms, LCMA is shown to support a significantly higher load of UEs and exhibits dramatically improved error rate performance over state-of-the-art multiple access schemes such as interleave-division multiple-access (IDMA) and sparse-code multiple-access (SCMA). The advances are achieved with just parallel processing and $K$ single-user decoding operations, avoiding the implementation issues of successive interference cancellation and iterative detection.
- [34] arXiv:2310.10347 (replaced) [pdf, html, other]
-
Title: Editable-DeepSC: Cross-Modal Editable Semantic Communication Systems
Comments: published at VTC2024-Spring
Subjects: Information Theory (cs.IT)
Different from data-oriented communication systems, which primarily focus on accurately transmitting every bit of data, task-oriented semantic communication systems transmit only the specific semantic information required by downstream tasks, striving to minimize the communication overhead while maintaining competitive task execution performance in the presence of channel noise. However, it is worth noting that in many scenarios, the transmitted semantic information needs to be dynamically modified according to the users' preferences in a conversational and interactive way, which few existing works take into consideration. In this paper, we propose a novel cross-modal editable semantic communication system, named Editable-DeepSC, to tackle this challenge. By utilizing inversion methods based on StyleGAN priors, Editable-DeepSC takes cross-modal text-image pairs as the inputs and transmits the edited information of images based on textual instructions. Extensive numerical results demonstrate that our proposed Editable-DeepSC can achieve remarkable editing effects and transmission efficiency under the perturbations of channel noise, outperforming existing data-oriented communication methods.
- [35] arXiv:2404.00598 (replaced) [pdf, html, other]
-
Title: Robust Beamforming Design and Antenna Selection for Dynamic HRIS-aided MISO System
Comments: 6 pages, 3 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In this paper, we propose a dynamic hybrid active-passive reconfigurable intelligent surface (HRIS) to enhance multiple-input-single-output (MISO) communications, leveraging the property of dynamically placing active elements. Specifically, considering the impact of hardware impairments (HWIs), we investigate channel-aware configurations of the receive antennas at the base station (BS) and the active/passive elements at the HRIS to improve transmission reliability. To this end, we address the average mean-square-error (MSE) minimization problem for the HRIS-aided MISO system by jointly optimizing the BS receive antenna selection matrix, the reflection phase coefficients, the reflection amplitude matrix, and the mode selection matrix of the HRIS. To overcome the non-convexity and intractability of this problem, we first transform the binary and discrete variables into continuous ones, and then propose a penalty-based exact block coordinate descent (PEBCD) algorithm to alternately solve these subproblems. Numerical simulations demonstrate the significant superiority of our proposed scheme over conventional benchmark schemes.
- [36] arXiv:2406.08916 (replaced) [pdf, html, other]
-
Title: Griesmer type bounds for additive codes over finite fields, integral and fractional MDS codes
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)
In this article we prove Griesmer type bounds for additive codes over finite fields. These new bounds give upper bounds on the length of maximum distance separable (MDS) codes, codes which attain the Singleton bound. We will also consider codes to be MDS if they attain the fractional Singleton bound, due to Huffman. We prove that this bound in the fractional case can be obtained by codes whose length surpasses the length of the longest known codes in the integral case. For small parameters, we provide exhaustive computational results for additive MDS codes, by classifying the corresponding (fractional) subspace-arcs. This includes a complete classification of fractional additive MDS codes of size 243 over the field of order 9.
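For reference, the classical Griesmer and Singleton bounds for linear $[n,k,d]_q$ codes are easy to evaluate; the paper's Griesmer type bounds extend this kind of argument to additive codes (where the dimension may be fractional), which the sketch below does not attempt. When $d \le q$, every term after the first equals 1, so the Griesmer sum collapses to the Singleton bound $n \ge k + d - 1$.

```python
def griesmer_min_length(k, d, q):
    """Griesmer lower bound on the length n of a linear [n, k, d]_q code."""
    return sum(-(-d // q**i) for i in range(k))  # sum of ceil(d / q^i), i = 0..k-1


def singleton_min_length(k, d):
    """Singleton bound n >= k + d - 1, met with equality by MDS codes."""
    return k + d - 1
```

For instance, griesmer_min_length(4, 3, 2) is 3 + 2 + 1 + 1 = 7, which the binary [7, 4, 3] Hamming code meets, while over the field of order 9 with d = 5 the two bounds coincide.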
- [37] arXiv:2406.13867 (replaced) [pdf, html, other]
-
Title: Error-Correcting Graph Codes
Comments: 27 pages, 3 figures, 1 table
Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
In this paper, we construct Error-Correcting Graph Codes. An error-correcting graph code of distance $\delta$ is a family $C$ of graphs on a common vertex set of size $n$, such that if we start with any graph in $C$, we would have to modify the neighborhoods of at least $\delta n$ vertices in order to obtain some other graph in $C$. This is a natural graph generalization of the standard Hamming distance error-correcting codes for binary strings. Yohananov and Yaakobi were the first to construct codes in this metric, constructing good codes for $\delta < 1/2$, and optimal codes for a large-alphabet analogue. We extend their work by showing
1. Combinatorial results determining the optimal rate vs. distance trade-off nonconstructively.
2. Graph code analogues of Reed-Solomon codes and code concatenation, leading to positive distance codes for all rates and positive rate codes for all distances.
3. Graph code analogues of dual-BCH codes, yielding large codes with distance $\delta = 1-o(1)$. This gives an explicit "graph code of Ramsey graphs".
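The distance defined above, the number of vertices whose neighborhoods change, can be computed directly: a vertex's neighborhood differs between two graphs exactly when it is an endpoint of an edge in their symmetric difference. The edge representation and helper names are illustrative assumptions.

```python
from itertools import combinations


def graph_distance(edges_a, edges_b):
    """Number of vertices whose neighborhoods differ between two graphs on the same vertex set.

    Edges are frozensets {u, v}; a vertex is counted iff it is an endpoint of an
    edge in the symmetric difference of the two edge sets.
    """
    return len({v for e in edges_a ^ edges_b for v in e})


def min_code_distance(code):
    """Minimum pairwise distance of a family of graphs (each a set of frozenset edges)."""
    return min(graph_distance(a, b) for a, b in combinations(code, 2))
```

Dividing the result by the number of vertices $n$ gives the normalized distance $\delta$ used in the definition.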
Several recent works, starting with the paper of Alon, Gujgiczer, Körner, Milojević, and Simonyi, have studied more general graph codes, where the symmetric difference between any two graphs in the code is required to have some desired property. Error-correcting graph codes are a particularly interesting instantiation of this concept.
- [38] arXiv:2410.04886 (replaced) [pdf, html, other]
-
Title: High Information Density and Low Coverage Data Storage in DNA with Efficient Channel Coding Schemes
Yi Ding, Xuan He, Tuan Thanh Nguyen, Wentu Song, Zohar Yakhini, Eitan Yaakobi, Linqiang Pan, Xiaohu Tang, Kui Cai
Subjects: Information Theory (cs.IT)
DNA-based data storage has been attracting significant attention due to its extremely high density, low power consumption, and long duration compared to traditional data storage media. Despite the recent advancements in DNA data storage technology, significant challenges remain. In particular, various types of errors can occur during the processes of DNA synthesis, storage, and sequencing, including substitution errors, insertion errors, and deletion errors. Furthermore, entire oligos may be lost. In this work, we report a DNA-based data storage architecture that incorporates efficient channel coding schemes, including different types of error-correcting codes (ECCs) and constrained codes, for both the inner coding and outer coding for the DNA data storage channel. We also carry out large-scale experiments to validate our proposed DNA data storage architecture. Specifically, 1.61 and 1.69 MB of data are encoded into 30,000 oligos each, with information densities of 1.731 and 1.815, respectively. The stored information was fully recovered without any error at average coverages of 4.5 and 6.0, respectively. Compared to previous experimental studies, our architecture achieves higher information density and lower coverage, demonstrating the efficiency of the proposed channel coding schemes.
- [39] arXiv:2302.02575 (replaced) [pdf, html, other]
-
Title: Optimizing Energy-Harvesting Hybrid VLC/RF Networks with Random Receiver Orientation
Journal-ref: IEEE Access, 2024
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
This paper investigates an indoor hybrid visible light communication (VLC) and radio frequency (RF) scenario with two-hop downlink transmission. A light emitting diode (LED) transmits both data and energy via VLC to an energy-harvesting relay node, which then uses the harvested energy to retransmit the decoded information to an RF user in the second phase. The design parameters include the direct current (DC) bias and the time allocation for VLC transmission. We formulate an optimization problem to maximize the data rate under decode-and-forward relaying with fixed receiver orientation. The non-convex problem is decomposed into two sub-problems, solved iteratively by fixing one parameter while optimizing the other. Additionally, we analyze the impact of random receiver orientation on the data rate, deriving closed-form expressions for both VLC and RF rates. An exhaustive search approach is employed to solve the optimization, demonstrating that joint optimization of DC bias and time allocation significantly enhances the data rate compared to optimizing DC bias alone.
- [40] arXiv:2405.12046 (replaced) [pdf, other]
-
Title: Energy-Efficient Federated Edge Learning with Streaming Data: A Lyapunov Optimization Approach
Journal-ref: IEEE Transactions on Communications 2024
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Signal Processing (eess.SP)
Federated learning (FL) has received significant attention in recent years for its advantages in efficient training of machine learning models across distributed clients without disclosing user-sensitive data. Specifically, in federated edge learning (FEEL) systems, the time-varying nature of wireless channels introduces inevitable system dynamics in the communication process, thereby affecting training latency and energy consumption. In this work, we further consider a streaming data scenario where new training data samples are randomly generated over time at edge devices. Our goal is to develop a dynamic scheduling and resource allocation algorithm to address the inherent randomness in data arrivals and resource availability under long-term energy constraints. To achieve this, we formulate a stochastic network optimization problem and use the Lyapunov drift-plus-penalty framework to obtain a dynamic resource management design. Our proposed algorithm makes adaptive decisions on device scheduling, computational capacity adjustment, and allocation of bandwidth and transmit power in every round. We provide convergence analysis for the considered setting with heterogeneous data and time-varying objective functions, which supports the rationale behind our proposed scheduling design. The effectiveness of our scheme is verified through simulation results, demonstrating improved learning performance and energy efficiency as compared to baseline schemes.
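The Lyapunov drift-plus-penalty idea can be sketched with a virtual queue that accumulates energy spent beyond a per-round budget; each round picks the action minimizing V times the penalty plus the queue length times the energy. The (penalty, energy) action menu and all numbers in the example are hypothetical stand-ins for the paper's scheduling, computation, bandwidth, and power decisions.

```python
def drift_plus_penalty(actions, budget, V, rounds):
    """Per-round drift-plus-penalty control under a long-term average energy budget.

    actions: list of (penalty, energy) pairs available every round (hypothetical).
    The virtual queue q tracks accumulated energy in excess of the per-round budget;
    larger V weights the penalty more heavily relative to the energy constraint.
    """
    q, total_energy, total_penalty = 0.0, 0.0, 0.0
    for _ in range(rounds):
        penalty, energy = min(actions, key=lambda a: V * a[0] + q * a[1])
        q = max(q + energy - budget, 0.0)  # virtual energy-deficit queue update
        total_energy += energy
        total_penalty += penalty
    return total_energy / rounds, total_penalty / rounds, q


avg_energy, avg_penalty, final_q = drift_plus_penalty(
    [(1.0, 0.2), (0.5, 1.0), (0.2, 2.0)], budget=0.8, V=10.0, rounds=1000)
```

Because the queue update implies that total energy is at most final_q plus rounds times budget, a bounded queue means the long-term energy constraint is met up to a vanishing term.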
- [41] arXiv:2405.12109 (replaced) [pdf, html, other]
-
Title: Linguistic Structure from a Bottleneck on Sequential Information Processing
Subjects: Computation and Language (cs.CL); Information Theory (cs.IT)
Human language is a unique form of communication in the natural world, distinguished by its structured nature. Most fundamentally, it is systematic, meaning that signals can be broken down into component parts that are individually meaningful -- roughly, words -- which are combined in a regular way to form sentences. Furthermore, the way in which these parts are combined maintains a kind of locality: words are usually concatenated together, and they form contiguous phrases, keeping related parts of sentences close to each other. We address the challenge of understanding how these basic properties of language arise from broader principles of efficient communication under information processing constraints. Here we show that natural-language-like systematicity arises in codes that are constrained by predictive information, a measure of the amount of information that must be extracted from the past of a sequence in order to predict its future. In simulations, we show that such codes approximately factorize their source distributions, and then express the resulting factors systematically and locally. Next, in a series of cross-linguistic corpus studies, we show that human languages are structured to have low predictive information at the levels of phonology, morphology, syntax, and semantics. Our result suggests that human language performs a sequential, discrete form of Independent Components Analysis on the statistical distribution over meanings that need to be expressed. It establishes a link between the statistical and algebraic structure of human language, and reinforces the idea that the structure of human language is shaped by communication under cognitive constraints.
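Predictive information as defined above is the mutual information between a sequence's past and its future. For a stationary first-order Markov chain it reduces, by the Markov property, to the one-step mutual information $I(X_t; X_{t+1}) = H(X) - H(X_{t+1} \mid X_t)$. The sketch below computes that special case only and is not the estimator used in the paper's simulations or corpus studies.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def stationary(P, iters=5000):
    """Stationary distribution via the Cesaro average of power iterates
    (averaging also handles periodic chains)."""
    n = len(P)
    pi = [1.0 / n] * n
    avg = [0.0] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        avg = [a + p / iters for a, p in zip(avg, pi)]
    return avg


def markov_predictive_information(P):
    """I(past; future) in bits for a stationary first-order Markov chain with
    row-stochastic transition matrix P: H(X) - H(X_{t+1} | X_t)."""
    pi = stationary(P)
    return entropy(pi) - sum(pi[i] * entropy(P[i]) for i in range(len(P)))
```

An i.i.d. source (identical rows) carries zero predictive information, while a deterministic alternation 0101... carries exactly one bit.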
- [42] arXiv:2407.09567 (replaced) [pdf, html, other]
-
Title: Where are the bits in atoms? A perspective on the physical origin and evolutionary nature of information
Comments: This is a perspective article
Subjects: Neurons and Cognition (q-bio.NC); Information Theory (cs.IT); History and Philosophy of Physics (physics.hist-ph)
This manuscript proposes a conceptual hypothesis regarding the ontology of information, which can serve as a foundation for future empirical exploration and theoretical development. Starting from the premise that information consists of a structural pattern of matter and analysing the current understanding of the evolution of structures and complexity in the universe, we propose to redefine information as a physical structure capable of evolving and driving complexity across multiple layers of self-organization - biological, cultural, civilizational, and cybernetic. The perspective highlights how information structures replicate, vary, and evolve across different domains, speculating on the emergence of a cybernetic layer where machines could evolve autonomously. This interdisciplinary framework challenges traditional views of information and encourages further research into its role in shaping the structures of the universe, offering a new perspective on the evolution of complexity across both natural and artificial systems.
- [43] arXiv:2409.08462 (replaced) [pdf, other]
Title: Entropy, cocycles, and their diagrammatics
Comments: In v2, expanded Remarks 4.6, 4.8 and added more references. 82 pages, many figures
Subjects: K-Theory and Homology (math.KT); Information Theory (cs.IT); Mathematical Physics (math-ph); Category Theory (math.CT)
The first part of the paper explains how to encode a one-cocycle and a two-cocycle on a group $G$ with values in its representation by networks of planar trivalent graphs with edges labelled by elements of $G$, elements of the representation floating in the regions, and suitable rules for manipulation of these diagrams. When the group is a semidirect product, there is a similar presentation via overlapping networks for the two subgroups involved.
M. Kontsevich and J.-L. Cathelineau have shown how to interpret the entropy of a finite random variable and infinitesimal dilogarithms, including their four-term functional relations, via 2-cocycles on the group of affine symmetries of a line.
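The four-term functional relation referred to here is, for the entropy of a finite random variable, the classical "fundamental equation of information": f(x) + (1-x) f(y/(1-x)) = f(y) + (1-y) f(x/(1-y)) for x, y > 0 with x + y < 1. Both sides equal the entropy of the three-outcome distribution (x, y, 1-x-y), computed by grouping outcomes in two different orders (the chain rule). The numerical check below is an illustrative sketch only; it does not reproduce the paper's cocycle or diagrammatic construction, and the helper names are hypothetical.

```python
import math

def binary_entropy(x):
    """Shannon entropy (in nats) of a Bernoulli(x) variable, using 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in (x, 1.0 - x) if p > 0)

def four_term_gap(f, x, y):
    """Left side minus right side of f(x) + (1-x) f(y/(1-x)) = f(y) + (1-y) f(x/(1-y))."""
    lhs = f(x) + (1 - x) * f(y / (1 - x))
    rhs = f(y) + (1 - y) * f(x / (1 - y))
    return lhs - rhs

# Binary entropy satisfies the relation at every admissible (x, y) ...
for x, y in [(0.2, 0.3), (0.1, 0.85), (0.45, 0.45)]:
    print(x, y, four_term_gap(binary_entropy, x, y))  # approximately 0
# ... while a generic function such as f(x) = x**2 does not.
print(four_term_gap(lambda t: t * t, 0.2, 0.3))  # clearly nonzero
```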
We convert their construction into a diagrammatical calculus evaluating planar networks that describe morphisms in suitable monoidal categories. In particular, the four-term relations become equalities of networks analogous to associativity equations. The resulting monoidal categories complement existing categorical and operadic approaches to entropy.
- [44] arXiv:2409.13611 (replaced) [pdf, html, other]
Title: A generalized Legendre duality relation and Gaussian saturation
Comments: 45 pages; we fixed some typos and added some detailed arguments at the end of Section 5
Subjects: Functional Analysis (math.FA); Information Theory (cs.IT); Classical Analysis and ODEs (math.CA); Metric Geometry (math.MG)
Motivated by the barycenter problem in optimal transportation theory, Kolesnikov--Werner recently extended the notion of the Legendre duality relation for two functions to the case of multiple functions. We further generalize the duality relation and then establish the centered Gaussian saturation property for a Blaschke--Santaló type inequality associated with it. Our approach to understanding such a generalized Legendre duality relation is based on our earlier observation that directly links Legendre duality with the inverse Brascamp--Lieb inequality. More precisely, for a large family of degenerate Brascamp--Lieb data, we prove that the centered Gaussian saturation property for the inverse Brascamp--Lieb inequality holds true when inputs are restricted to even and log-concave functions.
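For orientation, the classical two-function case that this abstract generalizes pairs a convex function f with its Legendre transform f*(y) = sup_x (xy - f(x)); the functional Blaschke--Santaló inequality for even f then reads (∫e^{-f})(∫e^{-f*}) ≤ (2π)^n, with equality for centered Gaussians (f quadratic). The one-dimensional sketch below checks this numerically on a grid. It is illustrative only: it covers neither the multi-function duality relation nor the Brascamp--Lieb machinery of the paper, and the grid width and step are arbitrary choices.

```python
import math

def legendre_transform(f_vals, grid):
    """Discrete Legendre transform f*(y) = max_x (x*y - f(x)), with x and y on the same grid."""
    return [max(x * y - fx for x, fx in zip(grid, f_vals)) for y in grid]

def santalo_product(f, half_width=8.0, step=0.02):
    """Riemann-sum approximation of (int e^{-f}) * (int e^{-f*}) over [-half_width, half_width]."""
    n = int(round(2 * half_width / step))
    grid = [-half_width + k * step for k in range(n + 1)]
    f_vals = [f(x) for x in grid]
    f_star = legendre_transform(f_vals, grid)
    int_f = sum(math.exp(-v) for v in f_vals) * step
    int_f_star = sum(math.exp(-v) for v in f_star) * step
    return int_f * int_f_star

gauss = santalo_product(lambda x: 0.5 * x * x)  # Gaussian: equality case, close to 2*pi
laplace = santalo_product(abs)                  # even, convex, non-Gaussian: strictly smaller
print(gauss, 2 * math.pi, laplace)
```

The quadratic x²/2 is its own Legendre transform, so the product is (√(2π))² = 2π; for |x| the transform is 0 on [-1, 1] and +∞ outside, giving 2 · 2 = 4 in the continuum (slightly more here because the grid truncates the transform to a finite slope).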
As an application to convex geometry, we establish the most important case of a conjecture of Kolesnikov and Werner about the Blaschke--Santaló inequality for multiple even functions as well as multiple symmetric convex bodies. Furthermore, in the direction of information theory and optimal transportation theory, this provides an affirmative answer to another conjecture of Kolesnikov--Werner about a Talagrand type inequality for multiple even probability measures that involves the Wasserstein barycenter.
- [45] arXiv:2410.01444 (replaced) [pdf, html, other]
Title: Geometric Signatures of Compositionality Across a Language Model's Lifetime
Comments: Under review as a conference paper at ICLR 2025
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Compositionality, the notion that the meaning of an expression is constructed from the meaning of its parts and syntactic rules, permits the infinite productivity of human language. For the first time, artificial language models (LMs) are able to match human performance in a number of compositional generalization tasks. However, much remains to be understood about the representational mechanisms underlying these abilities. We take a high-level geometric approach to this problem by relating the degree of compositionality in a dataset to the intrinsic dimensionality of its representations under an LM, a measure of feature complexity. We find not only that the degree of dataset compositionality is reflected in representations' intrinsic dimensionality, but that the relationship between compositionality and geometric complexity arises due to learned linguistic features over training. Finally, our analyses reveal a striking contrast between linear and nonlinear dimensionality, showing that they respectively encode formal and semantic aspects of linguistic composition.
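As a rough illustration of what "intrinsic dimensionality" of a set of representations means, the sketch below implements one common nonlinear estimator, TwoNN (Facco et al., 2017), in its maximum-likelihood form. The abstract does not say which estimators were used, so treat this as an assumed stand-in rather than the paper's method. For each point, the ratio mu = r2/r1 of second- to first-nearest-neighbour distances is approximately Pareto-distributed with exponent d, which gives d_hat = N / Σ log mu. The toy data (a 2-D square embedded in R^5) is hypothetical.

```python
import heapq
import math
import random

def two_nn_dimension(points):
    """Maximum-likelihood TwoNN estimate of intrinsic dimension.

    For each point, mu = r2 / r1 (second- over first-nearest-neighbour distance);
    under local uniformity mu ~ Pareto(d), so d_hat = N / sum(log mu).
    """
    log_mu = []
    for i, p in enumerate(points):
        r1, r2 = heapq.nsmallest(
            2, (math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
        log_mu.append(math.log(r2 / r1))
    return len(log_mu) / sum(log_mu)

def plane_in_5d(n, seed=0):
    """Hypothetical toy data: n points uniform on a unit square, linearly embedded in R^5."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        pts.append((u, v, u + v, u - v, 0.5 * u))
    return pts

# Ambient dimension is 5, but the estimate should come out near 2.
print(two_nn_dimension(plane_in_5d(500)))
```

The estimator only uses the two nearest neighbours of each point, which is why it tracks the dimension of a curved or linearly embedded manifold rather than the number of coordinates.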
- [46] arXiv:2410.02833 (replaced) [pdf, other]
Title: Asymmetry of the Relative Entropy in the Regularization of Empirical Risk Minimization
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
The effect of relative entropy asymmetry is analyzed in the context of empirical risk minimization (ERM) with relative entropy regularization (ERM-RER). Two regularizations are considered: $(a)$ the relative entropy of the measure to be optimized with respect to a reference measure (Type-I ERM-RER); or $(b)$ the relative entropy of the reference measure with respect to the measure to be optimized (Type-II ERM-RER). The main result is the characterization of the solution to the Type-II ERM-RER problem and its key properties. By comparing the well-understood Type-I ERM-RER with Type-II ERM-RER, the effects of entropy asymmetry are highlighted. The analysis shows that in both cases, regularization by relative entropy forces the solution's support to collapse into the support of the reference measure, introducing a strong inductive bias that can overshadow the evidence provided by the training data. Finally, it is shown that Type-II regularization is equivalent to Type-I regularization with an appropriate transformation of the empirical risk function.
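The "well-understood" Type-I case has a closed-form solution: minimizing E_P[L] + λ D(P‖Q) gives the Gibbs measure P(θ) ∝ Q(θ) exp(-L(θ)/λ), with optimal value -λ log E_Q[exp(-L/λ)]. The sketch below, over a hypothetical finite model set, shows the support collapse the abstract describes: a model with zero empirical risk but zero reference mass receives no probability. It covers only Type-I; the paper's Type-II characterization is not reproduced here.

```python
import math

def type1_erm_rer(risk, ref, lam):
    """Gibbs solution of min_P E_P[risk] + lam * D(P || ref) over a finite model set.

    P(theta) = ref(theta) * exp(-risk(theta) / lam) / Z, so P vanishes wherever ref does.
    Returns (P, optimal objective value -lam * log Z).
    """
    weights = [q * math.exp(-r / lam) for q, r in zip(ref, risk)]
    z = sum(weights)
    return [w / z for w in weights], -lam * math.log(z)

def objective(p, risk, ref, lam):
    """E_P[risk] + lam * D(P || ref); infinite if P puts mass where ref has none."""
    value = sum(pi * r for pi, r in zip(p, risk))
    for pi, q in zip(p, ref):
        if pi > 0:
            if q == 0:
                return math.inf
            value += lam * pi * math.log(pi / q)
    return value

# Hypothetical example: model 3 fits the data best (risk 0) but has no reference mass.
ref = [0.5, 0.3, 0.2, 0.0]
risk = [1.0, 0.2, 0.5, 0.0]
P, best = type1_erm_rer(risk, ref, lam=0.5)
print(P)  # last entry is exactly 0.0
```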