Information Theory
- [1] arXiv:2409.09042 [pdf, html, other]
Title: Semantic Communication for Cooperative Perception using HARQ
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)
Cooperative perception, offering a wider field of view than standalone perception, is becoming increasingly crucial in autonomous driving. This perception is enabled through vehicle-to-vehicle (V2V) communication, allowing connected automated vehicles (CAVs) to exchange sensor data, such as light detection and ranging (LiDAR) point clouds, thereby enhancing the collective understanding of the environment. In this paper, we leverage an importance map to distill critical semantic information, introducing a cooperative perception semantic communication framework that employs intermediate fusion. To counter the challenges posed by time-varying multipath fading, our approach incorporates orthogonal frequency-division multiplexing (OFDM) along with channel estimation and equalization strategies. Furthermore, recognizing the necessity for reliable transmission, especially in low-SNR scenarios, we introduce a novel semantic error detection method that is integrated with our semantic communication framework in the spirit of hybrid automatic repeat request (HARQ). Simulation results show that our model surpasses the traditional separate source-channel coding methods in perception performance, both with and without HARQ. Additionally, in terms of throughput, our proposed HARQ schemes demonstrate superior efficiency to the conventional coding approaches.
- [2] arXiv:2409.09426 [pdf, html, other]
Title: Cislunar Communication Performance and System Analysis with Uncharted Phenomena
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
The Moon and its surrounding cislunar space have numerous unknowns, uncertainties, or partially charted phenomena that need to be investigated to determine the extent to which they affect cislunar communication. These include temperature fluctuations, spacecraft distance and velocity dynamics, surface roughness, and the diversity of propagation mechanisms. To develop robust and dynamically operative cislunar space networks (CSNs), we need to analyze the communication system by incorporating inclusive models that account for the wide range of possible propagation environments and noise characteristics. In this paper, we consider that the communication signal can be subject not only to Gaussian and non-Gaussian noise but also to different fading conditions. First, we analyze the communication link by showing the relationship between the brightness temperatures of the Moon and the equivalent noise temperature at the receiver of the Lunar Gateway. We propose to analyze the ergodic capacity and the outage probability, as they are essential metrics for the development of reliable communication. In particular, we model the noise with the additive symmetric alpha-stable distribution, which allows a generic analysis for Gaussian and non-Gaussian signal characteristics. Then, we present closed-form bounds for the ergodic capacity and the outage probability. Finally, the results show the theoretically and operationally achievable performance bounds for cislunar communication. To give insight into further designs, we also provide our results with comprehensive system settings that include mission objectives as well as orbital and system dynamics.
- [3] arXiv:2409.09525 [pdf, html, other]
Title: Foundations of Vision-Based Localization: A New Approach to Localizability Analysis Using Stochastic Geometry
Comments: 23 pages, 7 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Computation (stat.CO)
Despite significant algorithmic advances in vision-based positioning, a comprehensive probabilistic framework to study its performance has remained unexplored. The main objective of this paper is to develop such a framework using ideas from stochastic geometry. Due to limitations in sensor resolution, the level of detail in prior information, and computational resources, we may not be able to differentiate between landmarks with similar appearances in the vision data, such as trees, lampposts, and bus stops. While one cannot accurately determine the absolute target position using a single indistinguishable landmark, obtaining an approximate position fix is possible if the target can see multiple landmarks whose geometric placement on the map is unique. Modeling the locations of these indistinguishable landmarks as a Poisson point process (PPP) $\Phi$ on $\mathbb{R}^2$, we develop a new approach to analyze the localizability in this setting. From the target location $\mathbb{x}$, the measurements are obtained from landmarks within the visibility region. These measurements, including ranges and angles to the landmarks, denoted as $f(\mathbb{x})$, can be treated as mappings from the target location. We are interested in understanding the probability that the measurements $f(\mathbb{x})$ are sufficiently distinct from the measurement $f(\mathbb{x}_0)$ at the given location, which we term localizability. Expressions of localizability probability are derived for specific vision-inspired measurements, such as ranges to landmarks and snapshots of their locations. Our analysis reveals that the localizability probability approaches one when the landmark intensity tends to infinity, which means that error-free localization is achievable in this limiting regime.
- [4] arXiv:2409.09654 [pdf, html, other]
Title: A Simple Study on the Optimality of Hybrid NOMA
Subjects: Information Theory (cs.IT)
The key idea of hybrid non-orthogonal multiple access (NOMA) is to allow users to use the bandwidth resources to which they cannot have access in orthogonal multiple access (OMA) based legacy networks while still guaranteeing its compatibility with the legacy network. However, in a conventional hybrid NOMA network, some users have access to more bandwidth resources than others, which leads to a potential performance loss. So what if the users can access the same amount of bandwidth resources? This letter focuses on a simple two-user scenario, and develops analytical and simulation results to reveal that for this considered scenario, conventional hybrid NOMA is still an optimal transmission strategy.
- [5] arXiv:2409.09715 [pdf, html, other]
Title: Generative Semantic Communication via Textual Prompts: Latency Performance Tradeoffs
Authors: Mengmeng Ren, Li Qiao, Long Yang, Zhen Gao, Jian Chen, Mahdi Boloursaz Mashhadi, Pei Xiao, Rahim Tafazolli, Mehdi Bennis
Subjects: Information Theory (cs.IT); Computer Science and Game Theory (cs.GT)
This paper develops an edge-device collaborative Generative Semantic Communications (Gen SemCom) framework leveraging pre-trained Multi-modal/Vision Language Models (M/VLMs) for ultra-low-rate semantic communication via textual prompts. The proposed framework optimizes the use of M/VLMs on the wireless edge/device to generate high-fidelity textual prompts through visual captioning/question answering, which are then transmitted over a wireless channel for SemCom. Specifically, we develop a multi-user Gen SemCom framework using pre-trained M/VLMs, and formulate a joint optimization problem of prompt generation offloading, communication and computation resource allocation to minimize the latency and maximize the resulting semantic quality. Due to the nonconvex nature of the problem with highly coupled discrete and continuous variables, we decompose it into a two-level problem and propose a low-complexity swap/leaving/joining (SLJ)-based matching algorithm. Simulation results demonstrate significant performance improvements over the conventional semantic-unaware/non-collaborative offloading benchmarks.
- [6] arXiv:2409.09732 [pdf, html, other]
Title: Ten Years of Research Advances in Full-Duplex Massive MIMO
Comments: Accepted in IEEE Transactions on Communications
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
We present an overview of ongoing research endeavors focused on in-band full-duplex (IBFD) massive multiple-input multiple-output (MIMO) systems and their applications. In response to the unprecedented demands for mobile traffic in concurrent and upcoming wireless networks, a paradigm shift from conventional cellular networks to distributed communication systems becomes imperative. Cell-free massive MIMO (CF-mMIMO) emerges as a practical and scalable implementation of distributed/network MIMO systems, serving as a crucial physical layer technology for the advancement of next-generation wireless networks. This architecture inherits benefits from co-located massive MIMO and distributed systems and provides the flexibility for integration with the IBFD technology. We delineate the evolutionary trajectory of cellular networks, transitioning from conventional half-duplex multi-user MIMO networks to IBFD CF-mMIMO. The discussion extends further to the emerging paradigm of network-assisted IBFD CF-mMIMO (NAFD CF-mMIMO), serving as an energy-efficient prototype for asymmetric uplink and downlink communication services. This novel approach finds applications in dual-functionality scenarios, including simultaneous wireless power and information transmission, wireless surveillance, and integrated sensing and communications. We highlight various current use case applications, discuss open challenges, and outline future research directions aimed at fully realizing the potential of NAFD CF-mMIMO systems to meet the evolving demands of future wireless networks.
- [7] arXiv:2409.09780 [pdf, html, other]
Title: Power Allocation for Finite-Blocklength IR-HARQ
Subjects: Information Theory (cs.IT)
This letter concerns the power allocation across the multiple transmission rounds under the Incremental Redundancy Hybrid Automatic Repeat reQuest (IR-HARQ) policy, in pursuit of an energy-efficient way of fulfilling the outage probability target in the finite-blocklength regime. We start by showing that the optimization objective and the constraints of the above power allocation problem all depend upon the outage probability. The main challenge then lies in the fact that the outage probability cannot be written analytically in terms of the power variables. To sidestep this difficulty, we propose a novel upper bound on the outage probability in the finite-blocklength regime, which is much tighter than the existing ones from the literature. Most importantly, by using this upper bound to approximate the outage probability, we can recast the original intractable power allocation problem into a geometric programming (GP) form--which can be efficiently solved by the standard method.
- [8] arXiv:2409.09982 [pdf, html, other]
Title: Atomic Norm Minimization-based DoA Estimation for IRS-assisted Sensing Systems
Comments: accepted by WCL
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
The intelligent reflecting surface (IRS) is expected to play a pivotal role in future wireless sensing networks owing to its potential for high-resolution and high-accuracy sensing. In this work, we investigate a multi-target direction-of-arrival (DoA) estimation problem in a semi-passive IRS-assisted sensing system, where IRS reflecting elements (REs) reflect signals from the base station to targets, and IRS sensing elements (SEs) estimate DoA based on echo signals reflected by the targets. First, instead of relying solely on IRS SEs for DoA estimation as done in the existing literature, this work fully exploits the DoA information embedded in both the IRS RE and SE matrices via the atomic norm minimization (ANM) scheme. Subsequently, the Cramér-Rao bound for DoA estimation is derived, revealing an inverse proportionality to $MN^3+NM^3$ in the case of an identity covariance matrix of the IRS measurement matrix and a single target, where $M$ and $N$ are the numbers of IRS SEs and REs, respectively. Finally, extensive numerical results substantiate the superior accuracy and resolution performance of the proposed ANM-based DoA estimation method over representative baselines.
- [9] arXiv:2409.10112 [pdf, html, other]
Title: On the Bit Error Probability of DMA-Based Systems
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Dynamic metasurface antennas (DMAs) are an alternative application of metasurfaces as active reconfigurable antennas with advanced analog signal processing and beamforming capabilities, which have been proposed to replace conventional antenna arrays for next-generation transceivers. Motivated by this, we investigate bit error probability (BEP) optimization in a DMA-based system. We propose an iterative optimization algorithm that adjusts the transmit precoder and the weights of the DMA elements, prove its convergence, and derive its complexity.
- [10] arXiv:2409.10127 [pdf, html, other]
Title: Joint Beamforming and Illumination Pattern Design for Beam-Hopping LEO Satellite Communications
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Since hybrid beamforming (HBF) can approach the performance of fully-digital beamforming (FDBF) with much lower hardware complexity, we investigate the HBF design for beam-hopping (BH) low earth orbit (LEO) satellite communications (SatComs). Aiming at maximizing the sum-rate of totally illuminated beam positions during the whole BH period, we consider joint beamforming and illumination pattern design subject to the HBF constraints and sum-rate requirements. To address the non-convexity of the HBF constraints, we temporarily replace the HBF constraints with the FDBF constraints. Then we propose an FDBF and illumination pattern random search (FDBF-IPRS) scheme to optimize illumination patterns and fully-digital beamformers using constrained random search and fractional programming methods. To further reduce the computational complexity, we propose an FDBF and illumination pattern alternating optimization (FDBF-IPAO) scheme, where we relax the integer illumination pattern to continuous variables and after finishing all the iterations we quantize the continuous variables into integer ones. Based on the fully-digital beamformers designed by the FDBF-IPRS or FDBF-IPAO scheme, we propose an HBF alternating minimization algorithm to design the hybrid beamformers. Simulation results show that the proposed schemes can achieve satisfactory sum-rate performance for BH LEO SatComs.
- [11] arXiv:2409.10314 [pdf, html, other]
Title: Rate-Splitting Multiple Access for Coexistence of Semantic and Bit Communications
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In the sixth generation (6G) of cellular networks, the demands for capacity and connectivity will increase dramatically to meet the requirements of emerging services for both humans and machines. Semantic communication has shown great potential because of its efficiency and its suitability for users who only care about the semantic meaning. However, bit communication is still needed for users requiring original messages. Therefore, semantic and bit communications will coexist in future networks. This motivates us to explore how to allocate resources in such a coexistence scenario. We investigate different uplink multiple access (MA) schemes for the coexistence of semantic users and a bit user, namely orthogonal multiple access (OMA), non-orthogonal multiple access (NOMA), and rate-splitting multiple access (RSMA). We characterize the rate regions achieved by those MA schemes. The simulation results show that RSMA always outperforms NOMA and has better performance in high semantic rate regimes compared to OMA. We find that RSMA scheme design, rate region, and power allocation are quite different in the coexistence scenario compared to bit-only communication, primarily due to the need to consider understandability in semantic communications. Interestingly, in contrast to bit-only communications, where RSMA is capacity-achieving without any need for time sharing, in the coexistence scenario time sharing helps enlarge the RSMA rate region.
- [12] arXiv:2409.10410 [pdf, html, other]
Title: Estimates for Optimal Multistage Group Partition Testing
Subjects: Information Theory (cs.IT)
In multistage group testing, the tests within the same stage are considered nonadaptive, while those conducted across different stages are adaptive. Specifically, when the pools within the same stage are disjoint, meaning that the entire set is divided into several disjoint subgroups, it is referred to as a multistage group partition testing problem, denoted as the (n, d, s) problem, where n, d, and s represent the total number of items, defectives, and stages, respectively. This paper presents exact solutions for the (n, 1, s) and (n, d, 2) problems for the first time. Additionally, a general dynamic programming approach is developed for the (n, d, s) problem. Significantly, I give sharp upper and lower bound estimates. If the number of defectives is unknown but bounded, I provide an algorithm with an optimal competitive ratio in the asymptotic sense. Assuming a prior distribution on the defective items, I also establish well-performing upper and lower bound estimates for the expectation of the optimal strategy.
- [13] arXiv:2409.10420 [pdf, html, other]
Title: The Second Generalized Covering Radius of Binary Primitive Double-Error-Correcting BCH Codes
Subjects: Information Theory (cs.IT)
We completely determine the second covering radius for binary primitive double-error-correcting BCH codes. As part of this process, we provide a lower bound on the second covering radius for binary primitive BCH codes correcting more than two errors.
- [14] arXiv:2409.10511 [pdf, html, other]
Title: Weak Superimposed Codes of Improved Asymptotic Rate and Their Randomized Construction
Comments: 6 pages, accepted for presentation at the 2022 IEEE International Symposium on Information Theory (ISIT)
Journal-ref: Proc. IEEE International Symposium on Information Theory (ISIT), pp. 784-789, 2022
Subjects: Information Theory (cs.IT)
Weak superimposed codes are combinatorial structures related closely to generalized cover-free families, superimposed codes, and disjunct matrices in that they are only required to satisfy similar but less stringent conditions. This class of codes may also be seen as a stricter variant of what are known as locally thin families in combinatorics. Originally, weak superimposed codes were introduced in the context of multimedia content protection against illegal distribution of copies under the assumption that a coalition of malicious users may employ the averaging attack with adversarial noise. As in many other kinds of codes in information theory, it is of interest and importance in the study of weak superimposed codes to find the highest achievable rate in the asymptotic regime and give an efficient construction that produces an infinite sequence of codes that achieve it. Here, we prove a tighter lower bound than the sharpest known one on the rate of optimal weak superimposed codes and give a polynomial-time randomized construction algorithm for codes that asymptotically attain our improved bound with high probability. Our probabilistic approach is versatile and applicable to many other related codes and arrays.
New submissions for Tuesday, 17 September 2024 (showing 14 of 14 entries)
- [15] arXiv:2409.09060 (cross-list from math.FA) [pdf, html, other]
Title: Noncommutative Donoho-Elad-Gribonval-Nielsen-Fuchs Sparsity Theorem
Comments: 7 Pages, 0 Figures
Subjects: Functional Analysis (math.FA); Information Theory (cs.IT); Operator Algebras (math.OA); Optimization and Control (math.OC); Statistics Theory (math.ST)
The breakthrough Sparsity Theorem, derived independently by Donoho and Elad [Proc. Natl. Acad. Sci. USA, 2003], Gribonval and Nielsen [IEEE Trans. Inform. Theory, 2003], and Fuchs [IEEE Trans. Inform. Theory, 2004], says that the unique sparse solution to the NP-hard $\ell_0$-minimization problem can be obtained using the unique solution of the P-type $\ell_1$-minimization problem. In this paper, we derive a noncommutative version of their result using frames for Hilbert C*-modules.
- [16] arXiv:2409.10123 (cross-list from eess.SP) [pdf, html, other]
Title: Wavenumber-Domain Near-Field Channel Estimation: Beyond the Fresnel Bound
Comments: This paper has been accepted by IEEE Globecom 2024
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
In the near-field context, the Fresnel approximation is typically employed to mathematically represent solvable functions of spherical waves. However, these efforts may fail to take into account the significant increase in the lower limit of the Fresnel approximation, known as the Fresnel distance. The lower bound of the Fresnel approximation imposes a constraint that becomes more pronounced as the array size grows. Beyond this constraint, the validity of the Fresnel approximation is broken. As a potential solution, the wavenumber-domain paradigm characterizes the spherical wave using a spectrum composed of a series of linear orthogonal bases. However, this approach falls short of covering the effects of the array geometry, especially when using Gaussian-mixed-model (GMM)-based von Mises-Fisher distributions to approximate all spectra. To fill this gap, this paper introduces a novel wavenumber-domain ellipse fitting (WDEF) method to tackle these challenges. Particularly, the channel is accurately estimated in the near-field region, by maximizing the closed-form likelihood function of the wavenumber-domain spectrum conditioned on the scatterers' geometric parameters. Simulation results are provided to demonstrate the robustness of the proposed scheme against both the distance and angles of arrival.
- [17] arXiv:2409.10226 (cross-list from cs.DC) [pdf, html, other]
Title: Privacy-Preserving Distributed Maximum Consensus Without Accuracy Loss
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Information Theory (cs.IT); Signal Processing (eess.SP)
In distributed networks, calculating the maximum element is a fundamental task in data analysis, known as the distributed maximum consensus problem. However, the sensitive nature of the data involved makes privacy protection essential. Despite its importance, privacy in distributed maximum consensus has received limited attention in the literature. Traditional privacy-preserving methods typically add noise to updates, degrading the accuracy of the final result. To overcome these limitations, we propose a novel distributed optimization-based approach that preserves privacy without sacrificing accuracy. Our method introduces virtual nodes to form an augmented graph and leverages a carefully designed initialization process to ensure the privacy of honest participants, even when all their neighboring nodes are dishonest. Through a comprehensive information-theoretical analysis, we derive a sufficient condition to protect private data against both passive and eavesdropping adversaries. Extensive experiments validate the effectiveness of our approach, demonstrating that it not only preserves perfect privacy but also maintains accuracy, outperforming existing noise-based methods that typically suffer from accuracy loss.
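For readers unfamiliar with the underlying task, the plain (non-private) maximum consensus iteration can be sketched as follows. This is only a minimal illustration of the baseline problem, not the method proposed in the abstract above: the function name `max_consensus`, the ring topology, and the sample values are all assumptions made for the example. Each node repeatedly replaces its state with the maximum over itself and its neighbors, so every node reveals its raw value to its neighbors, which is the kind of exposure a privacy-preserving scheme has to avoid.

```python
def max_consensus(values, neighbors, rounds):
    """Plain max consensus (hypothetical helper, no privacy protection).

    values:    list of initial node values
    neighbors: dict mapping node index -> list of neighbor indices
    rounds:    number of synchronous update rounds
    """
    state = list(values)
    for _ in range(rounds):
        # Every node takes the max of its own state and its neighbors' states.
        state = [
            max([state[i]] + [state[j] for j in neighbors[i]])
            for i in range(len(state))
        ]
    return state


# Illustrative ring of 5 nodes (assumed topology, not from the paper).
values = [3, 9, 1, 7, 4]
neighbors = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
result = max_consensus(values, neighbors, rounds=2)
```

On a connected graph, the number of rounds needed is at most the graph diameter, which is 2 for this ring.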
Cross submissions for Tuesday, 17 September 2024 (showing 3 of 3 entries)
- [18] arXiv:1801.10491 (replaced) [pdf, html, other]
Title: Interactive Nearest Lattice Point Search in a Distributed Setting: Two Dimensions
Comments: Extended (journal version) of a previous submission. arXiv admin note: text overlap with arXiv:1701.08458
Journal-ref: Published in IEEE Transactions on Communications, v. 70, n. 8, August 2022
Subjects: Information Theory (cs.IT)
The nearest lattice point problem in $\mathbb{R}^n$ is formulated in a distributed network with $n$ nodes. The objective is to minimize the probability that an incorrect lattice point is found, subject to a constraint on inter-node communication. Algorithms with a single as well as an unbounded number of rounds of communication are considered for the case $n=2$. For the algorithm with a single round, expressions are derived for the error probability as a function of the total number of communicated bits. We observe that the error exponent depends on the lattice structure and that zero error requires an infinite number of communicated bits. In contrast, with an infinite number of allowed communication rounds, the nearest lattice point can be determined without error with a finite average number of communicated bits and a finite average number of rounds of communication. In two dimensions, the hexagonal lattice, which is most efficient for communication and compression, is found to be the most expensive in terms of communication cost.
- [19] arXiv:2208.10704 (replaced) [pdf, html, other]
Title: Delay Minimization for Hybrid BAC-NOMA Offloading in MEC Networks
Subjects: Information Theory (cs.IT)
This paper studies the offloading service improvement of multi-access edge computing (MEC) based on backscatter communication (BackCom) assisted non-orthogonal multiple access (BAC-NOMA). A hybrid BAC-NOMA protocol is proposed, where the uplink users backscatter their tasks by leveraging the downlink signal, and then the remaining data is transmitted through uplink NOMA. In particular, a resource allocation problem is formulated to minimize the offloading delay of uplink users. The non-convex problem is transformed into a convex problem, and an iterative algorithm is developed accordingly. Simulation results show that the proposed protocol outperforms the benchmark in terms of offloading delay.
- [20] arXiv:2404.18088 (replaced) [pdf, html, other]
Title: On completely regular self-dual codes with covering radius $\rho \leq 3$
Subjects: Information Theory (cs.IT)
We give a complete classification of self-dual completely regular codes with covering radius $\rho \leq 3$. For $\rho=1$ the results are almost trivial. For $\rho=2$, by using properties of the more general class of uniformly packed codes in the wide sense, we show that there are two sporadic such codes, of length $8$, and an infinite family, of length $4$, apart from the direct sum of two self-dual completely regular codes, each with $\rho=1$. For $\rho=3$, in some cases, we use techniques similar to those used for $\rho=2$. However, for some other cases we use different methods, namely, the Pless power moments, which allow us to discard several possibilities. We show that there are only two self-dual completely regular codes with $\rho=3$ and $d\geq 3$, which are both ternary: the extended ternary Golay code and the direct sum of three ternary Hamming codes of length 4. Therefore, any self-dual completely regular code with $d\geq 3$ and $\rho=3$ is ternary and has length 12.
We provide the intersection arrays for all such codes.
- [21] arXiv:2406.03438 (replaced) [pdf, html, other]
Title: CSI-GPT: Integrating Generative Pre-Trained Transformer with Federated-Tuning to Acquire Downlink Massive MIMO Channels
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In massive multiple-input multiple-output (MIMO) systems, reliably acquiring downlink channel state information (CSI) with low overhead is challenging. In this work, by integrating the generative pre-trained Transformer (GPT) with federated-tuning, we propose a CSI-GPT approach to realize efficient downlink CSI acquisition. Specifically, we first propose a Swin Transformer-based channel acquisition network (SWTCAN) to acquire downlink CSI, where pilot signals, downlink channel estimation, and uplink CSI feedback are jointly designed. Furthermore, to solve the problem of insufficient training data, we propose a variational auto-encoder-based channel sample generator (VAE-CSG), which can generate sufficient CSI samples based on a limited number of high-quality CSI data obtained from the current cell. The CSI dataset generated from VAE-CSG will be used for pre-training SWTCAN. To fine-tune the pre-trained SWTCAN for improved performance, we propose an online federated-tuning method, where only a small number of SWTCAN parameters are unfrozen and updated using over-the-air computation, avoiding the high communication overhead caused by aggregating the complete CSI samples from user equipment (UEs) to the base station (BS) for centralized fine-tuning. Simulation results verify the advantages of the proposed SWTCAN and the communication efficiency of the proposed federated-tuning method. Our code is publicly available at this https URL
- [22] arXiv:2409.05314 (replaced) [pdf, html, other]
Title: Tele-LLMs: A Series of Specialized Large Language Models for Telecommunications
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The emergence of large language models (LLMs) has significantly impacted various fields, from natural language processing to sectors like medicine and finance. However, despite their rapid proliferation, the applications of LLMs in telecommunications remain limited, often relying on general-purpose models that lack domain-specific specialization. This lack of specialization results in underperformance, particularly when dealing with telecommunications-specific technical terminology and their associated mathematical representations. This paper addresses this gap by first creating and disseminating Tele-Data, a comprehensive dataset of telecommunications material curated from relevant sources, and Tele-Eval, a large-scale question-and-answer dataset tailored to the domain. Through extensive experiments, we explore the most effective training techniques for adapting LLMs to the telecommunications domain, ranging from examining the division of expertise across various telecommunications aspects to employing parameter-efficient techniques. We also investigate how models of different sizes behave during adaptation and analyze the impact of their training data on this behavior. Leveraging these findings, we develop and open-source Tele-LLMs, the first series of language models ranging from 1B to 8B parameters, specifically tailored for telecommunications. Our evaluations demonstrate that these models outperform their general-purpose counterparts on Tele-Eval while retaining their previously acquired capabilities, thus avoiding the catastrophic forgetting phenomenon.
- [23] arXiv:2205.04641 (replaced) [pdf, other]
Title: On Causality in Domain Adaptation and Semi-Supervised Learning: an Information-Theoretic Analysis for Parametric Models
Comments: 56 pages Including Appendix
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
Recent advancements in unsupervised domain adaptation (UDA) and semi-supervised learning (SSL), particularly incorporating causality, have led to significant methodological improvements in these learning problems. However, a formal theory that explains the role of causality in the generalization performance of UDA/SSL is still lacking. In this paper, we consider the UDA/SSL scenarios where we access $m$ labelled source data and $n$ unlabelled target data as training instances under different causal settings with a parametric probabilistic model. We study the learning performance (e.g., excess risk) of prediction in the target domain from an information-theoretic perspective. Specifically, we distinguish two scenarios: the learning problem is called causal learning if the feature is the cause and the label is the effect, and is called anti-causal learning otherwise. We show that in causal learning, the excess risk depends on the size of the source sample at a rate of $O(\frac{1}{m})$ only if the labelling distribution between the source and target domains remains unchanged. In anti-causal learning, we show that the unlabelled data dominate the performance at a rate of typically $O(\frac{1}{n})$. These results bring out the relationship between the data sample size and the hardness of the learning problem with different causal mechanisms.
- [24] arXiv:2306.08737 (replaced) [pdf, html, other]
Title: A Networked Multi-Agent System for Mobile Wireless Infrastructure on Demand
Authors: Miguel Calvo-Fullana, Mikhail Gerasimenko, Daniel Mox, Leopoldo Agorio, Mariana del Castillo, Vijay Kumar, Alejandro Ribeiro, Juan Andres Bazerque
Subjects: Robotics (cs.RO); Information Theory (cs.IT); Optimization and Control (math.OC)
Despite the prevalence of wireless connectivity in urban areas around the globe, there remain numerous and diverse situations where connectivity is insufficient or unavailable. To address this, we introduce mobile wireless infrastructure on demand, a system of UAVs that can be rapidly deployed to establish an ad-hoc wireless network. This network has the capability of reconfiguring itself dynamically to satisfy and maintain the required quality of communication. The system optimizes the positions of the UAVs and the routing of data flows throughout the network to achieve this quality of service (QoS). By these means, task agents using the network simply request a desired QoS, and the system adapts accordingly while allowing them to move freely. We have validated this system both in simulation and in real-world experiments. The results demonstrate that our system effectively offers mobile wireless infrastructure on demand, extending the operational range of task agents and supporting complex mobility patterns, all while ensuring connectivity and being resilient to agent failures.
- [25] arXiv:2312.16341 (replaced) [pdf, html, other]
Title: Harnessing the Power of Federated Learning in Federated Contextual Bandits
Comments: Accepted to Transactions on Machine Learning Research (07/2024); a preliminary version appeared in the Multi-Agent Security Workshop at NeurIPS 2023
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Federated learning (FL) has demonstrated great potential in revolutionizing distributed machine learning, and tremendous efforts have been made to extend it beyond the original focus on supervised learning. Among many directions, federated contextual bandits (FCB), a pivotal integration of FL and sequential decision-making, has garnered significant attention in recent years. Despite substantial progress, existing FCB approaches have largely employed their tailored FL components, often deviating from the canonical FL framework. Consequently, even renowned algorithms like FedAvg remain under-utilized in FCB, let alone other FL advancements. Motivated by this disconnection, this work takes one step towards building a tighter relationship between the canonical FL study and the investigations on FCB. In particular, a novel FCB design, termed FedIGW, is proposed to leverage a regression-based CB algorithm, i.e., inverse gap weighting. Compared with existing FCB approaches, the proposed FedIGW design can better harness the entire spectrum of FL innovations, which is concretely reflected as (1) flexible incorporation of (both existing and forthcoming) FL protocols; (2) modularized plug-in of FL analyses in performance guarantees; (3) seamless integration of FL appendages (such as personalization, robustness, and privacy). We substantiate these claims through rigorous theoretical analyses and empirical evaluations.
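The abstract names inverse gap weighting as the contextual-bandit component of FedIGW. The following is a minimal sketch of the standard inverse gap weighting rule for a single decision, not the paper's implementation: the function name `inverse_gap_weighting`, the exploration parameter `gamma`, and the use of NumPy are assumptions made here for illustration, and the federated training of the reward estimator is not shown.

```python
import numpy as np


def inverse_gap_weighting(predicted_rewards, gamma):
    """Return action probabilities under the inverse gap weighting rule.

    predicted_rewards: estimated reward of each action for the current context
                       (in FedIGW these estimates would come from a regression
                       model trained with an FL protocol; assumed given here).
    gamma:             positive exploration parameter (hypothetical name);
                       larger values concentrate mass on the greedy action.
    """
    rewards = np.asarray(predicted_rewards, dtype=float)
    num_actions = rewards.size
    best = int(np.argmax(rewards))

    # Gap between the greedy action's estimate and every other estimate.
    gaps = rewards[best] - rewards

    # Non-greedy actions get probability inversely proportional to their gap.
    probs = 1.0 / (num_actions + gamma * gaps)

    # The greedy action receives all remaining probability mass.
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()
    return probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = inverse_gap_weighting([0.2, 0.5, 0.1], gamma=10.0)
    action = rng.choice(len(probs), p=probs)
    print(probs, action)
```

Each non-greedy probability is at most 1/K for K actions, so the greedy action always keeps at least 1/K of the mass and the result is a valid distribution.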
- [26] arXiv:2402.02001 (replaced) [pdf, html, other]
Title: PANDA: Query Evaluation in Submodular Width
Subjects: Databases (cs.DB); Information Theory (cs.IT)
In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations and the query plans that they support. This motivated the search for algorithms that can compute the output of a join query in time bounded by the corresponding information-theoretic bounds. In this paper, we describe PANDA, an algorithm that takes a Shannon inequality that underlies the bound, and translates each proof step into an algorithmic step corresponding to some database operation. PANDA computes answers to a conjunctive query in time given by the submodular width plus the output size of the query. The version in this paper represents a significant simplification of the original version [ANS, PODS'17].
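To illustrate the kind of information-theoretic output-size bound the abstract refers to, here is a standard textbook case that is not taken from the paper. Consider the triangle query $Q(a,b,c) = R(a,b) \wedge S(b,c) \wedge T(a,c)$ and let $(A,B,C)$ be uniformly distributed over the output tuples, so that $H(A,B,C) = \log |Q|$, while $H(A,B) \le \log |R|$, $H(B,C) \le \log |S|$, and $H(A,C) \le \log |T|$ because each marginal is supported on the corresponding relation. Shearer's inequality, a Shannon inequality, then gives:

$$
2\,H(A,B,C) \;\le\; H(A,B) + H(B,C) + H(A,C)
\quad\Longrightarrow\quad
|Q| \;\le\; \sqrt{|R|\,|S|\,|T|}.
$$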
- [27] arXiv:2406.11569 (replaced) [pdf, html, other]
Title: Pre-Training and Personalized Fine-Tuning via Over-the-Air Federated Meta-Learning: Convergence-Generalization Trade-Offs
Comments: 39 pages, 7 figures, submitted for possible journal publication
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP)
For modern artificial intelligence (AI) applications such as large language models (LLMs), the training paradigm has recently shifted to pre-training followed by fine-tuning. Furthermore, owing to dwindling open repositories of data and thanks to efforts to democratize access to AI models, pre-training is expected to increasingly migrate from the current centralized deployments to federated learning (FL) implementations. Meta-learning provides a general framework in which pre-training and fine-tuning can be formalized. Meta-learning-based personalized FL (meta-pFL) moves beyond basic personalization by targeting generalization to new agents and tasks. This paper studies the generalization performance of meta-pFL for a wireless setting in which the agents participating in the pre-training phase, i.e., meta-learning, are connected via a shared wireless channel to the server. Adopting over-the-air computing, we study the trade-off between generalization to new agents and tasks, on the one hand, and convergence, on the other hand. The trade-off arises from the fact that channel impairments may enhance generalization, while degrading convergence. Extensive numerical results validate the theory.
- [28] arXiv:2409.00797 (replaced) [pdf, html, other]
Title: Leveraging parallelizability and channel structure in THz-band, Tbps channel-code decoding
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
As advancements close the gap between current device capabilities and the requirements for terahertz (THz)-band communications, the demand for terabit-per-second (Tbps) circuits is on the rise. This paper addresses the challenge of achieving Tbps data rates in THz-band communications by focusing on the baseband computation bottleneck. We propose leveraging parallel processing and pseudo-soft information (PSI) across multicarrier THz channels for efficient channel code decoding. We map bits to transmission resources using shorter code-words to enhance parallelizability and reduce complexity. Additionally, we integrate channel state information into PSI to alleviate the processing overhead of soft decoding. Results demonstrate that PSI-aided decoding of 64-bit code-words halves the complexity of 128-bit hard decoding under comparable effective rates, while introducing a 4 dB gain at a $10^{-3}$ block error rate. The proposed scheme approximates soft decoding with significant complexity reduction at a graceful performance cost.