
Mini-Batch Gradient-Based MCMC for Decentralized Massive MIMO Detection

Authors: Xingyu Zhou, Le Liang, Jing Zhang, Chao-Kai Wen, Shi Jin

Abstract: Massive multiple-input multiple-output (MIMO) technology has significantly enhanced spectral and power efficiency in cellular communications and is expected to further evolve towards extra-large-scale MIMO. However, centralized processing for massive MIMO faces practical obstacles, including excessive computational complexity and a substantial volume of baseband data to be exchanged. To address these challenges, decentralized baseband processing has emerged as a promising solution. This approach involves partitioning the antenna array into clusters with dedicated computing hardware for parallel processing. In this paper, we investigate the gradient-based Markov chain Monte Carlo (MCMC) method -- an advanced MIMO detection technique known for its near-optimal performance in centralized implementation -- within the context of a decentralized baseband processing architecture. This decentralized design mitigates the computation burden at a single processing unit by utilizing computational resources in a distributed and parallel manner. Additionally, we integrate the mini-batch stochastic gradient descent method into the proposed decentralized detector, achieving remarkable performance with high efficiency. Simulation results demonstrate substantial performance gains of the proposed method over existing decentralized detectors across various scenarios. Moreover, complexity analysis reveals the advantages of the proposed decentralized strategy in terms of computation delay and interconnection bandwidth when compared to conventional centralized detectors.

Submitted 25 July, 2024; originally announced July 2024.

Comments: 15 pages, 10 figures, 1 table. This paper has been accepted for publication by the IEEE Transactions on Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2407.16910 [pdf]

Operando probing of nanocracking in CuO-derived Cu during CO$_2$ electroreduction

Authors: Jiawei Wan, Ershuai Liu, Woong Choi, Jiayun Liang, Buyu Zhang, Keon-Han Kim, Xianhu Sun, Meng Zhang, Han Xue, Yi Chen, Qiubo Zhang, Changlian Wen, Ji Yang, Karen C. Bustillo, Peter Ercius, Denis Leshchev, Ji Su, Zakaria Y. Al Balushi, Adam Z. Weber, Mark Asta, Alexis T. Bell, Walter S. Drisdell, Haimei Zheng

Abstract: Identifying and controlling active sites in electrocatalysis remains a grand challenge due to restructuring of catalysts in the complex chemical environments during operation. Inactive precatalysts can transform into active catalysts under reaction conditions, such as oxide-derived Cu (OD-Cu) for CO$_2$ electroreduction displaying improved production of multicarbon (C$_{2+}$) chemicals. Revealing the mechanism of active site origin in OD-Cu catalysts requires in situ/operando characterizations of structure, morphology, and valence state evolution with high spatial and temporal resolution. Applying newly developed electrochemical liquid cell transmission electron microscopy combined with X-ray absorption spectroscopy, our multimodal operando techniques unveil the formation pathways of OD-Cu active sites from CuO bicrystal nanowire precatalysts. Rapid reduction of CuO directly to Cu within 60 seconds generates a nanocrack network throughout the nanowire, via formation of "boundary nanocracks" along the twin boundary and "transverse nanocracks" propagating from the surface to the center of the nanowire. The nanocrack network further reconstructs, leading to a highly porous structure rich in Cu nanograins, with a boosted specific surface area and density of active sites for C$_{2+}$ products. These findings suggest a means to optimize active OD-Cu nanostructures through nanocracking by tailoring grain boundaries in CuO precatalysts.
More generally, our advanced operando approach opens new opportunities for mechanistic insights to enable improved control of catalyst structure and performance.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.08813 [pdf, other]

FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

Authors: Yu Tian, Congcong Wen, Min Shi, Muhammad Muneeb Afzal, Hao Huang, Muhammad Osama Khan, Yan Luo, Yi Fang, Mengyu Wang

Abstract: Addressing fairness in artificial intelligence (AI), particularly in medical AI, is crucial for ensuring equitable healthcare outcomes. Recent efforts to enhance fairness have introduced new methodologies and datasets in medical AI. However, the fairness issue under the setting of domain transfer is almost unexplored, while it is common that clinics rely on different imaging technologies (e.g., different retinal imaging modalities) for patient diagnosis. This paper presents FairDomain, a pioneering systemic study into algorithmic fairness under domain shifts, employing state-of-the-art domain adaptation (DA) and generalization (DG) algorithms for both medical segmentation and classification tasks to understand how biases are transferred between different domains. We also introduce a novel plug-and-play fair identity attention (FIA) module that adapts to various DA and DG algorithms to improve fairness by using self-attention to adjust feature importance based on demographic attributes. Additionally, we curate the first fairness-focused dataset with two paired imaging modalities for the same patient cohort on medical segmentation and classification tasks, to rigorously assess fairness in domain-shift scenarios. Excluding the confounding impact of demographic distribution variation between source and target domains will allow clearer quantification of the performance of domain transfer models.
Our extensive evaluations reveal that the proposed FIA significantly enhances fairness-aware model performance across all domain-shift settings (i.e., DA and DG) with respect to different demographics, outperforming existing methods on both segmentation and classification. The code and data can be accessed at https://ophai.hms.harvard.edu/datasets/harvard-fairdomain20k.

Submitted 18 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: ECCV 2024; Codes and datasets are available at https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain

arXiv:2407.06042 [pdf, ps, other]

Near-Optimal MIMO Detection Using Gradient-Based MCMC in Discrete Spaces

Authors: Xingyu Zhou, Le Liang, Jing Zhang, Chao-Kai Wen, Shi Jin

Abstract: The discrete nature of transmitted symbols poses challenges for achieving optimal detection in multiple-input multiple-output (MIMO) systems associated with a large number of antennas. Recently, the combination of two powerful machine learning methods, Markov chain Monte Carlo (MCMC) sampling and gradient descent, has emerged as a highly efficient solution to address this issue. However, existing gradient-based MCMC detectors are heuristically designed and thus are theoretically untenable. To bridge this gap, we introduce a novel sampling algorithm tailored for discrete spaces. This algorithm leverages gradients from the underlying continuous spaces for acceleration while maintaining the validity of probabilistic sampling. We prove the convergence of this method and also analyze its convergence rate using both MCMC theory and empirical diagnostics. On this basis, we develop a MIMO detector that precisely samples from the target discrete distribution and generates posterior Bayesian estimates using these samples, whose performance is thereby theoretically guaranteed. Furthermore, our proposed detector is highly parallelizable and scalable to large MIMO dimensions, positioning it as a compelling candidate for next-generation wireless networks. Simulation results show that our detector achieves near-optimal performance, significantly outperforms state-of-the-art baselines, and showcases resilience to various system setups.

Submitted 8 July, 2024; originally announced July 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2407.05905 [pdf, other]

Deep Learning-based CSI Feedback in Wi-Fi Systems

Authors: Fan Qi, Jiajia Guo, Yiming Cui, Xiangyi Li, Chao-Kai Wen, Shi Jin

Abstract: In Wi-Fi systems, channel state information (CSI) plays a crucial role in enabling access points to execute beamforming operations. However, the feedback overhead associated with CSI significantly hampers the throughput improvements. Recent advancements in deep learning (DL) have transformed the approach to CSI feedback in cellular systems. Drawing inspiration from the successes witnessed in the realm of mobile communications, this paper introduces a DL-based CSI feedback framework, named EFNet, tailored for Wi-Fi systems. The proposed framework leverages an autoencoder to achieve precise feedback with minimal overhead. The process involves the station utilizing the encoder to compress and quantize a series of matrices into codeword bit streams, which are then fed back to the access point. Subsequently, the decoder installed at the AP reconstructs beamforming matrices from these bit streams. We implement the EFNet system using standard Wi-Fi equipment operating in the 2.4 GHz band. Experimental findings in an office environment reveal a remarkable 80.77% reduction in feedback overhead compared to the 802.11ac standard, alongside a significant boost in net throughput of up to 30.72%.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.02919 [pdf, other]

doi 10.1109/JIOT.2024.3421577

Efficient IoT Devices Localization Through Wi-Fi CSI Feature Fusion and Anomaly Detection

Authors: Yan Li, Jie Yang, Shang-Ling Shih, Wan-Ting Shih, Chao-Kai Wen, Shi Jin

Abstract: Internet of Things (IoT) device localization is fundamental to smart home functionalities, including indoor navigation and tracking of individuals. Traditional localization relies on relative methods utilizing the positions of anchors within a home environment, yet struggles with precision due to inherent inaccuracies in these anchor positions. In response, we introduce a cutting-edge smartphone-based localization system for IoT devices, leveraging the precise positioning capabilities of smartphones equipped with motion sensors. Our system employs artificial intelligence (AI) to merge channel state information from proximal trajectory points of a single smartphone, significantly enhancing line of sight (LoS) angle of arrival (AoA) estimation accuracy, particularly under severe multipath conditions. Additionally, we have developed an AI-based anomaly detection algorithm to further increase the reliability of LoS-AoA estimation. This algorithm improves measurement reliability by analyzing the correlation between the accuracy of reversed feature reconstruction and the LoS-AoA estimation. Utilizing a straightforward least squares algorithm in conjunction with accurate LoS-AoA estimation and smartphone positional data, our system efficiently identifies IoT device locations.
Validated through extensive simulations and experimental tests with a receiving antenna array comprising just two patch antenna elements in the horizontal direction, our methodology has been shown to attain decimeter-level localization accuracy in nearly 90% of cases, demonstrating robust performance even in challenging real-world scenarios. Additionally, our proposed anomaly detection algorithm trained on Wi-Fi data can be directly applied to ultra-wideband, also outperforming the most advanced techniques.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted in IEEE Internet of Things Journal, Early Access, 2024

Journal ref: IEEE Internet of Things Journal, Early Access, 2024

arXiv:2407.02250 [pdf, ps, other]

All-loop heavy-heavy-light-light correlators in $\mathcal{N}=4$ super Yang-Mills theory

Authors: Augustus Brown, Francesco Galvagno, Congkao Wen

Abstract: We study Heavy-Heavy-Light-Light (HHLL) correlators $\langle \mathcal{H} \mathcal{H} \mathcal{O}_2 \mathcal{O}_2 \rangle$ in $\mathcal{N}=4$ super Yang-Mills theory with $SU(N)$ gauge group at generic $N$. The light operator $\mathcal{O}_2$ is the dimension two superconformal primary in the stress tensor multiplet and $\mathcal{H}$ is a general half-BPS superconformal primary operator with dimension (or $R$-charge) $\Delta_{\mathcal{H}}$. We consider the large-charge 't Hooft limit, where $\Delta_{\mathcal{H}} \rightarrow \infty$ with fixed 't Hooft-like coupling $\lambda:=\Delta_{\mathcal{H}}\, g_{_{\rm YM}}^2$. We show that the $L$-loop contribution to the HHLL correlators in the leading large-charge limit is universal for any choice of the heavy operator $\mathcal{H}$, given as $\lambda^L \sum_{\ell=0}^{L} \Phi^{(\ell)} \Phi^{(L-\ell)}$ with an $SU(N)$ colour factor coefficient, where $\Phi^{(\ell)}$ is the ladder Feynman integral, which is known to all loops. The dependence on the explicit form of the heavy operator lies only in the colour factor coefficients. We determine such colour factors for several classes of heavy operators, and show that the large charge limit leads to minimal powers of $N$. For the special class of "canonical heavy operators", one can even resum the all-loop ladder integrals and determine the correlators at finite $\lambda$. Furthermore, upon integrating over the spacetime dependence, the resulting integrated HHLL correlators agree with the existing results derived from supersymmetric localisation.
Finally, as an application of the all-loop analytic results, we derive exact expressions for the structure constants of two heavy operators and the Konishi operator, finding intriguing connections with the integrated HHLL correlators.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 37 pages + 3 appendices

Report number: QMUL-PH-24-12

arXiv:2406.13455 [pdf, ps, other]

Johnson graphs as slices of a hypercube and an algebra homomorphism from the universal Racah algebra into $U(\mathfrak{sl}_2)$

Authors: Hau-Wen Huang, Chia-Yi Wen

Abstract: From the viewpoint of Johnson graphs as slices of a hypercube, we derive a novel algebra homomorphism $\sharp$ from the universal Racah algebra $\Re$ into $U(\mathfrak{sl}_2)$. We use the Casimir elements of $\Re$ to describe the kernel of $\sharp$. By pulling back via $\sharp$ every $U(\mathfrak{sl}_2)$-module can be viewed as an $\Re$-module. We show that for any finite-dimensional $U(\mathfrak{sl}_2)$-module $V$, the $\Re$-module $V$ is completely reducible and three generators of $\Re$ act on every irreducible $\Re$-submodule of $V$ as a Leonard triple. In particular, Leonard triples can be constructed in terms of the second dual distance operator of the hypercube $H(D,2)$ and a decomposition of the second distance operator of $H(D,2)$ induced by Johnson graphs.

Submitted 19 June, 2024; originally announced June 2024.

MSC Class: 05E30; 16G30; 16S30; 33D45

arXiv:2406.11334 [pdf, other]

Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment

Authors: Chao Wen, Jacqueline Staub, Adish Singla

Abstract: Large language and multimodal models have shown remarkable successes on various benchmarks focused on specific skills such as general-purpose programming, natural language understanding, math word problem-solving, and visual question answering. However, it is unclear how well these models perform on tasks that require a combination of these skills. In this paper, we curate a novel program synthesis benchmark based on the XLogoOnline visual programming environment. The benchmark comprises 85 real-world tasks from the Mini-level of the XLogoOnline environment, each requiring a combination of different skills such as spatial planning, basic programming, and logical reasoning. Our evaluation shows that current state-of-the-art models like GPT-4V and Llama3-70B struggle to solve these tasks, achieving success rates of only 20% and 2.35%, respectively. Next, we develop a fine-tuning pipeline to boost the performance of models by leveraging a large-scale synthetic training dataset with over 80000 tasks. Moreover, we showcase how emulator-driven feedback can be used to design a curriculum over training data distribution. We showcase that a fine-tuned Llama3-8B drastically outperforms GPT-4V and Llama3-70B models, and provide an in-depth analysis of the models' expertise across different skill dimensions. We will publicly release the benchmark for future research on program synthesis in visual programming.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.06132 [pdf, other]

Instantaneous optical singularities and duality-protected dark directions

Authors: Chunchao Wen, Jianfa Zhang, Chaofan Zhang, Shiqiao Qin, Zhihong Zhu, Wei Liu

Abstract: Electromagnetic waves are described by not only polarization ellipses but also the cyclically rotating vectors that trace them out. The corresponding fields are respectively directionless steady line fields and directional instantaneous vector fields. Here we study the seminal topic of electromagnetic scattering from the perspective of instantaneous vector fields and uncover how the global topology of the momentum sphere regulates local distributions of tangent scattered fields. Structurally-stable generic singularities of vector fields move cyclically along lines of linear polarizations and at any instant their index sum has to be the Euler characteristic $\chi=2$. This contrasts sharply with steady line fields, of which generic singularities constrained by the Euler characteristic locate on points of circular polarizations. From this unique perspective of instantaneous singularities, we discovered that for circularly-polarized waves scattered by electromagnetic duality-symmetric particles, since linearly-polarized scatterings are prohibited by helicity conservation, there must exist at least one dark direction along which the scattering is strictly zero. Two such dark directions can be tuned to overlap, along which the scattering would remain zero for arbitrary incident polarizations. We have essentially revealed that \textit{polarizations underdescribe vectorial electromagnetic waves and the instantaneous perspective is indispensable}.
The complementarity we discover provides broader and deeper insights into not only electromagnetism, but also other branches of wave physics where singularities are generic and ubiquitous.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Wei Liu acknowledges many illuminating correspondences with Sir Michael Berry, whose monumental paper with J. F. Nye on phase singularities was published 50 years ago

arXiv:2405.18291 [pdf, other]

FedSAC: Dynamic Submodel Allocation for Collaborative Fairness in Federated Learning

Authors: Zihui Wang, Zheng Wang, Lingjuan Lyu, Zhaopeng Peng, Zhicheng Yang, Chenglu Wen, Rongshan Yu, Cheng Wang, Xiaoliang Fan

Abstract: Collaborative fairness stands as an essential element in federated learning to encourage client participation by equitably distributing rewards based on individual contributions. Existing methods primarily focus on adjusting gradient allocations among clients to achieve collaborative fairness. However, they frequently overlook crucial factors such as maintaining consistency across local models and catering to the diverse requirements of high-contributing clients. This oversight inevitably decreases both fairness and model accuracy in practice. To address these issues, we propose FedSAC, a novel Federated learning framework with dynamic Submodel Allocation for Collaborative fairness, backed by a theoretical convergence guarantee. First, we present the concept of "bounded collaborative fairness (BCF)", which ensures fairness by tailoring rewards to individual clients based on their contributions. Second, to implement the BCF, we design a submodel allocation module with a theoretical guarantee of fairness. This module incentivizes high-contributing clients with high-performance submodels containing a diverse range of crucial neurons, thereby preserving consistency across local models. Third, we further develop a dynamic aggregation module to adaptively aggregate submodels, ensuring the equitable treatment of low-frequency neurons and consequently enhancing overall model accuracy. Extensive experiments conducted on three public benchmarks demonstrate that FedSAC outperforms all baseline methods in both fairness and model accuracy.
We see this work as a significant step towards incentivizing broader client participation in federated learning. The source code is available at https://github.com/wangzihuixmu/FedSAC.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted by KDD'24

arXiv:2405.13403 [pdf, other]

Adaptive Wireless Image Semantic Transmission and Over-The-Air Testing

Authors: Jiarun Ding, Peiwen Jiang, Chao-Kai Wen, Shi Jin

Abstract: Semantic communication has undergone considerable evolution due to the recent rapid development of artificial intelligence (AI), significantly enhancing both communication robustness and efficiency. Despite these advancements, most current semantic communication methods for image transmission pay little attention to the differing importance of objects and backgrounds in images. To address this issue, we propose a novel scheme named ASCViT-JSCC, which utilizes vision transformers (ViTs) integrated with an orthogonal frequency division multiplexing (OFDM) system. This scheme adaptively allocates bandwidth for objects and backgrounds in images according to the importance order of different parts determined by object detection of you only look once version 5 (YOLOv5) and feature points detection of scale invariant feature transform (SIFT). Furthermore, the proposed scheme adheres to digital modulation standards by incorporating quantization modules. We validate this approach through an over-the-air (OTA) testbed named intelligent communication prototype validation platform (ICP) based on a software-defined radio (SDR) and NVIDIA embedded kits. Our findings from both simulations and practical measurements show that ASCViT-JSCC significantly preserves objects in images and enhances reconstruction quality compared to existing methods.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.02173 [pdf, other]

Task Synthesis for Elementary Visual Programming in XLogoOnline Environment

Authors: Chao Wen, Ahana Ghosh, Jacqueline Staub, Adish Singla

Abstract: In recent years, the XLogoOnline programming platform has gained popularity among novice learners. It integrates the Logo programming language with visual programming, providing a visual interface for learning computing concepts. However, XLogoOnline offers only a limited set of tasks, which are inadequate for learners to master the computing concepts that require sufficient practice. To address this, we introduce XLogoSyn, a novel technique for synthesizing high-quality tasks for varying difficulty levels. Given a reference task, XLogoSyn can generate practice tasks at varying difficulty levels that cater to the varied needs and abilities of different learners. XLogoSyn achieves this by combining symbolic execution and constraint satisfaction techniques. Our expert study demonstrates the effectiveness of XLogoSyn. We have also deployed synthesized practice tasks into XLogoOnline, highlighting the educational benefits of these synthesized practice tasks.

Submitted 3 May, 2024; originally announced May 2024.

Comments: Accepted as a paper at the AIED'24 conference in the late-breaking results track

arXiv:2404.19216 [pdf, other]

Optimal quantum strategy for locating Unruh channels

Authors: Qianqian Liu, Tonghua Liu, Cuihong Wen, Jieci Wang

Abstract: From the perspective of quantum information theory, the effect of Unruh radiation on a two-level accelerated detector can be modeled as a quantum channel. In this work, we employ the tools of channel-position finding to locate Unruh channels. The signal-idler and idler-free protocols are explored to determine the position of the target Unruh channel within a sequence of background channels. We derive the fidelity-based bounds for the ultimate error probability of each strategy and obtain the conditions where the signal-idler protocol is superior to the protocol involving idler-free states. It is found that the lower bound of the error probability for the signal-idler scheme exhibits clear advantages in all cases, while the idler-free scheme can only be implemented when the temperature of the two channels is very close and the number of initial states is insufficient. Interestingly, it is shown that the optimal detection protocol relies on the residual correlations shared between the emitted probe state and the retained idler modes.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 15 pages, 6 figures

arXiv:2404.19134 [pdf, other]

Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models

Authors: Siyuan Xiang, Chin Tseng, Congcong Wen, Deshana Desai, Yifeng Kou, Binil Starly, Daniele Panozzo, Chen Feng

Abstract: We introduce the first work on benchmarking and evaluating deep clustering algorithms on large-scale non-categorical 3D CAD models. We first propose a workflow to allow expert mechanical engineers to efficiently annotate 252,648 carefully sampled pairwise CAD model similarities, from a subset of the ABC dataset with 22,968 shapes. Using seven baseline deep clustering methods, we then investigate the fundamental challenges of evaluating clustering methods for non-categorical data. Based on these challenges, we propose a novel and viable ensemble-based clustering comparison approach. This work is the first to directly target the underexplored area of deep clustering algorithms for 3D shapes, and we believe it will be an important building block to analyze and utilize the massive 3D shape collections that are starting to appear in deep geometric computing.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.16493 [pdf, other]

Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

Authors: Hai Wu, Shijia Zhao, Xun Huang, Chenglu Wen, Xin Li, Cheng Wang

Abstract: The prevalent approaches of unsupervised 3D object detection follow cluster-based pseudo-label generation and iterative self-training processes. However, the challenge arises due to the sparsity of LiDAR scans, which leads to pseudo-labels with erroneous size and position, resulting in subpar detection performance. To tackle this problem, this paper introduces a Commonsense Prototype-based Detector, termed CPD, for unsupervised 3D object detection. CPD first constructs Commonsense Prototype (CProto) characterized by high-quality bounding box and dense points, based on commonsense intuition. Subsequently, CPD refines the low-quality pseudo-labels by leveraging the size prior from CProto. Furthermore, CPD enhances the detection accuracy of sparsely scanned objects by the geometric knowledge from CProto. CPD outperforms state-of-the-art unsupervised 3D detectors on Waymo Open Dataset (WOD), PandaSet, and KITTI datasets by a large margin. Besides, by training CPD on WOD and testing on KITTI, CPD attains 90.85% and 81.01% 3D Average Precision on easy and moderate car classes, respectively. These achievements position CPD in close proximity to fully supervised detectors, highlighting the significance of our method. The code will be available at https://github.com/hailanyi/CPD.

Submitted 26 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2404.16412 [pdf, ps, other]

Distributed Matrix Pencil Formulations for Prescribed-Time Leader-Following Consensus of MASs with Unknown Sensor Sensitivity

Authors: Hefu Ye, Changyun Wen, Yongduan Song

Abstract: In this paper, we address the problem of prescribed-time leader-following consensus of heterogeneous multi-agent systems (MASs) in the presence of unknown sensor sensitivity. Under a connected undirected topology, we propose a time-varying dual observer/controller design framework that makes use of regular local and inaccurate feedback to achieve consensus tracking within a prescribed time. In particular, the developed analysis framework is applicable to MASs equipped with sensors of different sensitivities. One of the design innovations involves constructing a distributed matrix pencil formulation based on worst-case sensors, yielding control parameters with sufficient robustness yet relatively low conservatism. Another novelty is the construction of the control gains, which consists of the product of a proportional coefficient obtained from the matrix pencil formulation and a classic time-varying function that grows to infinity or a novel bounded time-varying function. Furthermore, it is possible to extend the prescribed-time distributed protocol to infinite time domain by introducing the bounded time-varying gain technique without sacrificing the ultimate control accuracy, and the corresponding technical proof is comprehensive. The effectiveness of the method is demonstrated through a group of 5 single-link robot manipulators.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 10 pages, 1 figure

arXiv:2404.15131 [pdf, other]

Optimizing Multi-Touch Textile and Tactile Skin Sensing Through Circuit Parameter Estimation

Authors: Bo Ying Su, Yuchen Wu, Chengtao Wen, Changliu Liu

Abstract: Tactile and textile skin technologies have become increasingly important for enhancing human-robot interaction and allowing robots to adapt to different environments. Despite notable advancements, there are ongoing challenges in skin signal processing, particularly in achieving both accuracy and speed in dynamic touch sensing. This paper introduces a new framework that poses the touch sensing problem as an estimation problem of resistive sensory arrays. Utilizing a Regularized Least Squares objective function which estimates the resistance distribution of the skin, we enhance the touch sensing accuracy and mitigate the ghosting effects, where false or misleading touches may be registered. Furthermore, our study presents a streamlined skin design that simplifies manufacturing processes without sacrificing performance. Experimental outcomes substantiate the effectiveness of our method, showing 26.9% improvement in multi-touch force-sensing accuracy for the tactile skin.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.12708 [pdf, ps, other]

Magnetic-field driven evolution of zero-energy mode on Bi islands deposited on Fe(Te,Se)

Authors: Kailun Chen, Chuanhao Wen, Zhiyong Hou, Huan Yang, Hai-Hu Wen

Abstract: We investigate the magnetic-field dependent evolution of the zero-bias conductance peaks (ZBCPs) on the nanoscale bismuth islands grown on the FeTe$_{0.55}$Se$_{0.45}$ substrate. The ZBCPs can be observed throughout the entire region on these islands, and their characteristics align with the signatures of Majorana zero modes. Remarkably, the evolution of ZBCPs on these islands exhibits anomalous behavior under varying magnetic fields: The magnitude of ZBCPs is first enhanced at weak fields lower than 2 T and then suppressed as the fields further increase. We attribute the non-monotonic evolution of the ZBCPs to the magnetic-field-enhanced topological edge states on these Bi islands. Our findings provide valuable insights into the probable origin of the Majorana zero modes in the Bi-island platform and the magnetic-field response of topological edge states.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 6 pages, 4 figures

arXiv:2404.11941 [pdf, other]

Semantic Satellite Communications Based on Generative Foundation Model

Authors: Peiwen Jiang, Chao-Kai Wen, Xiao Li, Shi Jin, Geoffrey Ye Li

Abstract: Satellite communications can provide massive connections and seamless coverage, but they also face several challenges, such as rain attenuation, long propagation delays, and co-channel interference. To improve transmission efficiency and address severe scenarios, semantic communication has become a popular choice, particularly when equipped with foundation models (FMs). In this study, we introduce an FM-based semantic satellite communication framework, termed FMSAT. This framework leverages FM-based segmentation and reconstruction to significantly reduce bandwidth requirements and accurately recover semantic features under high noise and interference. Considering the high speed of satellites, an adaptive encoder-decoder is proposed to protect important features and avoid frequent retransmissions. Meanwhile, a well-received image can provide a reference for repairing damaged images under sudden attenuation. Since acknowledgment feedback is subject to long propagation delays when retransmission is unavoidable, a novel error detection method is proposed to roughly detect semantic errors at the regenerative satellite. With the proposed detectors at both the satellite and the gateway, the quality of the received images can be ensured. The simulation results demonstrate that the proposed method can significantly reduce bandwidth requirements, adapt to complex satellite scenarios, and protect semantic information with an acceptable transmission delay.

Submitted 18 April, 2024; originally announced April 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2404.11536 [pdf, other]

FedPFT: Federated Proxy Fine-Tuning of Foundation Models

Authors: Zhaopeng Peng, Xiaoliang Fan, Yufan Chen, Zheng Wang, Shirui Pan, Chenglu Wen, Ruisheng Zhang, Cheng Wang

Abstract: Adapting Foundation Models (FMs) for downstream tasks through Federated Learning (FL) emerges as a promising strategy for protecting data privacy and valuable FMs. Existing methods fine-tune FM by allocating sub-FM to clients in FL, however, leading to suboptimal performance due to insufficient tuning and inevitable error accumulations of gradients. In this paper, we propose Federated Proxy Fine-Tuning (FedPFT), a novel method enhancing FMs adaptation in downstream tasks through FL by two key modules. First, the sub-FM construction module employs a layer-wise compression approach, facilitating comprehensive FM fine-tuning across all layers by emphasizing those crucial neurons. Second, the sub-FM alignment module conducts a two-step distillation (layer-level and neuron-level) before and during FL fine-tuning respectively, to reduce error of gradient by accurately aligning sub-FM with FM under theoretical guarantees. Experimental results on seven commonly used datasets (i.e., four text and three vision) demonstrate the superiority of FedPFT.

Submitted 28 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI'24

arXiv:2404.11400 [pdf, other]

Understanding the instability-wave selectivity of hypersonic compression ramp laminar flow

Authors: Peixu Guo, Jiaao Hao, Chih-Yung Wen

Abstract: The hypersonic flow stability over a two-dimensional compression corner is studied using resolvent analysis, linear stability theory (LST) and parabolised stability equation (PSE) analysis. We find that the interaction between upstream convective-type disturbances and the laminar separation bubble can be divided into two regimes, whose behaviour can be well explained by comparative research. First, two-dimensional (2-D) high-frequency Mack modes neutrally oscillate with the presence of alternating stable and unstable regions inside the separation bubble. These discontinuous unstable regions are generated by repeated synchronisations between discrete modes with evolving branches. Through a modal synchronisation analysis, we report that the second modes upstream and downstream of the separation bubble can be essentially different from each other, since they originate from different branches of discrete modes due to flow separation. Second, the 2-D low-frequency `shear-layer mode' is found to be stable in the separation bubble by LST, whereas multiple unstable three-dimensional (3-D) eigenmodes are identified by LST. In general, three significant modes are dominant successively near the separation point, in the separation bubble and near the reattachment point. These modes are found to be sensitive to the streamline curvature effect. The locally dominant modes agree with the resolvent response in terms of the disturbance shape and the growth rate of energy. Thus, a combination of global and local analyses demonstrates that the separation bubble tends to selectively amplify low-frequency 3-D disturbances and `freeze' high-frequency Mack-mode disturbances in an explainable manner. These findings facilitate the understanding of the early evolution of low- and high-frequency instabilities in hypersonic separated flows.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 17 pages, 20 figures

arXiv:2404.04783 [pdf, other]

Fourier Transform-based Wavenumber Domain 3D Imaging in RIS-aided Communication Systems

Authors: Yixuan Huang, Jie Yang, Wankai Tang, Chao-Kai Wen, Shi Jin

Abstract: Radio imaging is rapidly gaining prominence in the design of future communication systems, with the potential to utilize reconfigurable intelligent surfaces (RISs) as imaging apertures. Although the sparsity of targets in three-dimensional (3D) space has led most research to adopt compressed sensing (CS)-based imaging algorithms, these often require substantial computational and memory burdens. Drawing inspiration from conventional Fourier transform (FT)-based imaging methods, our research seeks to accelerate radio imaging in RIS-aided communication systems. To begin, we introduce a two-stage wavenumber domain 3D imaging technique: first, we modify RIS phase shifts to recover the equivalent channel response from the user equipment to the RIS array, subsequently employing traditional FT-based wavenumber domain methods to produce target images. We also determine the diffraction resolution limits of the system through k-space analysis, taking into account factors including system bandwidth, transmission direction, operating frequency, and the angle subtended by the RIS. Addressing the challenge of limited pilots in communication systems, we unveil an innovative algorithm that merges the strengths of both FT- and CS-based techniques by substituting the expansive sensing matrix with FT-based operators. Our simulation outcomes confirm that our proposed FT-based methods achieve high-quality images while demanding few time, memory, and communication resources.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 16 pages, 11 figures, submitted to IEEE for possible publication

arXiv:2404.00795 [pdf, other]

Towards Practical Requirement Analysis and Verification: A Case Study on Software IP Components in Aerospace Embedded Systems

Authors: Zhi Ma, Cheng Wen, Jie Su, Ming Zhao, Bin Yu, Xu Lu, Cong Tian

Abstract: IP-based software design is a crucial research field that aims to improve efficiency and reliability by reusing complex software components known as intellectual property (IP) components. To ensure the reusability of these components, particularly in security-sensitive software systems, it is necessary to analyze the requirements and perform formal verification for each IP component. However, converting the requirements of IP components from natural language descriptions to temporal logic and subsequently conducting formal verification demands domain expertise and non-trivial manpower. This paper presents a case study on software IP components derived from aerospace embedded systems, with the objective of automating the requirement analysis and verification process. The study begins by employing Large Language Models to convert unstructured natural language into formal specifications. Subsequently, three distinct verification techniques are employed to ascertain whether the source code meets the extracted temporal logic properties. By doing so, five real-world IP components from the China Academy of Space Technology (CAST) have been successfully verified.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.00762 [pdf, other]

Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification

Authors: Cheng Wen, Jialun Cao, Jie Su, Zhiwu Xu, Shengchao Qin, Mengda He, Haokun Li, Shing-Chi Cheung, Cong Tian

Abstract: Formal verification provides a rigorous and systematic approach to ensure the correctness and reliability of software systems. Yet, constructing specifications for the full proof relies on domain expertise and non-trivial manpower. In view of such needs, an automated approach for specification synthesis is desired. While existing automated approaches are limited in their versatility, i.e., they either focus only on synthesizing loop invariants for numerical programs, or are tailored for specific types of programs or invariants. Programs involving multiple complicated data types (e.g., arrays, pointers) and code structures (e.g., nested loops, function calls) are often beyond their capabilities. To help bridge this gap, we present AutoSpec, an automated approach to synthesize specifications for automated program verification. It overcomes the shortcomings of existing work in specification versatility, synthesizing satisfiable and adequate specifications for full proof. It is driven by static analysis and program verification, and is empowered by large language models (LLMs). AutoSpec addresses the practical challenges in three ways: (1) driving AutoSpec by static analysis and program verification, LLMs serve as generators to generate candidate specifications, (2) programs are decomposed to direct the attention of LLMs, and (3) candidate specifications are validated in each round to avoid error accumulation during the interaction with LLMs. In this way, AutoSpec can incrementally and iteratively generate satisfiable and adequate specifications. The evaluation shows its effectiveness and usefulness, as it outperforms existing works by successfully verifying 79% of programs through automatic specification synthesis, a significant improvement of 1.592x. It can also be successfully applied to verify the programs in a real-world X509-parser project.

Submitted 2 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.19501 [pdf, other]

RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method

Authors: Ming Yan, Yan Zhang, Shuqiang Cai, Shuqi Fan, Xincheng Lin, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang

Abstract: Comprehensive capturing of human motions requires both accurate captures of complex poses and precise localization of the human within scenes. Most of the HPE datasets and methods primarily rely on RGB, LiDAR, or IMU data. However, solely using these modalities or a combination of them may not be adequate for HPE, particularly for complex and fast movements. For holistic human motion understanding, we present RELI11D, a high-quality multimodal human motion dataset that involves LiDAR, IMU system, RGB camera, and Event camera. It records the motions of 10 actors performing 5 sports in 7 scenes, including 3.32 hours of synchronized LiDAR point clouds, IMU measurement data, RGB videos and Event streams. Through extensive experiments, we demonstrate that the RELI11D presents considerable challenges and opportunities as it contains many rapid and complex motions that require precise location. To address the challenge of integrating different modalities, we propose LEIR, a multimodal baseline that effectively utilizes LiDAR Point Cloud, Event stream, and RGB through our cross-attention fusion strategy. We show that LEIR exhibits promising results for rapid motions and daily motions and that utilizing the characteristics of multiple modalities can indeed improve HPE performance. Both the dataset and source code will be released publicly to the research community, fostering collaboration and enabling further exploration in this field.

Submitted 28 March, 2024; originally announced March 2024.

Comments: CVPR2024, Project website: http://www.lidarhumanmotion.net/reli11d/

arXiv:2403.17263 [pdf, other]

Exact results for giant graviton four-point correlators

Authors: Augustus Brown, Francesco Galvagno, Congkao Wen

Abstract: We study the four-point correlator $\langle \mathcal{O}_2 \mathcal{O}_2 \mathcal{D} \mathcal{D} \rangle$ in $\mathcal{N}=4$ super Yang-Mills theory (SYM) with $SU(N)$ gauge group, where $\mathcal{O}_2$ represents the superconformal primary operator with dimension two, while $\mathcal{D}$ denotes a determinant operator of dimension $N$, which is holographically dual to a giant graviton D3-brane extending along $S^5$. We analyse the integrated correlator associated with this observable, obtained after integrating out the spacetime dependence over a supersymmetric invariant measure. Similarly to other classes of integrated correlators in $\mathcal{N}=4$ SYM, this integrated correlator can be computed through supersymmetric localisation on the four-sphere. Employing matrix-model recursive techniques, we demonstrate that the integrated correlator can be reformulated as an infinite sum of protected three-point functions with known coefficients. This insight allows us to circumvent the complexity associated with the dimension-$N$ determinant operator, significantly streamlining the large-$N$ expansion of the integrated correlator. In the planar limit and beyond, we derive exact results for the integrated correlator valid for all values of the 't Hooft coupling, and investigate the resurgent properties of their strong coupling expansion. Additionally, in the large-$N$ expansion with fixed (complexified) Yang-Mills coupling, we deduce the $SL(2, \mathbb{Z})$ completion of these results in terms of the non-holomorphic Eisenstein series. The proposed modular functions are confirmed by explicit instanton calculations from the matrix model, and agree with expectations from the holographic dual picture of known results in type IIB string theory.

Submitted 9 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: v2: 49 pages, 2 figures, corrected appendix C and some typos fixed

arXiv:2403.11764 [pdf, other]

RIS-aided Single-frequency 3D Imaging by Exploiting Multi-view Image Correlations

Authors: Yixuan Huang, Jie Yang, Chao-Kai Wen, Shi Jin

Abstract: Retrieving range information in three-dimensional (3D) radio imaging is particularly challenging due to the limited communication bandwidth and pilot resources. To address this issue, we consider a reconfigurable intelligent surface (RIS)-aided uplink communication scenario, generating multiple measurements through RIS phase adjustment. This study successfully realizes 3D single-frequency imaging by exploiting the near-field multi-view image correlations deduced from user mobility. We first highlight the significance of considering anisotropy in multi-view image formation by investigating radar cross-section properties and diffraction resolution limits. We then propose a novel model for joint multi-view 3D imaging that incorporates occlusion effects and anisotropic scattering. These factors lead to slow image support variation and smooth coefficient evolution, which are mathematically modeled as Markov processes. Based on this model, we employ the Expectation Maximization-Turbo-Generalized Approximate Message Passing algorithm for joint multi-view single-frequency 3D imaging with limited measurements. Simulation results reveal the superiority of joint multi-view imaging in terms of enhanced imaging ranges, accuracies, and anisotropy characterization compared to single-view imaging. Combining adjacent observations for joint multi-view imaging enables a reduction in the measurement overhead by 80%.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 16 pages, 12 figures, accepted by IEEE Transactions on Communications

arXiv:2403.04614 [pdf, ps, other]

Kinematic Hopf algebra and BCJ numerators at finite $α'$

Authors: Gang Chen, Laurentiu Rodina, Congkao Wen

Abstract: In this letter, starting from a kinematic Hopf algebra, we first construct a closed-form formula for all Bern-Carrasco-Johansson (BCJ) numerators in Yang-Mills (YM) theory with infinite orders of $α'$ corrections, known as $\rm DF^2+YM$ theory, when coupled to two heavy particles which can be removed through a simple factorization limit. The full $α'$ dependence appears simply in massive physical propagator factors, with factorization strongly constraining the construction. The intricate structure induced by the massive poles also naturally leads us to find a novel closed-form and local expression for BCJ numerators in usual pure YM theory, based directly on the kinematic Hopf algebra.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 7+5 pages, 0 figures

Report number: QMUL-PH-24-04

arXiv:2403.01949 [pdf]

doi 10.1021/acs.nanolett.4c01581

Spatially Dependent in-Gap States Induced by Andreev Tunneling through a Single Electronic State

Authors: Ruixia Zhong, Zhongzheng Yang, Qi Wang, Fanbang Zheng, Wenhui Li, Juefei Wu, Chenhaoping Wen, Xi Chen, Yanpeng Qi, Shichao Yan

Abstract: By using low-temperature scanning tunneling microscopy and spectroscopy (STM/STS), we observe in-gap states induced by Andreev tunneling through a single impurity state in a low carrier density superconductor (NaAlSi). The energy-symmetric in-gap states appear when the impurity state is located within the superconducting gap. In-gap states can cross the Fermi level, and they show X-shaped spatial variation. We interpret the in-gap states as a consequence of the Andreev tunneling through the impurity state, which involves the formation or breakup of a Cooper pair. Due to the low carrier density in NaAlSi, the in-gap state is tunable by controlling the STM tip-sample distance. Under strong external magnetic fields, the impurity state shows Zeeman splitting when it is located near the Fermi level. Our findings not only demonstrate the Andreev tunneling involving single electronic state, but also provide new insights for understanding the spatially-dependent in-gap states in low carrier density superconductors.

Submitted 3 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 12 pages, 4 figures

Journal ref: Nano Letters 24, 8580 (2024)

arXiv:2403.00729 [pdf, other]

Can Transformers Capture Spatial Relations between Objects?

Authors: Chuan Wen, Dinesh Jayaraman, Yang Gao

Abstract: Spatial relationships between objects represent key scene information for humans to understand and interact with the world. To study the capability of current computer vision systems to recognize physically grounded spatial relations, we start by proposing precise relation definitions that permit consistently annotating a benchmark dataset. Despite the apparent simplicity of this task relative to others in the recognition literature, we observe that existing approaches perform poorly on this benchmark. We propose new approaches exploiting the long-range attention capabilities of transformers for this task, and evaluating key design principles. We identify a simple "RelatiViT" architecture and demonstrate that it outperforms all current approaches. To our knowledge, this is the first method to convincingly outperform naive baselines on spatial relation prediction in in-the-wild settings. The code and datasets are available at https://sites.google.com/view/spatial-relation.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 21 pages, 8 figures, ICLR 2024

arXiv:2402.18969 [pdf, other]

OHTA: One-shot Hand Avatar via Data-driven Implicit Priors

Authors: Xiaozheng Zheng, Chao Wen, Zhuo Su, Zeran Xu, Zhaohu Li, Yang Zhao, Zhou Xue

Abstract: In this paper, we delve into the creation of one-shot hand avatars, attaining high-fidelity and drivable hand representations swiftly from a single image. With the burgeoning domains of the digital human, the need for quick and personalized hand avatar creation has become increasingly critical. Existing techniques typically require extensive input data and may prove cumbersome or even impractical in certain scenarios. To enhance accessibility, we present a novel method OHTA (One-shot Hand avaTAr) that enables the creation of detailed hand avatars from merely one image. OHTA tackles the inherent difficulties of this data-limited problem by learning and utilizing data-driven hand priors. Specifically, we design a hand prior model initially employed for 1) learning various hand priors with available data and subsequently for 2) the inversion and fitting of the target identity with prior knowledge. OHTA demonstrates the capability to create high-fidelity hand avatars with consistent animatable quality, solely relying on a single image. Furthermore, we illustrate the versatility of OHTA through diverse applications, encompassing text-to-avatar conversion, hand editing, and identity latent space manipulation.

Submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted to CVPR 2024. Project page: https://zxz267.github.io/OHTA

arXiv:2402.18493 [pdf, other]

Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection

Authors: Xun Huang, Hai Wu, Xin Li, Xiaoliang Fan, Chenglu Wen, Cheng Wang

Abstract: LiDAR-based 3D object detection models have traditionally struggled under rainy conditions due to the degraded and noisy scanning signals. Previous research has attempted to address this by simulating the noise from rain to improve the robustness of detection models. However, significant disparities exist between simulated and actual rain-impacted data points. In this work, we propose a novel rain simulation method, termed DRET, that unifies Dynamics and Rainy Environment Theory to provide a cost-effective means of expanding the available realistic rain data for 3D detection training. Furthermore, we present a Sunny-to-Rainy Knowledge Distillation (SRKD) approach to enhance 3D detection under rainy conditions. Extensive experiments on the WaymoOpenDataset large-scale dataset show that, when combined with the state-of-the-art DSVT model and other classical 3D detectors, our proposed framework demonstrates significant detection accuracy improvements, without losing efficiency. Remarkably, our framework also improves detection capabilities under sunny conditions, therefore offering a robust solution for 3D detection regardless of whether the weather is rainy or sunny.

Submitted 28 February, 2024; originally announced February 2024.

Comments: Accepted by AAAI2024

arXiv:2402.09546 [pdf, other]

How Secure Are Large Language Models (LLMs) for Navigation in Urban Environments?

Authors: Congcong Wen, Jiazhao Liang, Shuaihang Yuan, Hao Huang, Yi Fang

Abstract: In the field of robotics and automation, navigation systems based on Large Language Models (LLMs) have recently shown impressive performance. However, the security aspects of these systems have received relatively less attention. This paper pioneers the exploration of vulnerabilities in LLM-based navigation models in urban outdoor environments, a critical area given the technology's widespread application in autonomous driving, logistics, and emergency services. Specifically, we introduce a novel Navigational Prompt Suffix (NPS) Attack that manipulates LLM-based navigation models by appending gradient-derived suffixes to the original navigational prompt, leading to incorrect actions. We conducted comprehensive experiments on an LLM-based navigation model that employs various LLMs for reasoning. Our results, derived from the Touchdown and Map2Seq street-view datasets under both few-shot learning and fine-tuning configurations, demonstrate notable performance declines across three metrics in the face of both white-box and black-box attacks. These results highlight the generalizability and transferability of the NPS Attack, emphasizing the need for enhanced security in LLM-based navigation systems. As an initial countermeasure, we propose the Navigational Prompt Engineering (NPE) Defense strategy, concentrating on navigation-relevant keywords to reduce the impact of adversarial suffixes. While initial findings indicate that this strategy enhances navigational safety, there remains a critical need for the wider research community to develop stronger defense methods to effectively tackle the real-world challenges faced by these systems.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2401.17079 [pdf]

Unprecedentedly large superconducting gap in HgBa$_2$Ca$_2$Cu$_3$O$_{8+δ}$ with the highest $T_c$ at ambient pressure

Authors: Chuanhao Wen, Zhiyong Hou, Alireza Akbari, Kailun Chen, Wenshan Hong, Huan Yang, Ilya Eremin, Yuan Li, Hai-Hu Wen

Abstract: In cuprate superconductors, the highest superconducting transition temperature $T_c$ is possessed by the HgBa$_2$Ca$_2$Cu$_3$O$_{8+δ}$ (Hg-1223) system at ambient pressure, but the reason remains elusive. Here we report the scanning tunneling microscope measurements on the Hg-1223 single crystals with $T_c$ = 134 K. The observed superconducting gaps determined from the tunneling spectra can be categorized into two groups: the smaller gap $Δ_1$ ranges from about 45 to 70 meV, while the larger gap $Δ_2$ ranges from about 65 to 98 meV. The observed unprecedentedly large gap value gives a straightforward explanation for the highest $T_c$ in the Hg-1223 system. The largest gap observed here is comparable to the magnetic superexchange energy and excludes any possibility of using phonon pictures to interpret the superconductivity. Interestingly, an extremely strong particle-hole asymmetry is observed in association with a very robust coherence peak at the bias of the larger gap in the hole branch of the Bogoliubov dispersion. We propose that the observed asymmetry results from the interplay of a flat band (van Hove singularity) in the electronic spectrum and the large superconducting gap in the underdoped layer. This could be the main reason for the strong pairing, and significant enhancement of the density of states in the hole branch of the Bogoliubov band yielding strong phase coherence of Cooper pairs. A scenario based on a trilayer model with an interlayer coupling can give a reasonable explanation. Our results provide deep insight into understanding the mechanism of superconductivity in cuprate superconductors.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 24 pages, 4 figures

arXiv:2401.11445 [pdf, other]

Towards Non-Robocentric Dynamic Landing of Quadrotor UAVs

Authors: Li-Yu Lo, Boyang Li, Chih-Yung Wen, Ching-Wei Chang

Abstract: In this work, we propose a dynamic landing solution without the need for onboard exteroceptive sensors and an expensive computation unit, where all localization and control modules are carried out on the ground in a non-inertial frame. Our system starts with a relative state estimator of the aerial robot from the perspective of the landing platform, where the state tracking of the UAV is done through a set of onboard LED markers and an on-ground camera; the state is expressed geometrically on a manifold and is returned by an Iterated Extended Kalman filter (IEKF) algorithm. Subsequently, a motion planning module is developed to guide the landing process, formulating it as a minimum jerk trajectory by applying the differential flatness property. Considering visibility and dynamic constraints, the problem is solved using quadratic programming, and the final motion primitive is expressed through piecewise polynomials. Through a series of experiments, the applicability of this approach is validated by successfully landing an 18 cm x 18 cm quadrotor on a 43 cm x 43 cm platform, exhibiting performance comparable to conventional methods. Finally, we provide comprehensive hardware and software details to the research community for future reference.

Submitted 21 January, 2024; originally announced January 2024.

arXiv:2401.11439 [pdf, other]

General Flow as Foundation Affordance for Scalable Robot Learning

Authors: Chengbo Yuan, Chuan Wen, Tong Zhang, Yang Gao

Abstract: We address the challenge of acquiring real-world manipulation skills with a scalable framework. Inspired by the success of large-scale auto-regressive prediction in Large Language Models (LLMs), we hold the belief that identifying an appropriate prediction target capable of leveraging large-scale datasets is crucial for achieving efficient and universal learning. Therefore, we propose to utilize flow, which represents the future trajectories of 3D points on objects of interest, as an ideal prediction target in robot learning. To exploit scalable data resources, we turn our attention to cross-embodiment datasets. We develop, for the first time, a language-conditioned prediction model directly from large-scale RGBD human video datasets. Our predicted flow offers actionable geometric and physics guidance, thus facilitating stable zero-shot skill transfer in real-world scenarios. We deploy our method with a policy based on closed-loop flow prediction. Remarkably, without any additional training, our method achieves an impressive 81% success rate in human-to-robot skill transfer, covering 18 tasks in 6 scenes. Our framework features the following benefits: (1) scalability: leveraging cross-embodiment data resources; (2) universality: multiple object categories, including rigid, articulated, and soft bodies; (3) stable skill transfer: providing actionable guidance with a small inference domain-gap. These lead to a new pathway towards scalable general robot learning. Data, code, and model weights will be made publicly available.

Submitted 21 January, 2024; originally announced January 2024.

arXiv:2401.10785 [pdf, ps, other]

Composite learning backstepping control with guaranteed exponential stability and robustness

Authors: Tian Shi, Changyun Wen, Yongping Pan

Abstract: Adaptive backstepping control provides a feasible solution to achieve asymptotic tracking for mismatched uncertain nonlinear systems. However, input-to-state stability depends on high-gain feedback generated by nonlinear damping terms, and closed-loop exponential stability with parameter convergence involves a stringent condition named persistent excitation (PE). This paper proposes a composite learning backstepping control (CLBC) strategy based on modular backstepping and high-order tuners to compensate for the transient process of parameter estimation and achieve closed-loop exponential stability without the nonlinear damping terms and the PE condition. A novel composite learning mechanism that maximizes the staged exciting strength is designed for parameter estimation, such that parameter convergence can be achieved under a condition of interval excitation (IE) or even partial IE that is strictly weaker than PE. An extra prediction error is employed in the adaptive law to ensure the transient performance without nonlinear damping terms. The exponential stability of the closed-loop system is proved rigorously under the partial IE or IE condition. Simulations have demonstrated the effectiveness and superiority of the proposed method in both parameter estimation and control compared to state-of-the-art methods.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.00025 [pdf, other]

Any-point Trajectory Modeling for Policy Learning

Authors: Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel

Abstract: Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning. However, the high cost of collecting demonstration data is a significant bottleneck. Videos, as a rich data source, contain knowledge of behaviors, physics, and semantics, but extracting control-specific information from them is challenging due to the lack of action labels. In this work, we introduce a novel framework, Any-point Trajectory Modeling (ATM), that utilizes video demonstrations by pre-training a trajectory model to predict future trajectories of arbitrary points within a video frame. Once trained, these trajectories provide detailed control guidance, enabling the learning of robust visuomotor policies with minimal action-labeled data. Across over 130 language-conditioned tasks we evaluated in both simulation and the real world, ATM outperforms strong video pre-training baselines by 80% on average. Furthermore, we show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology. Visualizations and code are available at: https://xingyu-lin.github.io/atm.

Submitted 12 July, 2024; v1 submitted 28 December, 2023; originally announced January 2024.

Comments: 18 pages, 15 figures

arXiv:2312.15580 [pdf, other]

Magneto-chiral backscatterings by rotationally symmetric nonreciprocal structures

Authors: Chunchao Wen, Jianfa Zhang, Shiqiao Qin, Zhihong Zhu, Wei Liu

Abstract: It was proved that the joint operation of electromagnetic reciprocity and $n$-fold ($n\geq3$) rotational symmetry would secure arbitrary polarization-independent backscattering efficiency [Phys. Rev. B 103, 045422 (2021)]. Here we remove the restriction of reciprocity and study the backscatterings of plane waves by rotationally symmetric magneto-optical structures, with collinear incident wavevector, rotational axis and externally applied magnetic field. It is revealed that though nonreciprocity removes the degeneracy of backscattering efficiencies for circularly-polarized incident waves of opposite handedness, the remaining rotational symmetry is sufficient to guarantee that the efficiency is related to the polarization ellipticity only, having nothing to do with the orientations of the polarization ellipses. Moreover, the backscattering efficiency reaches its extremes (maximum or minimum values) always for circularly-polarized incident waves, and for other polarizations the efficiency is their ellipticity-weighted arithmetic average. The principles we have revealed are dictated by rotational symmetries only, which are irrelevant to specific geometric or optical parameters and are intrinsically robust against any rotational-symmetry preserving perturbations. The correlations we have discovered could be further exploited for fundamental explorations in nonreciprocal photonics and practical applications including polarimetry and ellipsometry.

Submitted 24 December, 2023; originally announced December 2023.

Comments: 6 pages and 4 figures; comments are welcome

arXiv:2312.14495 [pdf, other]

Beam Foreseeing in Millimeter-Wave Systems with Situational Awareness: Fundamental Limits via Cramér-Rao Lower Bound

Authors: Wan-Ting Shih, Chao-Kai Wen, Shang-Ho Tsai, Shi Jin, Chau Yuen

Abstract: Millimeter-wave (mmWave) networks offer the potential for high-speed data transfer and precise localization, leveraging large antenna arrays and extensive bandwidths. However, these networks are challenged by significant path loss and susceptibility to blockages. In this study, we delve into the use of situational awareness for beam prediction within the 5G NR beam management framework. We introduce an analytical framework based on the Cramér-Rao Lower Bound, enabling the quantification of 6D position-related information of geometric reflectors. This includes both 3D locations and 3D orientation biases, facilitating accurate determinations of the beamforming gain achievable by each reflector or candidate beam. This framework empowers us to predict beam alignment performance at any given location in the environment, ensuring uninterrupted wireless access. Our analysis offers critical insights for choosing the most effective beam and antenna module strategies, particularly in scenarios where communication stability is threatened by blockages. Simulation results show that our approach closely approximates the performance of an ideal, Oracle-based solution within the existing 5G NR beam management system.

Submitted 22 December, 2023; originally announced December 2023.

Comments: 16 pages, 10 figures; IEEE Transactions on Wireless Communications

arXiv:2312.14453 [pdf, other]

Hybrid Aerodynamics-Based Model Predictive Control for a Tail-Sitter UAV

Authors: Bailun Jiang, Boyang Li, Ching-Wei Chang, Chih-Yung Wen

Abstract: It is challenging to model and control a tail-sitter unmanned aerial vehicle (UAV) because its blended wing body generates complicated nonlinear aerodynamic effects, such as wing lift, fuselage drag, and propeller-wing interactions. We therefore devised a hybrid aerodynamic modeling method and model predictive control (MPC) design for a quadrotor tail-sitter UAV. The hybrid model consists of the Newton-Euler equation, which describes quadrotor dynamics, and a feedforward neural network, which learns residual aerodynamic effects. This hybrid model exhibits high predictive accuracy at a low computational cost and was used to implement hybrid MPC, which optimizes the throttle, pitch angle, and roll angle for position tracking. The controller performance was validated in real-world experiments, which obtained a 57% tracking error reduction compared with conventional nonlinear MPC. External wind disturbance was also introduced and the experimental results confirmed the robustness of the controller to these conditions.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.08664 [pdf, other]

SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration

Authors: Kezheng Xiong, Maoji Zheng, Qingshan Xu, Chenglu Wen, Siqi Shen, Cheng Wang

Abstract: Point cloud registration, a fundamental task in 3D computer vision, has remained largely unexplored in cross-source point clouds and unstructured scenes. The primary challenges arise from noise, outliers, and variations in scale and density. However, neglecting the geometric nature of point clouds restricts the performance of current methods. In this paper, we propose a novel method termed SPEAL to leverage skeletal representations for effective learning of intrinsic topologies of point clouds, facilitating robust capture of geometric intricacy. Specifically, we design the Skeleton Extraction Module to extract skeleton points and skeletal features in an unsupervised manner, which is inherently robust to noise and density variances. Then, we propose the Skeleton-Aware GeoTransformer to encode high-level skeleton-aware features. It explicitly captures the topological natures and inter-point-cloud skeletal correlations with the noise-robust and density-invariant skeletal representations. Next, we introduce the Correspondence Dual-Sampler to facilitate correspondences by augmenting the correspondence set with skeletal correspondences. Furthermore, we construct a challenging novel large-scale cross-source point cloud dataset named KITTI CrossSource for benchmarking cross-source point cloud registration methods. Extensive quantitative and qualitative experiments are conducted to demonstrate our approach's superiority and robustness on both cross-source and same-source datasets. To the best of our knowledge, our approach is the first to facilitate point cloud registration with skeletal geometric priors.

Submitted 3 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI2024

arXiv:2312.08591 [pdf, other]

Joint2Human: High-quality 3D Human Generation via Compact Spherical Embedding of 3D Joints

Authors: Muxin Zhang, Qiao Feng, Zhuo Su, Chao Wen, Zhou Xue, Kun Li

Abstract: 3D human generation is increasingly significant in various applications. However, the direct use of 2D generative methods in 3D generation often results in losing local details, while methods that reconstruct geometry from generated images struggle with global view consistency. In this work, we introduce Joint2Human, a novel method that leverages 2D diffusion models to generate detailed 3D human geometry directly, ensuring both global structure and local details. To achieve this, we employ the Fourier occupancy field (FOF) representation, enabling the direct generation of 3D shapes as preliminary results with 2D generative models. With the proposed high-frequency enhancer and the multi-view recarving strategy, our method can seamlessly integrate the details from different views into a uniform global shape. To better utilize the 3D human prior and enhance control over the generated geometry, we introduce a compact spherical embedding of 3D joints. This allows for an effective guidance of pose during the generation process. Additionally, our method can generate 3D humans guided by textual inputs. Our experimental results demonstrate the capability of our method to ensure global structure, local details, high resolution, and low computational cost simultaneously. More results and the code can be found on our project page at http://cic.tju.edu.cn/faculty/likun/projects/Joint2Human.

Submitted 6 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

arXiv:2311.15950 [pdf, other]

Auto-CsiNet: Scenario-customized Automatic Neural Network Architecture Generation for Massive MIMO CSI Feedback

Authors: Xiangyi Li, Jiajia Guo, Chao-Kai Wen, Shi Jin

Abstract: Deep learning has revolutionized the design of the channel state information (CSI) feedback module in wireless communications. However, designing the optimal neural network (NN) architecture for CSI feedback can be a laborious and time-consuming process. Manual design can be prohibitively expensive for customizing NNs to different scenarios. This paper proposes using neural architecture search (NAS) to automate the generation of scenario-customized CSI feedback NN architectures, thereby maximizing the potential of deep learning in exclusive environments. By employing automated machine learning and gradient-descent-based NAS, an efficient and cost-effective architecture design process is achieved. The proposed approach leverages implicit scene knowledge, integrating it into the scenario customization process in a data-driven manner, and fully exploits the potential of deep learning for each specific scenario. To address the issue of excessive search, early stopping and elastic selection mechanisms are employed, enhancing the efficiency of the proposed scheme. The experimental results demonstrate that the automatically generated architecture, known as Auto-CsiNet, outperforms manually-designed models in both reconstruction performance (achieving approximately a 14% improvement) and complexity (reducing it by approximately 50%). Furthermore, the paper analyzes the impact of the scenario on the NN architecture and its capacity.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 16 pages, 10 figures, 6 tables

arXiv:2311.15313 [pdf, ps, other]

Low-Complexity Joint Beamforming for RIS-Assisted MU-MISO Systems Based on Model-Driven Deep Learning

Authors: Weijie Jin, Jing Zhang, Chao-Kai Wen, Shi Jin, Xiao Li, Shuangfeng Han

Abstract: Reconfigurable intelligent surfaces (RIS) can improve signal propagation environments by adjusting the phase of the incident signal. However, optimizing the phase shifts jointly with the beamforming vector at the access point is challenging due to the non-convex objective function and constraints. In this study, we propose an algorithm based on weighted minimum mean square error optimization and power iteration to maximize the weighted sum rate (WSR) of a RIS-assisted downlink multi-user multiple-input single-output system. To further improve performance, a model-driven deep learning (DL) approach is designed, where trainable variables and graph neural networks are introduced to accelerate the convergence of the proposed algorithm. We also extend the proposed method to include beamforming with imperfect channel state information and derive a two-timescale stochastic optimization algorithm. Simulation results show that the proposed algorithm outperforms state-of-the-art algorithms in terms of complexity and WSR. Specifically, the model-driven DL approach has a runtime that is approximately 3% of the state-of-the-art algorithm to achieve the same performance. Additionally, the proposed algorithm with 2-bit phase shifters outperforms the compared algorithm with continuous phase shift.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 14 pages, 9 figures, 2 tables. This paper has been accepted for publication by the IEEE Transactions on Wireless Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2311.11594 [pdf, ps, other]

Joint Design of ISAC Waveform under PAPR Constraints

Authors: Yating Chen, Cai Wen, Yan Huang, Le Liang, Jie Li, Hui Zhang, Wei Hong

Abstract: In this paper, we formulate the precoding problem of integrated sensing and communication (ISAC) waveform as a non-convex quadratically constrained quadratic program (QCQP), in which the weighted sum of communication multi-user interference (MUI) and the gap between dual-use waveform and ideal radar waveform is minimized with peak-to-average power ratio (PAPR) constraints. We propose an efficient algorithm based on alternating direction method of multipliers (ADMM), which is able to decouple multiple variables and provide a closed-form solution for each subproblem. In addition, to improve the sensing performance in both spatial and temporal domains, we propose a new criterion to design the ideal radar waveform, in which the beam pattern is made similar to the ideal one and the integrated sidelobe level of the ambiguity function in each target direction is minimized in the region of interest. The limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm is applied to the design of the ideal radar waveform which works as a reference in the design of the dual-function waveform. Numerical results indicate that the designed dual-function waveform is capable of offering good communication quality of service (QoS) and sensing performance.

Submitted 11 February, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.07423 [pdf]

Superconductivity in trilayer nickelate La4Ni3O10 under pressure

Authors: Mingxin Zhang, Cuiying Pei, Xian Du, Weixiong Hu, Yantao Cao, Qi Wang, Juefei Wu, Yidian Li, Huanyu Liu, Chenhaoping Wen, Yi Zhao, Changhua Li, Weizheng Cao, Shihao Zhu, Qing Zhang, Na Yu, Peihong Cheng, Lili Zhang, Zhiwei Li, Jinkui Zhao, Yulin Chen, Hanjie Guo, Congjun Wu, Fan Yang, Shichao Yan, et al. (2 additional authors not shown)

Abstract: Nickelate superconductors have attracted a great deal of attention over the past few decades due to their similar crystal and electronic structures with high-temperature cuprate superconductors. Here, we report the superconductivity in a pressurized Ruddlesden-Popper phase single crystal, La4Ni3O10 (n = 3), and its interplay with the density wave order in the phase diagram. With increasing pressure, the density wave order as indicated by the anomaly in the resistivity is progressively suppressed, followed by the emergence of the superconductivity around 25 K. Our angle-resolved photoemission spectroscopy measurements reveal that the electronic structure of La4Ni3O10 is very similar to that of La3Ni2O7, suggesting unified electronic properties of nickelates in Ruddlesden-Popper phases. Moreover, theoretical analysis unveils that antiferromagnetic (AFM) super-exchange interactions can serve as the effective pairing interaction for the emergence of superconductivity (SC) in pressurized La4Ni3O10. Our research provides a new platform for the investigation of the unconventional superconductivity mechanism in Ruddlesden-Popper trilayer perovskite nickelates.

Submitted 12 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: 21 pages, 4 figures

arXiv:2311.07047 [pdf, other]

Quantum properties of fermionic fields in multi-event horizon spacetime

Authors: Qianqian Liu, Shu-Min Wu, Cuihong Wen, Jieci Wang

Abstract: We investigate the properties of quantum entanglement and mutual information in the multi-event horizon Schwarzschild-de Sitter (SdS) spacetime for massless Dirac fields. We obtain the expression for the evolutions of the quantum state near the black hole event horizon (BEH) and cosmological event horizon (CEH) in the SdS spacetime. Under the Nariai limit, the physically accessible entanglement and mutual information are maximized, and the physically inaccessible correlations are zero. With the increase in temperature of either horizon, the physically accessible correlations experience degradation. Notably, the initial state remains entangled and can be utilized in entanglement-based quantum information processing tasks, which differs from the scalar field case. Furthermore, the degradation of physically accessible correlations is more pronounced for small-mass black holes. In contrast, the physically inaccessible correlations separated by the CEH monotonically increase with the radiation temperature, and such correlations are not decisively influenced by the effect of particle creation at the BEH. Moreover, a similar phenomenon is observed for the inaccessible correlations separated by the BEH. This result differs from the single event spacetime, in which the physically inaccessible entanglement is a monotonic function of the Hawking temperature.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 14 pages, 7 figures

arXiv:2311.06916 [pdf]

TSViT: A Time Series Vision Transformer for Fault Diagnosis

Authors: Shouhua Zhang, Jiehan Zhou, Xue Ma, Chenglin Wen, Susanna Pirttikangas, Chen Yu, Weishan Zhang, Chunsheng Yang

Abstract: Traditional fault diagnosis methods using Convolutional Neural Networks (CNNs) face limitations in capturing temporal features (i.e., the variation of vibration signals over time). To address this issue, this paper introduces a novel model, the Time Series Vision Transformer (TSViT), specifically designed for fault diagnosis. On the one hand, the TSViT model integrates a convolutional layer to segment vibration signals and capture local features. On the other hand, it employs a transformer encoder to learn long-term temporal information. Experimental comparisons with other methods on two distinct datasets validate the effectiveness and generalizability of TSViT, alongside a comparative analysis of its hyperparameters' impact on model performance, computational complexity, and overall parameter quantity. TSViT reaches average accuracies of 100% and 99.99% on the two test sets, respectively.

Submitted 12 November, 2023; originally announced November 2023.
