Search | arXiv e-print repository

RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba

Authors: Andong Lu, Wanyu Wang, Chenglong Li, Jin Tang, Bin Luo

Abstract: Existing RGBT tracking methods often design various interaction models to perform cross-modal fusion of each layer, but cannot execute the feature interactions among all layers, which play a critical role in robust multimodal representation, due to the large computational burden. To address this issue, this paper presents a novel All-layer multimodal Interaction Network, named AINet, which performs efficient and effective feature interactions of all modalities and layers in a progressive fusion Mamba, for robust RGBT tracking. Even though modality features in different layers are known to contain different cues, it is always challenging to build multimodal interactions in each layer because of the difficulty of balancing interaction capability and efficiency. Meanwhile, considering that the feature discrepancy between RGB and thermal modalities reflects their complementary information to some extent, we design a Difference-based Fusion Mamba (DFM) to achieve enhanced fusion of different modalities with linear complexity. When interacting with features from all layers, a huge number of token sequences (3840 tokens in this work) are involved and the computational burden is thus large. To handle this problem, we design an Order-dynamic Fusion Mamba (OFM) to execute efficient and effective feature interactions of all layers by dynamically adjusting the scan order of different layers in Mamba. Extensive experiments on four public RGBT tracking datasets show that AINet achieves leading performance against existing state-of-the-art methods.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.07921 [pdf]

Physics-Informed Neural Network for Predicting Out-of-Training-Range TCAD Solution with Minimized Domain Expertise

Authors: Albert Lu, Yu Foon Chau, Hiu Yung Wong

Abstract: Machine learning (ML) is promising in assisting technology computer-aided design (TCAD) simulations to alleviate difficulty in convergence and prolonged simulation time. While ML is widely used in TCAD, existing approaches either require access to the internal solver, require extensive domain expertise, are only trained by terminal quantities such as currents and voltages, and/or lack out-of-training-range prediction capability. In this paper, using Si nanowire as an example, we demonstrate that it is possible to use a physics-informed neural network (PINN) to predict out-of-training-range TCAD solutions without accessing the internal solver and with minimal domain expertise. The machine not only can predict a 2.5 times larger range than the training but also can predict the inversion region by only being trained with subthreshold region data. The physics-informed module is also trained with data without the need for human-coded equations, making this easier to extend to more sophisticated systems.

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.06907 [pdf, other]

An Information Geometry Interpretation for Approximate Message Passing

Authors: Bingyan Liu, An-An Lu, Mingrui Fan, Jiyuan Yang, Xiqi Gao

Abstract: In this paper, we propose an information geometry (IG) framework to solve the standard linear regression problem. The proposed framework is an extension of the one for computing the mean of complex multivariate Gaussian distribution. By applying the proposed framework, the information geometry approach (IGA) and the approximate information geometry approach (AIGA) for basis pursuit de-noising (BPDN) in standard linear regression are derived. The framework can also be applied to other standard linear regression problems. With the transformations of natural and expectation parameters of Gaussian distributions, we then show the relationship between the IGA and the message passing (MP) algorithm. Finally, we prove that the AIGA is equivalent to the approximate message passing (AMP) algorithm. These intrinsic results offer a new perspective for the AMP algorithm, and clues for understanding and improving stochastic reasoning methods.

Submitted 13 August, 2024; originally announced August 2024.

Comments: 30 pages, 5 figures

arXiv:2408.04579 [pdf, other]

SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More

Authors: Tianrun Chen, Ankang Lu, Lanyun Zhu, Chaotao Ding, Chunan Yu, Deyi Ji, Zejian Li, Lingyun Sun, Papa Mao, Ying Zang

Abstract: The advent of large models, also known as foundation models, has significantly transformed the AI research landscape, with models like Segment Anything (SAM) achieving notable success in diverse image segmentation scenarios. Despite its advancements, SAM encountered limitations in handling some complex low-level segmentation tasks like camouflaged object and medical imaging. In response, in 2023, we introduced SAM-Adapter, which demonstrated improved performance on these challenging tasks. Now, with the release of Segment Anything 2 (SAM2), a successor with enhanced architecture and a larger training corpus, we reassess these challenges. This paper introduces SAM2-Adapter, the first adapter designed to overcome the persistent limitations observed in SAM2 and achieve new state-of-the-art (SOTA) results in specific downstream tasks including medical image segmentation, camouflaged (concealed) object detection, and shadow detection. SAM2-Adapter builds on the SAM-Adapter's strengths, offering enhanced generalizability and composability for diverse applications. We present extensive experimental results demonstrating SAM2-Adapter's effectiveness. We show the potential and encourage the research community to leverage the SAM2 model with our SAM2-Adapter for achieving superior segmentation outcomes. Code, pre-trained models, and data processing protocols are available at http://tianrun-chen.github.io/SAM-Adaptor/

Submitted 10 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2304.09148

arXiv:2408.02222 [pdf, other]

Cross-modulated Attention Transformer for RGBT Tracking

Authors: Yun Xiao, Jiacong Zhao, Andong Lu, Chenglong Li, Yin Lin, Bing Yin, Cong Liu

Abstract: Existing Transformer-based RGBT trackers achieve remarkable performance benefits by leveraging self-attention to extract uni-modal features and cross-attention to enhance multi-modal feature interaction and template-search correlation computation. Nevertheless, the independent search-template correlation calculations ignore the consistency between branches, which can result in ambiguous and inappropriate correlation weights. This not only limits the intra-modal feature representation, but also harms the robustness of cross-attention for multi-modal feature interaction and search-template correlation computation. To address these issues, we propose a novel approach called Cross-modulated Attention Transformer (CAFormer), which performs intra-modality self-correlation, inter-modality feature interaction, and search-template correlation computation in a unified attention model, for RGBT tracking. In particular, we first independently generate correlation maps for each modality and feed them into the designed Correlation Modulated Enhancement module, modulating inaccurate correlation weights by seeking the consensus between modalities. Such a design unifies self-attention and cross-attention schemes, which not only alleviates inaccurate attention weight computation in self-attention but also eliminates redundant computation introduced by an extra cross-attention scheme. In addition, we propose a collaborative token elimination strategy to further improve tracking inference efficiency and accuracy. Extensive experiments on five public RGBT tracking benchmarks show the outstanding performance of the proposed CAFormer against state-of-the-art methods.

Submitted 4 August, 2024; originally announced August 2024.

Comments: 10 pages, 5 figures

arXiv:2407.18175 [pdf, other]

Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers

Authors: Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang

Abstract: Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs). However, ViT models are often too computation-intensive for efficient deployment on resource-limited edge devices. This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs, to design efficient ViT models for hardware implementation while preserving the accuracy. First, Quasar-ViT trains a supernet using our row-wise flexible mixed-precision quantization scheme, mixed-precision weight entanglement, and supernet layer scaling techniques. Then, it applies an efficient hardware-oriented search algorithm, integrated with hardware latency and resource modeling, to determine a series of optimal subnets from the supernet under different inference latency targets. Finally, we propose a series of model-adaptive designs on the FPGA platform to support the architecture search and mitigate the gap between the theoretical computation reduction and the practical inference speedup. Our searched models achieve 101.5, 159.6, and 251.6 frames-per-second (FPS) inference speed on the AMD/Xilinx ZCU102 FPGA with 80.4%, 78.6%, and 74.9% top-1 accuracy, respectively, for the ImageNet dataset, consistently outperforming prior works.

Submitted 25 July, 2024; originally announced July 2024.

Comments: Accepted by ICS 2024

arXiv:2407.12322 [pdf, other]

Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer

Authors: Wenhan Wu, Ce Zheng, Zihao Yang, Chen Chen, Srijan Das, Aidong Lu

Abstract: Recently, transformers have demonstrated great potential for modeling long-term dependencies from skeleton sequences and thereby gained ever-increasing attention in skeleton action recognition. However, the existing transformer-based approaches heavily rely on the naive attention mechanism for capturing the spatiotemporal features, which falls short in learning discriminative representations that exhibit similar motion patterns. To address this challenge, we introduce the Frequency-aware Mixed Transformer (FreqMixFormer), specifically designed for recognizing similar skeletal actions with subtle discriminative motions. First, we introduce a frequency-aware attention module to unweave skeleton frequency representations by embedding joint features into frequency attention maps, aiming to distinguish the discriminative movements based on their frequency coefficients. Subsequently, we develop a mixed transformer architecture to incorporate spatial features with frequency features to model the comprehensive frequency-spatial patterns. Additionally, a temporal transformer is proposed to extract the global correlations across frames. Extensive experiments show that FreqMixFormer outperforms SOTA on 3 popular skeleton action recognition datasets, including NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.

Submitted 29 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

Comments: Accepted by ACM Multimedia 2024

arXiv:2406.19583 [pdf, other]

Interference Cancellation Information Geometry Approach for Massive MIMO Channel Estimation

Authors: An-An Lu, Bingyan Liu, Xiqi Gao

Abstract: In this paper, the interference cancellation information geometry approaches (IC-IGAs) for massive MIMO channel estimation are proposed. The proposed algorithms are low-complexity approximations of the minimum mean square error (MMSE) estimation. To illustrate the proposed algorithms, a unified framework of the information geometry approach for channel estimation and its geometric explanation are described first. Then, a modified form that has the same mean as the MMSE estimation is constructed. Based on this, the IC-IGA algorithm and the interference cancellation simplified information geometry approach (IC-SIGA) are derived by applying the information geometry framework. The a posteriori means on the equilibrium of the proposed algorithms are proved to be equal to the mean of MMSE estimation, and the complexity of the IC-SIGA algorithm in practical massive MIMO systems is further reduced by considering the beam-based statistical channel model (BSCM) and fast Fourier transform (FFT). Simulation results show that the proposed methods achieve similar performance as the existing information geometry approach (IGA) with lower complexity.

Submitted 1 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 38 pages, 9 figures

arXiv:2406.01291 [pdf, other]

WISDOM project XX -- Strong shear tearing molecular clouds apart in NGC 524

Authors: Anan Lu, Daryl Haggard, Martin Bureau, Jindra Gensior, Sarah Jeffreson, Carmelle Robert, Thomas G. Williams, Fu-Heng Liang, Woorak Choi, Timothy A. Davis, Sara Babic, Hope Boyce, Benjamin Cheung, Laurent Drissen, Jacob S. Elford, Lijie Liu, Thomas Martin, Carter Rhea, Laurie Rousseau-Nepton, Ilaria Ruffa

Abstract: Early-type galaxies (ETGs) are known to harbour dense spheroids of stars but scarce star formation (SF). Approximately a quarter of these galaxies have rich molecular gas reservoirs yet do not form stars efficiently. We study here the ETG NGC 524, with strong shear suspected to result in a smooth molecular gas disc and low star-formation efficiency (SFE). We present new spatially-resolved observations of the $^{12}$CO(2-1)-emitting cold molecular gas from the Atacama Large Millimeter/sub-millimeter Array (ALMA) and of the warm ionised-gas emission lines from SITELLE at the Canada-France-Hawaii Telescope. Although constrained by the resolution of the ALMA observations ($\approx37$ pc), we identify only $52$ GMCs with radii ranging from $30$ to $140$ pc, a low mean molecular gas mass surface density $\langleΣ_{\rm gas}\rangle\approx125$ M$_\odot$ pc$^{-2}$ and a high mean virial parameter $\langleα_{\rm obs,vir}\rangle\approx5.3$. We measure spatially-resolved molecular gas depletion times ($τ_{\rm dep}\equiv1/{\rm SFE}$) with a spatial resolution of $\approx100$ pc within a galactocentric distance of $1.5$ kpc. The global depletion time is $\approx2.0$ Gyr but $τ_{\rm dep}$ increases toward the galaxy centre, with a maximum $τ_{\rm dep,max}\approx5.2$ Gyr. However, no pure H II region is identified in NGC 524 using ionised-gas emission-line ratio diagnostics, so the $τ_{\rm dep}$ inferred are in fact lower limits. Measuring the GMC properties and dynamical states, we conclude that shear is the dominant mechanism shaping the molecular gas properties and regulating SF in NGC 524. This is supported by analogous analyses of the GMCs in a simulated ETG similar to NGC 524.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 17 pages, 10 figures. To be published in MNRAS, accepted on May 27

arXiv:2405.19709 [pdf, other]

WISDOM Project -- XXI. Giant molecular clouds in the central region of the barred spiral galaxy NGC 613: a steep size -- linewidth relation

Authors: Woorak Choi, Martin Bureau, Lijie Liu, Michele Cappellari, Timothy A. Davis, Jindra Gensior, Fu-Heng Liang, Anan Lu, Sanghyuk Moon, Ilaria Ruffa, Thomas G. Williams, Aeree Chung

Abstract: NGC 613 is a nearby barred spiral galaxy with a nuclear ring. Exploiting high spatial resolution ($\approx20$ pc) Atacama Large Millimeter/sub-millimeter Array $^{12}$CO(1-0) observations, we study the giant molecular clouds (GMCs) in the nuclear ring and its vicinity, identifying $158$ spatially- and spectrally-resolved GMCs. The GMC sizes ($R_{\mathrm{c}}$) are comparable to those of the clouds in the Milky Way (MW) disc, but their gas masses, observed linewidths ($σ_{\mathrm{obs,los}}$) and gas mass surface densities are larger. The GMC size -- linewidth relation ($σ_{\mathrm{obs,los}}\propto R_{\mathrm{c}}^{0.77}$) is steeper than that of the clouds of the MW disc and centre, and the GMCs are on average only marginally gravitationally bound (with a mean virial parameter $\langleα_{\mathrm{obs,vir}}\rangle\approx1.7$). We discuss the possible origins of the steep size -- linewidth relation and enhanced observed linewidths of the clouds and suggest that a combination of mechanisms such as stellar feedback, gas accretion and cloud-cloud collisions, as well as the gas inflows driven by the large-scale bar, may play a role.

Submitted 30 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: 15 pages, 8 figures, accepted for publication in MNRAS. arXiv admin note: text overlap with arXiv:2304.10471

arXiv:2405.05428 [pdf, other]

Adversary-Guided Motion Retargeting for Skeleton Anonymization

Authors: Thomas Carr, Depeng Xu, Aidong Lu

Abstract: Skeleton-based motion visualization is a rising field in computer vision, especially in the case of virtual reality (VR). With further advancements in human-pose estimation and skeleton extracting sensors, more and more applications that utilize skeleton data have come about. These skeletons may appear to be anonymous but they contain embedded personally identifiable information (PII). In this paper we present a new anonymization technique that is based on motion retargeting, utilizing adversary classifiers to further remove PII embedded in the skeleton. Motion retargeting is effective in anonymization as it transfers the movement of the user onto a dummy skeleton. In doing so, any PII linked to the skeleton will be based on the dummy skeleton instead of the user we are protecting. We propose a Privacy-centric Deep Motion Retargeting model (PMR) which aims to further clear the retargeted skeleton of PII through adversarial learning. In our experiments, PMR achieves motion retargeting utility performance on par with state-of-the-art models while also reducing the performance of privacy attacks.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.02717 [pdf, other]

AFter: Attention-based Fusion Router for RGBT Tracking

Authors: Andong Lu, Wanyu Wang, Chenglong Li, Jin Tang, Bin Luo

Abstract: Multi-modal feature fusion, a core component of RGBT tracking, has been the subject of numerous studies in recent years. However, existing RGBT tracking methods widely adopt fixed fusion structures to integrate multi-modal features, which struggle to handle the various challenges in dynamic scenarios. To address this problem, this work presents a novel Attention-based Fusion router called AFter, which optimizes the fusion structure to adapt to the dynamic challenging scenarios, for robust RGBT tracking. In particular, we design a fusion structure space based on the hierarchical attention network, each attention-based fusion unit corresponding to a fusion operation and a combination of these attention units corresponding to a fusion structure. Through optimizing the combination of attention-based fusion units, we can dynamically select the fusion structure to adapt to various challenging scenarios. Unlike complex search of different structures in neural architecture search algorithms, we develop a dynamic routing algorithm, which equips each attention-based fusion unit with a router, to predict the combination weights for efficient optimization of the fusion structure. Extensive experiments on five mainstream RGBT tracking datasets demonstrate the superior performance of the proposed AFter against state-of-the-art RGBT trackers. We release the code at https://github.com/Alexadlu/AFter.

Submitted 4 May, 2024; originally announced May 2024.

Comments: Peer review

arXiv:2404.14829 [pdf, other]

Revisiting Neural Networks for Continual Learning: An Architectural Perspective

Authors: Aojun Lu, Tao Feng, Hangjie Yuan, Xiaotian Song, Yanan Sun

Abstract: Efforts to overcome catastrophic forgetting have primarily centered around developing more effective Continual Learning (CL) methods. In contrast, less attention has been devoted to analyzing the role of network architecture design (e.g., network depth, width, and components) in contributing to CL. This paper seeks to bridge this gap between network architecture design and CL, and to present a holistic study on the impact of network architectures on CL. This work considers architecture design at the network scaling level, i.e., width and depth, and also at the network components, i.e., skip connections, global pooling layers, and down-sampling. In both cases, we first derive insights through systematically exploring how architectural designs affect CL. Then, grounded in these insights, we craft a specialized search space for CL and further propose a simple yet effective ArchCraft method to steer a CL-friendly architecture; namely, this method recrafts AlexNet/ResNet into AlexAC/ResAC. Experimental validation across various CL settings and scenarios demonstrates that improved architectures are parameter-efficient, achieving state-of-the-art performance of CL while being 86%, 61%, and 97% more compact in terms of parameters than the naive CL architecture in Task IL and Class IL. Code is available at https://github.com/byyx666/ArchCraft.

Submitted 28 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.07425 [pdf, ps, other]

Precoder Design for User-Centric Network Massive MIMO with Matrix Manifold Optimization

Authors: Rui Sun, Li You, An-An Lu, Chen Sun, Xiqi Gao, Xiang-Gen Xia

Abstract: In this paper, we investigate the precoder design for user-centric network (UCN) massive multiple-input multiple-output (mMIMO) downlink with matrix manifold optimization. In UCN mMIMO systems, each user terminal (UT) is served by a subset of base stations (BSs) instead of all the BSs, facilitating the implementation of the system and lowering the dimension of the precoders to be designed. By proving that the precoder set satisfying the per-BS power constraints forms a Riemannian submanifold of a linear product manifold, we transform the constrained precoder design problem in Euclidean space to an unconstrained one on the Riemannian submanifold. Riemannian ingredients, including orthogonal projection, Riemannian gradient, retraction and vector transport, of the problem on the Riemannian submanifold are further derived, with which the Riemannian conjugate gradient (RCG) design method is proposed for solving the unconstrained problem. The proposed method avoids the inverses of large dimensional matrices, which is beneficial in practice. The complexity analyses show the high computational efficiency of RCG precoder design. Simulation results demonstrate the numerical superiority of the proposed precoder design and the high efficiency of the UCN mMIMO system.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 13 pages, 9 figures, journal

arXiv:2404.00986 [pdf, other]

Make Continual Learning Stronger via C-Flat

Authors: Ang Bian, Wei Li, Hangjie Yuan, Chengrong Yu, Zixiang Zhao, Mang Wang, Aojun Lu, Tao Feng

Abstract: Model generalization ability upon incrementally acquiring dynamically updating knowledge from sequentially arriving tasks is crucial to tackle the sensitivity-stability dilemma in Continual Learning (CL). Weight loss landscape sharpness minimization, which seeks flat minima lying in neighborhoods with uniform low loss or smooth gradient, is proven to be a strong training regime improving model generalization compared with loss-minimization-based optimizers like SGD. Yet only a few works have discussed this training regime for CL, proving that a dedicated zeroth-order sharpness optimizer can improve CL performance. In this work, we propose a Continual Flatness (C-Flat) method featuring a flatter loss landscape tailored for CL. C-Flat can be easily called with only one line of code and is plug-and-play for any CL method. A general framework of C-Flat applied to all CL categories and a thorough comparison with loss minima optimizer and flat minima based CL approaches is presented in this paper, showing that our method can boost CL performance in almost all cases. Code will be publicly available upon publication.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.16151 [pdf, other]

Ultra Low-Cost Two-Stage Multimodal System for Non-Normative Behavior Detection

Authors: Albert Lu, Stephen Cranefield

Abstract: The online community has increasingly been inundated by a toxic wave of harmful comments. In response to this growing challenge, we introduce a two-stage ultra-low-cost multimodal harmful behavior detection method designed to identify harmful comments and images with high precision and recall rates. We first utilize the CLIP-ViT model to transform tweets and images into embeddings, effectively capturing the intricate interplay of semantic meaning and subtle contextual clues within texts and images. Then in the second stage, the system feeds these embeddings into a conventional machine learning classifier like SVM or logistic regression, enabling the system to be trained rapidly and to perform inference at an ultra-low cost. By converting tweets into rich multimodal embeddings through the CLIP-ViT model and utilizing them to train conventional machine learning classifiers, our system is not only capable of detecting harmful textual information with near-perfect performance, achieving precision and recall rates above 99%, but also demonstrates the ability to detect harmful images in a zero-shot manner without additional training, thanks to its multimodal embedding input. This capability empowers our system to identify unseen harmful images without requiring extensive and costly image datasets. Additionally, our system quickly adapts to new harmful content; if a new harmful content pattern is identified, we can fine-tune the classifier with the corresponding tweets' embeddings to promptly update the system. This makes it well suited to addressing the ever-evolving nature of online harmfulness, providing online communities with a robust, generalizable, and cost-effective tool to safeguard their communities.

Submitted 24 March, 2024; originally announced March 2024.

Comments: to appear in the International Workshop on Coordination, Organizations, Institutions, Norms and Ethics for Governance of Multi-Agent Systems

arXiv:2403.13588 [pdf, other]

Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models

Authors: Chengzhe Feng, Yanan Sun, Ke Li, Pan Zhou, Jiancheng Lv, Aojun Lu

Abstract: As Pre-trained Language Models (PLMs), a popular approach for code intelligence, continue to grow in size, the computational cost of their usage has become prohibitively expensive. Prompt learning, a recent development in the field of natural language processing, emerges as a potential solution to address this challenge. In this paper, we investigate the effectiveness of prompt learning in code intelligence tasks. We unveil its reliance on manually designed prompts, which often require significant human effort and expertise. Moreover, we discover existing automatic prompt design methods are very limited to code intelligence tasks due to factors including gradient dependence, high computational demands, and limited applicability. To effectively address both issues, we propose Genetic Auto Prompt (GenAP), which utilizes an elaborate genetic algorithm to automatically design prompts. With GenAP, non-experts can effortlessly generate superior prompts compared to meticulously manual-designed ones. GenAP operates without the need for gradients or additional computational costs, rendering it gradient-free and cost-effective. Moreover, GenAP supports both understanding and generation types of code intelligence tasks, exhibiting great applicability. We conduct GenAP on three popular code intelligence PLMs with three canonical code intelligence tasks including defect prediction, code summarization, and code translation. The results suggest that GenAP can effectively automate the process of designing prompts. Specifically, GenAP outperforms all other methods across all three tasks (e.g., improving accuracy by an average of 2.13% for defect prediction). To the best of our knowledge, GenAP is the first work to automatically design prompts for code intelligence PLMs.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.12487 [pdf]

Unveiling Four Key Factors for Tire Force Control Allocation in 4WID-4WIS Electric Vehicles at Handling Limits

Authors: Ao Lu, Runfeng Li, Yunchang Yu, Ziwang Lu, Guangyu Tian

Abstract: The four-wheel independent drive and four-wheel independent steering (4WID-4WIS) configurations enhance control flexibility and dynamic performance potential for more integrated electric vehicles. This paper comprehensively analyzes the impacts of four key factors on tire force control allocation: vertical load estimation, actuator dynamic characteristics, tire force constraints, and wheel steering precision at handling limits. The study demonstrates that precise vertical load estimation enhances lateral force allocation accuracy. Additionally, the self-compensating effect of lateral tire forces minimizes the impact of small deviations in vertical load estimation on tire force control allocation. A novel control allocation method considering actuator dynamics is introduced, effectively improving yaw rate response and reducing tracking errors. Considering tire-road adhesion and actuator rate constraints, an innovative method to calculate the real-time attainable tire force volume is proposed based on the tire slip ratio and slip angle. Feedforward control with bump steer compensation is implemented to improve wheel steering precision and lateral tire force control accuracy. Matlab/Simulink and Carsim co-simulation results emphasize the importance of these key factors' individual impacts and combined effects. This analysis offers valuable insights for developing advanced tire force control allocation strategies in 4WID-4WIS electric vehicles.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.02563 [pdf, ps, other]

Systemic Biases in Sign Language AI Research: A Deaf-Led Call to Reevaluate Research Agendas

Authors: Aashaka Desai, Maartje De Meulder, Julie A. Hochgesang, Annemarie Kocab, Alex X. Lu

Abstract: Growing research in sign language recognition, generation, and translation AI has been accompanied by calls for ethical development of such technologies. While these works are crucial to helping individual researchers do better, there is a notable lack of discussion of systemic biases or analysis of rhetoric that shape the research questions and methods in the field, especially as it remains dominated by hearing non-signing researchers. Therefore, we conduct a systematic review of 101 recent papers in sign language AI. Our analysis identifies significant biases in the current state of sign language AI research, including an overfocus on addressing perceived communication barriers, a lack of use of representative datasets, use of annotations lacking linguistic foundations, and development of methods that build on flawed models. We take the position that the field lacks meaningful input from Deaf stakeholders, and is instead driven by what decisions are the most convenient or perceived as important to hearing researchers. We end with a call to action: the field must make space for Deaf researchers to lead the conversation in sign language AI.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.00033 [pdf, other]

LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition

Authors: Youbing Hu, Yun Cheng, Anqi Lu, Zhiqiang Cao, Dawei Wei, Jie Liu, Zhijun Li

Abstract: The Vision Transformer (ViT) excels in accuracy when handling high-resolution images, yet it confronts the challenge of significant spatial redundancy, leading to increased computational and memory requirements. To address this, we present the Localization and Focus Vision Transformer (LF-ViT). This model operates by strategically curtailing computational demands without impinging on performance. In the Localization phase, a reduced-resolution image is processed; if a definitive prediction remains elusive, our pioneering Neighborhood Global Class Attention (NGCA) mechanism is triggered, effectively identifying and spotlighting class-discriminative regions based on initial findings. Subsequently, in the Focus phase, this designated region is used from the original image to enhance recognition. Uniquely, LF-ViT employs consistent parameters across both phases, ensuring seamless end-to-end optimization. Our empirical tests affirm LF-ViT's prowess: it remarkably decreases DeiT-S's FLOPs by 63% and concurrently amplifies throughput twofold. Code of this project is at https://github.com/edgeai1/LF-ViT.git.

Submitted 7 January, 2024; originally announced February 2024.

arXiv:2401.17697 [pdf, ps, other]

Suppression of Blowup by Slightly Superlinear Degradation in a Parabolic-Elliptic Keller--Segel System with Signal-dependent Motility

Authors: Aijing Lu, Jie Jiang

Abstract: In this paper, we consider an initial-Neumann boundary value problem for a parabolic-elliptic Keller-Segel system with signal-dependent motility and a source term. Previous research has rigorously shown that the source-free version of this system exhibits an infinite-time blowup phenomenon when dimension $N \geq 2$. In the current work, when $N \leq 3$, we establish uniform boundedness of global classical solutions with an additional source term that involves slightly super-linear degradation effect on the density, of a maximum growth order $s\log s$, unveiling a sufficient blowup suppression mechanism. The motility function considered in our work takes a rather general form compared with recent works \cite{FuJi2020, LyWa2023} which were restricted to the monotone non-increasing case. The cornerstone of our proof lies in deriving an upper bound for the second component of the system and an entropy-like estimate, which are achieved through tricky comparison skills and energy methods, respectively.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.02035 [pdf, ps, other]

Efficient Information Geometry Approach for Massive MIMO-OFDM Channel Estimation

Authors: Jiyuan Yang, Yan Chen, Mingrui Fan, An-An Lu, Wen Zhong, Xiqi Gao, Xiaohu You, Xiang-Gen Xia, Dirk Slock

Abstract: We investigate the channel estimation for massive multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. We revisit the information geometry approach (IGA) for massive MIMO-OFDM channel estimation. By using the constant magnitude property of the entries of the measurement matrix, we find that the second-order natural parameters of the distributions on all the auxiliary manifolds are equivalent to each other, and the first-order natural parameters are asymptotically equivalent to each other at the fixed point. Motivated by these results, we simplify the process of IGA and propose an efficient IGA (EIGA) for massive MIMO-OFDM channel estimation, which allows efficient implementation with fast Fourier transformation (FFT). We then establish a sufficient condition of its convergence and accordingly find a range of the damping factor for the convergence. We show that this range of damping factor is sufficiently wide by using the specific properties of the measurement matrices. Further, we prove that at the fixed point, the a posteriori mean obtained by EIGA is asymptotically optimal. Simulations confirm that EIGA can achieve the optimal performance with low complexity in a limited number of iterations.

Submitted 3 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.01674 [pdf, other]

Transformer RGBT Tracking with Spatio-Temporal Multimodal Tokens

Authors: Dengdi Sun, Yajie Pan, Andong Lu, Chenglong Li, Bin Luo

Abstract: Many RGBT tracking studies primarily focus on modal fusion design, while overlooking the effective handling of target appearance changes. While some approaches have introduced historical frames or fuse and replace initial templates to incorporate temporal information, they risk disrupting the original target appearance and accumulating errors over time. To alleviate these limitations, we propose a novel Transformer RGBT tracking approach, which mixes spatio-temporal multimodal tokens from the static multimodal templates and multimodal search regions in Transformer to handle target appearance changes, for robust RGBT tracking. We introduce independent dynamic template tokens to interact with the search region, embedding temporal information to address appearance changes, while also retaining the involvement of the initial static template tokens in the joint feature extraction process to ensure the preservation of the original reliable target appearance information, which prevents deviations from the target appearance caused by traditional temporal updates. We also use attention mechanisms to enhance the target features of multimodal template tokens by incorporating supplementary modal cues, and make the multimodal search region tokens interact with multimodal dynamic template tokens via attention mechanisms, which facilitates the conveyance of multimodal-enhanced target change information. Our module is inserted into the transformer backbone network and inherits joint feature extraction, search-template matching, and cross-modal interaction. Extensive experiments on three RGBT benchmark datasets show that the proposed approach maintains competitive performance compared to other state-of-the-art tracking algorithms while running at 39.1 FPS.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.16246 [pdf, other]

Nighttime Person Re-Identification via Collaborative Enhancement Network with Multi-domain Learning

Authors: Andong Lu, Tianrui Zha, Chenglong Li, Jin Tang, Xiaofeng Wang, Bin Luo

Abstract: Prevalent nighttime ReID methods typically combine relighting networks and ReID networks in a sequential manner, which not only restricts the ReID performance by the quality of relighting images, but also neglects the effective collaborative modeling between image relighting and person ReID tasks. To handle these problems, we propose a novel Collaborative Enhancement Network called CENet, which performs the multilevel feature interactions in a parallel framework, for nighttime person ReID. In particular, CENet is a parallel Transformer network, in which the designed parallel structure can avoid the impact of the quality of relighting images on ReID performance. To perform effective collaborative modeling between image relighting and person ReID tasks, we integrate the multilevel feature interactions in CENet. Specifically, we share the Transformer encoder to build the low-level feature interaction, and then perform the feature distillation to transfer the high-level features from image relighting to ReID. In addition, the sizes of existing real-world nighttime person ReID datasets are small, and large-scale synthetic ones exhibit substantial domain gaps with real-world data. To leverage both small-scale real-world and large-scale synthetic training data, we develop a multi-domain learning algorithm, which alternately utilizes both kinds of data to reduce the inter-domain difference in the training of CENet. Extensive experiments on two real nighttime datasets, Night600 and RGBNT201$_{rgb}$, and a synthetic nighttime ReID dataset are conducted to validate the effectiveness of CENet. We will release the code and synthetic dataset.

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.16244 [pdf, other]

Modality-missing RGBT Tracking: Invertible Prompt Learning and High-quality Benchmarks

Authors: Andong Lu, Jiacong Zhao, Chenglong Li, Jin Tang, Bin Luo

Abstract: Current RGBT tracking research relies on the complete multi-modal input, but modal information might be missing due to factors such as thermal sensor self-calibration and data transmission error, called the modality-missing challenge in this work. To address this challenge, we propose a novel invertible prompt learning approach, which integrates the content-preserving prompts into a well-trained tracking model to adapt to various modality-missing scenarios, for robust RGBT tracking. Given one modality-missing scenario, we propose to utilize the available modality to generate the prompt of the missing modality to adapt to the RGBT tracking model. However, the cross-modality gap between available and missing modalities usually causes semantic distortion and information loss in prompt generation. To handle this issue, we design the invertible prompter by incorporating the full reconstruction of the input available modality from the generated prompt. To provide a comprehensive evaluation platform, we construct several high-quality benchmark datasets, in which various modality-missing scenarios are considered to simulate real-world challenges. Extensive experiments on three modality-missing benchmark datasets show that our method achieves significant performance improvements compared with state-of-the-art methods. We have released the code and simulation datasets at: https://github.com/Alexadlu/Modality-missing-RGBT-Tracking.git.

Submitted 20 March, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

arXiv:2311.17848 [pdf, other]

WISDOM Project -- XVI. The link between circumnuclear molecular gas reservoirs and active galactic nucleus fuelling

Authors: Jacob S. Elford, Timothy A. Davis, Ilaria Ruffa, Martin Bureau, Michele Cappellari, Jindra Gensior, Satoru Iguchi, Fu-Heng Liang, Lijie Liu, Anan Lu, Thomas G. Williams

Abstract: We use high-resolution data from the millimetre-Wave Interferometric Survey of Dark Object Masses (WISDOM) project to investigate the connection between circumnuclear gas reservoirs and nuclear activity in a sample of nearby galaxies. Our sample spans a wide range of nuclear activity types including radio galaxies, Seyfert galaxies, low-luminosity active galactic nuclei (AGN) and inactive galaxies. We use measurements of nuclear millimetre continuum emission along with other archival tracers of AGN accretion/activity to investigate previous claims that, at circumnuclear scales (<100 pc), these should correlate with the mass of the cold molecular gas. We find that the molecular gas mass does not correlate with any tracer of nuclear activity. This suggests the level of nuclear activity cannot solely be regulated by the amount of cold gas around the supermassive black hole (SMBH). This indicates that AGN fuelling, that drives gas from the large scale galaxy to the nuclear regions, is not a ubiquitous process and may vary between AGN type, with timescale variations likely to be very important. By studying the structure of the central molecular gas reservoirs, we find our galaxies have a range of nuclear molecular gas concentrations. This could indicate that some of our galaxies may have had their circumnuclear regions impacted by AGN feedback, even though they currently have low nuclear activity. On the other hand, the nuclear molecular gas concentrations in our galaxies could instead be set by secular processes.

Submitted 24 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: 15 pages plus 3 in the appendix, 8 figures plus 1 in the appendix, 3 tables plus 4 in the appendix

arXiv:2311.15447 [pdf, other]

WISDOM project -- XVIII. Molecular gas distributions and kinematics of three megamaser galaxies

Authors: Fu-Heng Liang, Mark D. Smith, Martin Bureau, Feng Gao, Timothy A. Davis, Michele Cappellari, Jacob S. Elford, Jenny E. Greene, Satoru Iguchi, Federico Lelli, Anan Lu, Ilaria Ruffa, Thomas G. Williams, Hengyue Zhang

Abstract: The co-evolution of galaxies and supermassive black holes (SMBHs) underpins our understanding of galaxy evolution, but different methods to measure SMBH masses have only infrequently been cross-checked. We attempt to identify targets to cross-check two of the most accurate methods, megamaser and cold molecular gas dynamics. Three promising galaxies are selected from all those with existing megamaser SMBH mass measurements. We present Atacama Large Millimeter/sub-millimeter Array (ALMA) 12CO(2-1) and 230-GHz continuum observations with angular resolutions of about 0.5". Every galaxy has an extended rotating molecular gas disc and 230-GHz continuum source(s), but all also have irregularities and/or non-axisymmetric features: NGC1194 is highly inclined and has disturbed and lopsided central 12CO(2-1) emission; NGC3393 has a nuclear disc with fairly regular but patchy 12CO(2-1) emission with little gas near the kinematic major axis, faint emission in the very centre and two brighter structures reminiscent of a nuclear ring and/or spiral; NGC5765B has a strong bar and very bright 12CO(2-1) emission concentrated along two bisymmetric offset dust lanes and two bisymmetric nuclear spiral arms. 12CO(2-1) and 12CO(3-2) observations with the James Clerk Maxwell Telescope are compared with the ALMA observations. Because of the disturbed gas kinematics and the impractically long integration times required for higher angular resolution observations, none of the three galaxies is suitable for a future SMBH mass measurement. Nonetheless, increasing the number of molecular gas observations of megamaser galaxies is valuable, and the ubiquitous disturbances suggest a link between large-scale gas properties and the existence of megamasers.

Submitted 10 December, 2023; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: 17 pages, 5 figures, accepted by MNRAS

arXiv:2311.00690 [pdf, other]

What User Behaviors Make the Differences During the Process of Visual Analytics?

Authors: Zekun Wu, Shahin Doroudian, Aidong Lu

Abstract: The understanding of visual analytics process can benefit visualization researchers from multiple aspects, including improving visual designs and developing advanced interaction functions. However, the log files of user behaviors are still hard to analyze due to the complexity of sensemaking and our lack of knowledge on the related user behaviors. This work presents a study on a comprehensive data collection of user behaviors, and our analysis approach with time-series classification methods. We have chosen a classical visualization application, Covid-19 data analysis, with common analysis tasks covering geo-spatial, time-series and multi-attributes. Our user study collects user behaviors on a diverse set of visualization tasks with two comparable systems, desktop and immersive visualizations. We summarize the classification results with three time-series machine learning algorithms at two scales, and explore the influences of behavior features. Our results reveal that user behaviors can be distinguished during the process of visual analytics and there is a potentially strong association between the physical behaviors of users and the visualization tasks they perform. We also demonstrate the usage of our models by interpreting open sessions of visual analytics, which provides an automatic way to study sensemaking without tedious manual annotations.

Submitted 3 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: This version corrects the issues of previous versions

arXiv:2310.12108 [pdf, other]

New Black Hole Spin Values for Sagittarius A* Obtained with the Outflow Method

Authors: Ruth A. Daly, Megan Donahue, Christopher P. O'Dea, Biny Sebastian, Daryl Haggard, Anan Lu

Abstract: Six archival Chandra observations are matched with eight sets of radio data and studied in the context of the outflow method to measure and study the spin properties of $\rm{Sgr ~A^*}$. Three radio and X-ray data sets obtained simultaneously, or partially simultaneously, are identified as preferred for the purpose of measuring the spin properties of $\rm{Sgr ~A^*}$. Similar results are obtained with other data sets. Results obtained with the preferred data sets are combined and indicate a weighted mean value of the spin function of $\rm{F} = 0.62 \pm 0.10$ and dimensionless spin angular momentum of $\rm{a_*} = 0.90 \pm 0.06$. The spin function translates into measurements of the black hole rotational mass, $\rm{M_{rot}}$, irreducible mass, $\rm{M_{irr}}$, and spin mass-energy available for extraction, $\rm{M_{spin}}$, relative to the total black hole dynamical mass, $\rm{M_{dyn}}$. Weighted mean values of $\rm{(M_{rot}/M_{dyn}) = (0.53 \pm 0.06)}$, $\rm{({M_{irr}/M_{dyn})} = (0.85 \pm 0.04)}$, $\rm{({M_{spin}/M_{dyn})} = (0.15 \pm 0.04)}$, $\rm{M_{rot} = (2.2 \pm 0.3) \times 10^6 ~M_{\odot}}$, $\rm{M_{irr} = (3.5 \pm 0.2) \times 10^6 ~M_{\odot}}$, and $\rm{M_{spin} = (6.2 \pm 1.6) \times 10^5 ~M_{\odot}}$ are obtained; of course $\rm{{(M_{rot}/M_{irr})} = (0.62 \pm 0.10)}$ since $\rm{{(M_{rot}/M_{irr})} = F}$. Values obtained for $\rm{Sgr ~A^*}$ are compared with those obtained for M87 based on the published spin function which indicate that M87 carries substantially more rotational energy and spin mass-energy relative to the total (i.e., dynamical) black hole mass, the irreducible black hole mass, and in absolute terms.
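The central values quoted in this abstract can be cross-checked with a short calculation. The sketch below is not taken from the paper: it assumes the standard Kerr relations $a_* = 2F/(1+F^2)$ and $M_{dyn}^2 = M_{irr}^2 + M_{rot}^2$, together with $M_{rot}/M_{irr} = F$ (stated in the abstract) and $M_{spin} = M_{dyn} - M_{irr}$. The function names are hypothetical, and uncertainties are ignored.

```python
import math

def spin_from_spin_function(F):
    # Assumed relation (not stated in the abstract): a_* = 2F / (1 + F^2),
    # the inverse of F = a_* / (1 + sqrt(1 - a_*^2)).
    return 2.0 * F / (1.0 + F * F)

def mass_fractions(F):
    # Assumed relation: M_dyn^2 = M_irr^2 + M_rot^2, with M_rot = F * M_irr
    # (the latter is stated in the abstract). All values are fractions of M_dyn.
    irr = 1.0 / math.sqrt(1.0 + F * F)   # M_irr / M_dyn
    rot = F * irr                        # M_rot / M_dyn
    spin = 1.0 - irr                     # M_spin / M_dyn, taking M_spin = M_dyn - M_irr
    return rot, irr, spin

F = 0.62  # weighted mean spin function quoted in the abstract
a_star = spin_from_spin_function(F)
rot, irr, spin = mass_fractions(F)
print(round(a_star, 2), round(rot, 2), round(irr, 2), round(spin, 2))
```

Under these assumptions the rounded results agree with the quoted central values of 0.90, 0.53, 0.85, and 0.15, which suggests the abstract's numbers are mutually consistent.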

Submitted 18 October, 2023; originally announced October 2023.

Comments: Accepted for publication in MNRAS on October 16, 2023

arXiv:2310.11912 [pdf, other]

The JWST Galactic Center Survey -- A White Paper

Authors: Rainer Schoedel, Steve Longmore, Jonny Henshaw, Adam Ginsburg, John Bally, Anja Feldmeier, Matt Hosek, Francisco Nogueras Lara, Anna Ciurlo, Mélanie Chevance, J. M. Diederik Kruijssen, Ralf Klessen, Gabriele Ponti, Pau Amaro-Seoane, Konstantina Anastasopoulou, Jay Anderson, Maria Arias, Ashley T. Barnes, Cara Battersby, Giuseppe Bono, Lucía Bravo Ferres, Aaron Bryant, Miguel Cano González, Santi Cassisi, Leonardo Chaves-Velasquez, et al. (85 additional authors not shown)

Abstract: The inner hundred parsecs of the Milky Way hosts the nearest supermassive black hole, largest reservoir of dense gas, greatest stellar density, hundreds of massive main and post main sequence stars, and the highest volume density of supernovae in the Galaxy. As the nearest environment in which it is possible to simultaneously observe many of the extreme processes shaping the Universe, it is one of the most well-studied regions in astrophysics. Due to its proximity, we can study the center of our Galaxy on scales down to a few hundred AU, a hundred times better than in similar Local Group galaxies and thousands of times better than in the nearest active galaxies. The Galactic Center (GC) is therefore of outstanding astrophysical interest. However, in spite of intense observational work over the past decades, there are still fundamental things unknown about the GC. JWST has the unique capability to provide us with the necessary, game-changing data. In this White Paper, we advocate for a JWST NIRCam survey that aims at solving central questions, that we have identified as a community: i) the 3D structure and kinematics of gas and stars; ii) ancient star formation and its relation with the overall history of the Milky Way, as well as recent star formation and its implications for the overall energetics of our galaxy's nucleus; and iii) the (non-)universality of star formation and the stellar initial mass function. We advocate for a large-area, multi-epoch, multi-wavelength NIRCam survey of the inner 100 pc of the Galaxy in the form of a Treasury GO JWST Large Program that is open to the community. We describe how this survey will derive the physical and kinematic properties of ~10,000,000 stars, how this will solve the key unknowns and provide a valuable resource for the community with long-lasting legacy value.

Submitted 14 March, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: This White Paper will be updated when required (e.g. new authors joining, editing of content). Most recent update: 24 Oct 2023

arXiv:2310.07822 [pdf, other]

Body-mounted MR-conditional Robot for Minimally Invasive Liver Intervention

Authors: Zhefeng Huang, Anthony L. Gunderman, Samuel E. Wilcox, Saikat Sengupta, Jay Shah, Aiming Lu, David Woodrum, Yue Chen

Abstract: MR-guided microwave ablation (MWA) has proven effective in treating hepatocellular carcinoma (HCC) with small-sized tumors, but the state-of-the-art technique suffers from sub-optimal workflow due to speed and accuracy of needle placement. This paper presents a compact body-mounted MR-conditional robot that can operate in closed-bore MR scanners for accurate needle guidance. The robotic platform consists of two stacked Cartesian XY stages, each with two degrees of freedom, that facilitate needle guidance. The robot is actuated using 3D-printed pneumatic turbines with MR-conditional bevel gear transmission systems. Pneumatic valves and control mechatronics are located inside the MRI control room and are connected to the robot with pneumatic transmission lines and optical fibers. Free space experiments indicated robot-assisted needle insertion error of 2.6$\pm$1.3 mm at an insertion depth of 80 mm. The MR-guided phantom studies were conducted to verify the MR-conditionality and targeting performance of the robot. Future work will focus on the system optimization and validations in animal trials.

Submitted 25 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 10 figures

arXiv:2310.00587 [pdf, other]

doi 10.1109/IWAENC53105.2022.9914771

Mechatronic Generation of Datasets for Acoustics Research

Authors: Austin Lu, Ethaniel Moore, Arya Nallanthighall, Kanad Sarkar, Manan Mittal, Ryan M. Corey, Paris Smaragdis, Andrew Singer

Abstract: We address the challenge of making spatial audio datasets by proposing a shared mechanized recording space that can run custom acoustic experiments: a Mechatronic Acoustic Research System (MARS). To accommodate a wide variety of experiments, we implement an extensible architecture for wireless multi-robot coordination which enables synchronized robot motion for dynamic scenes with moving speakers and microphones. Using a virtual control interface, we can remotely design automated experiments to collect large-scale audio data. This data is shown to be similar across repeated runs, demonstrating the reliability of MARS. We discuss the potential for MARS to make audio data collection accessible for researchers without dedicated acoustic research spaces.

Submitted 1 October, 2023; originally announced October 2023.

Comments: 5 pages, 5 figures, IWAENC 2022

arXiv:2308.16486 [pdf, other]

Illumination Distillation Framework for Nighttime Person Re-Identification and A New Benchmark

Authors: Andong Lu, Zhang Zhang, Yan Huang, Yifan Zhang, Chenglong Li, Jin Tang, Liang Wang

Abstract: Nighttime person Re-ID (person re-identification in the nighttime) is a very important and challenging task for visual surveillance but it has not been thoroughly investigated. Under the low illumination condition, the performance of person Re-ID methods usually sharply deteriorates. To address the low illumination challenge in nighttime person Re-ID, this paper proposes an Illumination Distillation Framework (IDF), which utilizes illumination enhancement and illumination distillation schemes to promote the learning of Re-ID models. Specifically, IDF consists of a master branch, an illumination enhancement branch, and an illumination distillation module. The master branch is used to extract the features from a nighttime image. The illumination enhancement branch first estimates an enhanced image from the nighttime image using a nonlinear curve mapping method and then extracts the enhanced features. However, nighttime and enhanced features usually contain data noise due to unstable lighting conditions and enhancement failures. To fully exploit the complementary benefits of nighttime and enhanced features while suppressing data noise, we propose an illumination distillation module. In particular, the illumination distillation module fuses the features from two branches through a bottleneck fusion model and then uses the fused features to guide the learning of both branches in a distillation manner.
In addition, we build a real-world nighttime person Re-ID dataset, named Night600, which contains 600 identities captured from different viewpoints and nighttime illumination conditions under complex outdoor environments. Experimental results demonstrate that our IDF can achieve state-of-the-art performance on two nighttime person Re-ID datasets (i.e., Night600 and Knight). We will release our code and dataset at https://github.com/Alexadlu/IDF.

Submitted 31 August, 2023; originally announced August 2023.

Comments: Accepted by TMM

arXiv:2308.05146 [pdf, other]

WISDOM Project -- XVII. Beam-by-beam Properties of the Molecular Gas in Early-type Galaxies

Authors: Thomas G. Williams, Martin Bureau, Timothy A. Davis, Michele Cappellari, Woorak Choi, Jacob S. Elford, Satoru Iguchi, Jindra Gensior, Fu-Heng Liang, Anan Lu, Ilaria Ruffa, Hengyue Zhang

Abstract: We present a study of the molecular gas of seven early-type galaxies with high angular resolution data obtained as part of the mm-Wave Interferometric Survey of Dark Object Masses (WISDOM) project with the Atacama Large Millimeter/submillimeter Array. Using a fixed spatial scale approach, we study the mass surface density ($Σ$) and velocity dispersion ($σ$) of the molecular gas on spatial scales ranging from $60$ to $120$pc. Given the spatial resolution of our data ($20$ - $70$pc), we characterise these properties across many thousands of individual sight lines ($\approx50,000$ at our highest physical resolution). The molecular gas along these sight lines has a large range ($\approx2$dex) of mass surface densities and velocity dispersions $\approx40\%$ higher than those of star-forming spiral galaxies. It has virial parameters $α_\mathrm{vir}$ that depend weakly on the physical scale observed, likely due to beam smearing of the bulk galactic rotation, and is generally super-virial. Comparing the internal turbulent pressure ($P_\mathrm{turb}$) to the pressure required for dynamic equilibrium ($P_\mathrm{DE}$), the ratio $P_\mathrm{turb}$/$P_\mathrm{DE}$ is significantly less than unity in all galaxies, indicating that the gas is not in dynamic equilibrium and is strongly compressed, in apparent contradiction to the virial parameters. This may be due to our neglect of shear and tidal forces, and/or the combination of three-dimensional and vertical diagnostics.
Both $α_\mathrm{vir}$ and $P_\mathrm{turb}$ anti-correlate with the global star-formation rate of our galaxies. We therefore conclude that the molecular gas in early-type galaxies is likely unbound, and that large-scale dynamics likely plays a critical role in its regulation. This contrasts to the giant molecular clouds in the discs of late-type galaxies, that are much closer to dynamical equilibrium.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 32 pages (16 of Appendices), 39 Figures (27 in Appendices). Accepted for publication in MNRAS

arXiv:2307.08848 [pdf]

Microbiome-derived bile acids contribute to elevated antigenic response and bone erosion in rheumatoid arthritis

Authors: Xiuli Su, Xiaona Li, Yanqin Bian, Qing Ren, Leiguang Li, Xiaohao Wu, Hemi Luan, Bing He, Xiaojuan He, Hui Feng, Xingye Cheng, Pan-Jun Kim, Leihan Tang, Aiping Lu, Lianbo Xiao, Liang Tian, Zhu Yang, Zongwei Cai

Abstract: Rheumatoid arthritis (RA) is a chronic, disabling and incurable autoimmune disease. It has been widely recognized that gut microbial dysbiosis is an important contributor to the pathogenesis of RA, although distinct alterations in microbiota have been associated with this disease. Yet, the metabolites that mediate the impacts of the gut microbiome on RA are less well understood. Here, with microbial profiling and non-targeted metabolomics, we revealed profound yet diverse perturbation of the gut microbiome and metabolome in RA patients in a discovery set. In the Bacteroides-dominated RA patients, differentiation of gut microbiome resulted in distinct bile acid profiles compared to healthy subjects. Predominated Bacteroides species expressing BSH and 7a-HSDH increased, leading to elevated secondary bile acid production in this subgroup of RA patients. Reduced serum fibroblast growth factor-19 and dysregulated bile acids were evidence of impaired farnesoid X receptor-mediated signaling in the patients. This gut microbiota-bile acid axis was correlated to ACPA. The patients from the validation sets demonstrated that ACPA-positive patients have more abundant bacteria expressing BSH and 7a-HSDH but less Clostridium scindens expressing 7a-dehydroxylation enzymes, together with dysregulated microbial bile acid metabolism and more severe bone erosion than ACPA-negative ones.
Mediation analyses revealed putative causal relationships between the gut microbiome, bile acids, and ACPA-positive RA, supporting a potential causal effect of Bacteroides species in increasing levels of ACPA and bone erosion mediated via disturbing bile acid metabolism. These results provide insights into the role of gut dysbiosis in RA in a manifestation-specific manner, as well as the functions of bile acids in this gut-joint axis, which may be a potential intervention target for precisely controlling RA conditions.

Submitted 14 July, 2023; originally announced July 2023.

Comments: 38 pages, 6 figures

arXiv:2306.15123 [pdf, other]

Investigating Cross-Domain Behaviors of BERT in Review Understanding

Authors: Albert Lu, Meng Jiang

Abstract: Review score prediction requires review text understanding, a critical real-world application of natural language processing. Due to dissimilar text domains in product reviews, a common practice is fine-tuning BERT models upon reviews of differing domains. However, there has not yet been an empirical study of cross-domain behaviors of BERT models in the various tasks of product review understanding. In this project, we investigate text classification BERT models fine-tuned on single-domain and multi-domain Amazon review data. In our findings, though single-domain models achieved marginally improved performance on their corresponding domain compared to multi-domain models, multi-domain models outperformed single-domain models when evaluated on multi-domain data, single-domain data the single-domain model was not fine-tuned on, and on average when considering all tests. Though slight increases in accuracy can be achieved through single-domain model fine-tuning, computational resources and costs can be reduced by utilizing multi-domain models that perform well across domains.

Submitted 27 June, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: 9 pages, 1 figure, 2 tables

arXiv:2306.08962 [pdf]

Enhanced ferromagnetism in artificially stretched lattice in quasi two-dimensional Cr2Ge2Te6

Authors: Hiroshi Idzuchi, Andres E Llacsahuanga Allcca, Anh Khoa Augustin Lu, Mitsuhiro Saito, Michel Houssa, Ruishen Meng, Kazutoshi Inoue, Xing-Chen Pan, Katsumi Tanigaki, Yuichi Ikuhara, Takeshi Nakanishi, Yong P Chen

Abstract: In the fundamental understanding of magnetic interactions between atoms in solids, the crystal lattice is one of the key parameters. Because effective tools for controlling the lattice using tensile stress are limited, there are only a few demonstrations of controlling magnetic properties by expanding the lattice structure. Here, we observe that the Curie temperature (Tc) of quasi two-dimensional Cr2Ge2Te6 with NiO overlayer doubles from ~60 K to ~120 K, describe a clear correlation of magnetic properties with lattice expansion, which is characterized by several probes and computational approaches, and address the mechanisms leading to the increase in Tc via the change in exchange interactions.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2305.00666 [pdf, other]

Part Aware Contrastive Learning for Self-Supervised Action Recognition

Authors: Yilei Hua, Wenhan Wu, Ce Zheng, Aidong Lu, Mengyuan Liu, Chen Chen, Shiqian Wu

Abstract: In recent years, remarkable results have been achieved in self-supervised action recognition using skeleton sequences with contrastive learning. It has been observed that the semantic distinction of human action features is often represented by local body parts, such as legs or hands, which are advantageous for skeleton-based action recognition. This paper proposes an attention-based contrastive learning framework for skeleton representation learning, called SkeAttnCLR, which integrates local similarity and global features for skeleton-based action representations. To achieve this, a multi-head attention mask module is employed to learn the soft attention mask features from the skeletons, suppressing non-salient local features while accentuating local salient features, thereby bringing similar local features closer in the feature space. Additionally, ample contrastive pairs are generated by expanding contrastive pairs based on salient and non-salient features with global features, which guide the network to learn the semantic representations of the entire skeleton. Therefore, with the attention mask mechanism, SkeAttnCLR learns local features under different data augmentation views. The experiment results demonstrate that the inclusion of local feature similarity significantly enhances skeleton-based action representation. Our proposed SkeAttnCLR outperforms state-of-the-art methods on NTURGB+D, NTU120-RGB+D, and PKU-MMD datasets.

Submitted 11 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: 7 pages, 4 figures, accepted by IJCAI 2023

arXiv:2304.10471 [pdf, other]

doi 10.1093/mnras/stad1211

WISDOM Project -- XV. Giant Molecular Clouds in the Central Region of the Barred Spiral Galaxy NGC 5806

Authors: Woorak Choi, Lijie Liu, Martin Bureau, Michele Cappellari, Timothy A. Davis, Jindra Gensior, Fu-Heng Liang, Anan Lu, Thomas G. Williams, Aeree Chung

Abstract: We present high spatial resolution ($\approx24$ pc) Atacama Large Millimeter/sub-millimeter Array $^{12}$CO(2-1) observations of the central region of the nearby barred spiral galaxy NGC 5806. NGC 5806 has a highly structured molecular gas distribution with a clear nucleus, a nuclear ring and offset dust lanes. We identify $170$ spatially- and spectrally-resolved giant molecular clouds (GMCs). These clouds have comparable sizes ($R_{\mathrm{c}}$) and larger gas masses, observed linewidths ($σ_{\mathrm{obs,los}}$) and gas mass surface densities than those of clouds in the Milky Way disc. The size -- linewidth relation of the clouds is one of the steepest reported so far ($σ_{\mathrm{obs,los}}\propto R_{\mathrm{c}}^{1.20}$), the clouds are on average only marginally bound (with a mean virial parameter $\langleα_{\mathrm{vir}}\rangle\approx2$), and high velocity dispersions are observed in the nuclear ring. These behaviours are likely due to bar-driven gas shocks and inflows along the offset dust lanes, and we infer an inflow velocity of $\approx120$ km s$^{-1}$ and a total molecular gas mass inflow rate of $\approx5$ M$_\odot$ yr$^{-1}$ into the nuclear ring. The observed internal velocity gradients of the clouds are consistent with internal turbulence. The number of clouds in the nuclear ring decreases with azimuthal angle downstream from the dust lanes without clear variation of cloud properties.
This is likely due to the estimated short lifetime of the clouds ($\approx6$ Myr), which appears to be mainly regulated by cloud-cloud collision and/or shear processes. Overall, it thus seems that the presence of the large-scale bar and gas inflows to the centre of NGC 5806 affect cloud properties.

Submitted 21 April, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: Accepted for publication in MNRAS, 20 pages, 16 figures

arXiv:2304.06117 [pdf, other]

doi 10.1093/mnras/stad1119

WISDOM project -- XIV. SMBH mass in the early-type galaxies NGC0612, NGC1574, and NGC4261 from CO dynamical modelling

Authors: Ilaria Ruffa, Timothy A. Davis, Michele Cappellari, Martin Bureau, Jacob S. Elford, Satoru Iguchi, Federico Lelli, Fu-Heng Liang, Lijie Liu, Anan Lu, Marc Sarzi, Thomas G. Williams

Abstract: We present a CO dynamical estimate of the mass of the super-massive black hole (SMBH) in three nearby early-type galaxies: NGC0612, NGC1574 and NGC4261. Our analysis is based on Atacama Large Millimeter/submillimeter Array (ALMA) Cycle 3-6 observations of the $^{12}$CO(2-1) emission line with spatial resolutions of $14-58$ pc ($0.01"-0.26"$). We detect disc-like CO distributions on scales from $\lesssim200$ pc (NGC1574 and NGC4261) to $\approx10$ kpc (NGC0612). In NGC0612 and NGC1574 the bulk of the gas is regularly rotating. The data also provide evidence for the presence of a massive dark object at the centre of NGC1574, allowing us to obtain the first measure of its mass, $M_{\rm BH}=(1.0\pm0.2)\times10^{8}$ M$_{\odot}$ (1$σ$ uncertainty). In NGC4261, the CO kinematics is clearly dominated by the SMBH gravitational influence, allowing us to determine an accurate black hole mass of $(1.62{\pm 0.04})\times10^{9}$ M$_{\odot}$ ($1σ$ uncertainty). This is fully consistent with a previous CO dynamical estimate obtained using a different modelling technique. Signs of non-circular gas motions (likely outflow) are also identified in the inner regions of NGC4261. In NGC0612, we are only able to obtain a (conservative) upper limit of $M_{\rm BH}\lesssim3.2\times10^{9}$ M$_{\odot}$. This is likely to be ascribed to the presence of a central CO hole (with a radius much larger than that of the SMBH sphere of influence), combined with the inability to obtain a robust prediction for the CO velocity curve.
The three SMBH mass estimates are overall in agreement with predictions from the $M_{\rm BH}-σ_{\star}$ relation.

Submitted 6 November, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

Comments: Main text: 20 pages, 14 Figures; Appendix: 7 pages, 6 Figures. Accepted for publication in MNRAS on 2023 March 28

arXiv:2304.05934 [pdf, other]

ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition

Authors: Aashaka Desai, Lauren Berger, Fyodor O. Minakov, Vanessa Milan, Chinmay Singh, Kriston Pumphrey, Richard E. Ladner, Hal Daumé III, Alex X. Lu, Naomi Caselli, Danielle Bragg

Abstract: Sign languages are used as a primary language by approximately 70 million D/deaf people world-wide. However, most communication technologies operate in spoken and written languages, creating inequities in access. To help tackle this problem, we release ASL Citizen, the first crowdsourced Isolated Sign Language Recognition (ISLR) dataset, collected with consent and containing 83,399 videos for 2,731 distinct signs filmed by 52 signers in a variety of environments. We propose that this dataset be used for sign language dictionary retrieval for American Sign Language (ASL), where a user demonstrates a sign to their webcam to retrieve matching signs from a dictionary. We show that training supervised machine learning classifiers with our dataset advances the state-of-the-art on metrics relevant for dictionary retrieval, achieving 63% accuracy and a recall-at-10 of 91%, evaluated entirely on videos of users who are not present in the training or validation sets. An accessible PDF of this article is available at the following link: https://aashakadesai.github.io/research/ASLCitizen_arxiv_updated.pdf

Submitted 19 June, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

arXiv:2304.03879 [pdf, other]

GPT4Rec: A Generative Framework for Personalized Recommendation and User Interests Interpretation

Authors: Jinming Li, Wentao Zhang, Tian Wang, Guanglei Xiong, Alan Lu, Gerard Medioni

Abstract: Recent advancements in Natural Language Processing (NLP) have led to the development of NLP-based recommender systems that have shown superior performance. However, current models commonly treat items as mere IDs and adopt discriminative modeling, resulting in limitations of (1) fully leveraging the content information of items and the language modeling capabilities of NLP models; (2) interpreting user interests to improve relevance and diversity; and (3) adapting to practical circumstances such as growing item inventories. To address these limitations, we present GPT4Rec, a novel and flexible generative framework inspired by search engines. It first generates hypothetical "search queries" given item titles in a user's history, and then retrieves items for recommendation by searching these queries. The framework overcomes previous limitations by learning both user and item embeddings in the language space. To capture user interests with different aspects and granularity and thereby improve relevance and diversity, we propose a multi-query generation technique with beam search. The generated queries naturally serve as interpretable representations of user interests and can be searched to recommend cold-start items. With GPT-2 language model and BM25 search engine, our framework outperforms state-of-the-art methods by $75.7\%$ and $22.2\%$ in Recall@K on two public datasets. Experiments further revealed that multi-query generation with beam search improves both the diversity of retrieved items and the coverage of a user's multi-interests.
The adaptiveness and interpretability of generated queries are discussed with qualitative case studies.
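
The retrieval step named in this abstract (searching generated queries with BM25) can be illustrated with a minimal Okapi BM25 scorer in plain Python. This is only a sketch, not the authors' implementation: the item titles, the query, the function name `bm25_scores`, and the parameters `k1=1.5` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical item titles and a hypothetical generated "search query".
items = [
    "wireless noise cancelling headphones",
    "stainless steel water bottle",
    "bluetooth wireless speaker",
]
docs = [title.split() for title in items]
query = "wireless headphones".split()

scores = bm25_scores(query, docs)
best = max(range(len(items)), key=scores.__getitem__)
print(items[best], scores)
```

In a multi-query setting, one would presumably score each generated query separately and merge the ranked lists; how the merge is done is not stated in the abstract.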

Submitted 7 April, 2023; originally announced April 2023.

arXiv:2304.00738 [pdf]

doi 10.23919/SISPAD57422.2023.10319583

Device Image-IV Mapping using Variational Autoencoder for Inverse Design and Forward Prediction

Authors: Thomas Lu, Albert Lu, Hiu Yung Wong

Abstract: This paper demonstrates the learning of the underlying device physics by mapping device structure images to their corresponding Current-Voltage (IV) characteristics using a novel framework based on variational autoencoders (VAE). Since VAE is used, domain expertise is not required and the framework can be quickly deployed on any new device and measurement. This is expected to be useful in the compact modeling of novel devices when only device cross-sectional images and electrical characteristics are available (e.g. novel emerging memory). Technology Computer-Aided Design (TCAD) generated and hand-drawn Metal-Oxide-Semiconductor (MOS) device images and noisy drain-current-gate-voltage curves (IDVG) are used for the demonstration. The framework is formed by stacking two VAEs (one for image manifold learning and one for IDVG manifold learning) which communicate with each other through the latent variables. Five independent variables with different strengths are used. It is shown that it can perform inverse design (generate a design structure for a given IDVG) and forward prediction (predict IDVG for a given structure image, which can be used for compact modeling if the image is treated as device parameters) successfully. Since manifold learning is used, the machine is shown to be robust against noise in the inputs (i.e. using hand-drawn images and noisy IDVG curves) and not confused by weak and irrelevant independent variables.

Submitted 3 April, 2023; originally announced April 2023.

Comments: 5 pages 6 figures

Journal ref: 2023 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD), Kobe, Japan, 2023, pp. 161-164

arXiv:2304.00201 [pdf, ps, other]

doi 10.1109/TSP.2024.3364914

Precoder Design for Massive MIMO Downlink with Matrix Manifold Optimization

Authors: Rui Sun, Chen Wang, An-An Lu, Xiqi Gao, Xiang-Gen Xia

Abstract: We investigate the weighted sum-rate (WSR) maximization linear precoder design for massive multiple-input multiple-output (MIMO) downlink. We consider a single-cell system with multiple users and propose a unified matrix manifold optimization framework applicable to total power constraint (TPC), per-user power constraint (PUPC) and per-antenna power constraint (PAPC). We prove that the precoders under TPC, PUPC and PAPC are on distinct Riemannian submanifolds, and transform the constrained problems in Euclidean space to unconstrained ones on manifolds. In accordance with this, we derive Riemannian ingredients, including orthogonal projection, Riemannian gradient, Riemannian Hessian, retraction and vector transport, which are needed for precoder design in the matrix manifold framework. Then, Riemannian design methods using Riemannian steepest descent, Riemannian conjugate gradient and Riemannian trust region are provided to design the WSR-maximization precoders under TPC, PUPC or PAPC. Riemannian methods do not involve the inverses of the large dimensional matrices during the iterations, reducing the computational complexities of the algorithms. Complexity analyses and performance simulations demonstrate the advantages of the proposed precoder design.

Submitted 10 April, 2024; v1 submitted 31 March, 2023; originally announced April 2023.

Comments: 16 pages, 11 figures, journal

Journal ref: IEEE Transactions on Signal Processing, vol. 72, pp. 1065-1080, 2024

arXiv:2303.08684 [pdf, other]

doi 10.1063/5.0151038

Synchronization transitions in Kuramoto networks with higher-mode interaction

Authors: Rico Berner, Annie Lu, Igor M. Sokolov

Abstract: Synchronization is an omnipresent collective phenomenon in nature and technology, whose understanding is still elusive, in particular for real-world systems. We study the synchronization transition in a phase oscillator system with two nonvanishing Fourier-modes in the interaction function, hence going beyond the Kuramoto paradigm. We show that the transition scenarios crucially depend on the interplay of the two coupling-modes. We describe the multistability induced by the presence of a second coupling-mode. By extending the collective coordinate approach, we describe the emergence of various states observed in the transition from incoherence to coherence. Remarkably, our analysis suggests that in essence the two-mode coupling gives rise to states that are characterized by two independent but interacting groups of oscillators. We believe that these findings will stimulate future research on dynamical systems including complex interaction functions beyond the Kuramoto-type.
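
A phase oscillator system with two Fourier modes in the interaction function can be simulated in a few lines of NumPy. The sketch below is an assumption-laden illustration and not the authors' model: it assumes an all-to-all network, Gaussian natural frequencies, an interaction function of the form `k1*sin(x) + k2*sin(2x)`, and forward-Euler integration. The names `simulate`, `k1`, and `k2` are hypothetical.

```python
import numpy as np

def simulate(n=200, k1=4.0, k2=0.0, dt=0.01, steps=3000, seed=0):
    """Integrate d(theta_i)/dt = omega_i + (1/n) * sum_j g(theta_j - theta_i)
    with g(x) = k1*sin(x) + k2*sin(2x), and return the first- and
    second-mode order parameters |Z1| and |Z2| at the final time."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0, n)           # assumed Gaussian frequencies
    theta = rng.uniform(0.0, 2.0 * np.pi, n)  # random initial phases
    for _ in range(steps):
        # Mean-field identity: (1/n) * sum_j sin(m*(theta_j - theta_i))
        # equals Im(Z_m * exp(-i*m*theta_i)) with Z_m = mean(exp(i*m*theta)).
        z1 = np.exp(1j * theta).mean()
        z2 = np.exp(2j * theta).mean()
        dtheta = (omega
                  + k1 * np.imag(z1 * np.exp(-1j * theta))
                  + k2 * np.imag(z2 * np.exp(-2j * theta)))
        theta = theta + dt * dtheta           # forward-Euler step
    return abs(np.exp(1j * theta).mean()), abs(np.exp(2j * theta).mean())

r1_coupled, r2_coupled = simulate(k1=4.0, k2=0.0)  # strong first-mode coupling
r1_free, r2_free = simulate(k1=0.0, k2=0.0)        # uncoupled oscillators
print(r1_coupled, r1_free)
```

Sweeping `k1` and `k2` and recording both order parameters is one simple way to explore how the two coupling modes interact; the specific transition scenarios reported in the paper are not reproduced here.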

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2303.02241 [pdf, other]

Domain adaptation using optimal transport for invariant learning using histopathology datasets

Authors: Kianoush Falahkheirkhah, Alex Lu, David Alvarez-Melis, Grace Huynh

Abstract: Histopathology is critical for the diagnosis of many diseases, including cancer. These protocols typically require pathologists to manually evaluate slides under a microscope, which is time-consuming and subjective, leading to interest in machine learning to automate analysis. However, computational techniques are limited by batch effects, where technical factors like differences in preparation protocol or scanners can alter the appearance of slides, causing models trained on one institution to fail when generalizing to others. Here, we propose a domain adaptation method that improves the generalization of histopathological models to data from unseen institutions, without the need for labels or retraining in these new settings. Our approach introduces an optimal transport (OT) loss that extends adversarial methods that penalize models if images from different institutions can be distinguished in their representation space. Unlike previous methods, which operate on single samples, our loss accounts for distributional differences between batches of images. We show that on the Camelyon17 dataset, while both methods can adapt to global differences in color distribution, only our OT loss can reliably classify a cancer phenotype unseen during training. Together, our results suggest that OT improves generalization on rare but critical phenotypes that may only make up a small fraction of the total tiles and variation in a slide.

Submitted 3 March, 2023; originally announced March 2023.

arXiv:2302.09185 [pdf, other]

Bounding the Capabilities of Large Language Models in Open Text Generation with Prompt Constraints

Authors: Albert Lu, Hongxin Zhang, Yanzhe Zhang, Xuezhi Wang, Diyi Yang

Abstract: The limits of open-ended generative models are unclear, yet increasingly important. What causes them to succeed and what causes them to fail? In this paper, we take a prompt-centric approach to analyzing and bounding the abilities of open-ended generative models. We present a generic methodology of analysis with two challenging prompt constraint types: structural and stylistic. These constraint types are categorized into a set of well-defined constraints that are analyzable by a single prompt. We then systematically create a diverse set of simple, natural, and useful prompts to robustly analyze each individual constraint. Using the GPT-3 text-davinci-002 model as a case study, we generate outputs from our collection of prompts and analyze the model's generative failures. We also show the generalizability of our proposed method on other large models like BLOOM and OPT. Our results and our in-context mitigation strategies reveal open challenges for future research. We have publicly released our code at https://github.com/SALT-NLP/Bound-Cap-LLM.

Submitted 17 February, 2023; originally announced February 2023.

Comments: 27 pages, 13 figures, 11 tables, to be published in EACL 2023 Findings

arXiv:2302.05989 [pdf]

Van der Waals device integration beyond the limits of van der Waals forces via adhesive matrix transfer

Authors: Peter F. Satterthwaite, Weikun Zhu, Patricia Jastrzebska-Perfect, Melbourne Tang, Hongze Gao, Hikari Kitadai, Ang-Yu Lu, Qishuo Tan, Shin-Yi Tang, Yu-Lun Chueh, Chia-Nung Kuo, Chin Shan Lue, Jing Kong, Xi Ling, Farnaz Niroui

Abstract: Pristine van der Waals (vdW) interfaces between two-dimensional (2D) and other materials are core to emerging optical and electronic devices. Their direct fabrication is, however, challenged as the vdW forces are weak and cannot be tuned to accommodate integration of arbitrary layers without solvents, sacrificial-layers or high-temperatures, steps that can introduce damage. To address these limitations, we introduce a single-step 2D material-to-device integration approach in which forces promoting transfer are decoupled from the vdW forces at the interface of interest. We use this adhesive matrix transfer to demonstrate conventionally-forbidden direct integration of diverse 2D materials (MoS2, WSe2, PtS2, GaS) with dielectrics (SiO2, Al2O3), and scalable, aligned heterostructure formation, both foundational to device development. We then demonstrate a single-step integration of monolayer-MoS2 into arrays of transistors. With no exposure to polymers or solvents, clean interfaces and pristine surfaces are preserved, which can be further engineered to demonstrate both n- and p-type behavior. Beyond serving as a platform to probe the intrinsic properties of sensitive nanomaterials without the influence of processing steps, our technique allows efficient formation of unconventional device form-factors, with an example of flexible transistors demonstrated.

Submitted 12 February, 2023; originally announced February 2023.

arXiv:2301.03410 [pdf, other]

In Defense of Structural Symbolic Representation for Video Event-Relation Prediction

Authors: Andrew Lu, Xudong Lin, Yulei Niu, Shih-Fu Chang

Abstract: Understanding event relationships in videos requires a model to understand the underlying structures of events (i.e. the event type, the associated argument roles, and corresponding entities) and factual knowledge for reasoning. Structural symbolic representation (SSR) based methods directly take event types and associated argument roles/entities as inputs to perform reasoning. However, the state-of-the-art video event-relation prediction system shows the necessity of using continuous feature vectors from input videos; existing methods based solely on SSR inputs fail completely, even when given oracle event types and argument roles. In this paper, we conduct an extensive empirical analysis to answer the following questions: 1) why SSR-based methods failed; 2) how to understand the evaluation setting of video event relation prediction properly; 3) how to uncover the potential of SSR-based methods. We first identify suboptimal training settings as causing the failure of previous SSR-based video event prediction models. Then through qualitative and quantitative analysis, we show how evaluation that takes only video as inputs is currently unfeasible, as well as the reliance on oracle event information to obtain an accurate evaluation. Based on these findings, we propose to further contextualize the SSR-based model to an Event-Sequence Model and equip it with more factual knowledge through a simple yet effective way of reformulating external visual commonsense knowledge bases into an event-relation prediction pretraining dataset. The resultant new state-of-the-art model eventually establishes a 25% Macro-accuracy performance boost.

Submitted 12 April, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: CVPRW 23, Learning with Limited Labelled Data

arXiv:2212.14156 [pdf, other]

Decentralized Voltage Control with Peer-to-peer Energy Trading in a Distribution Network

Authors: Chen Feng, Andrew L. Lu, Yihsu Chen

Abstract: Utilizing distributed renewable and energy storage resources via peer-to-peer (P2P) energy trading has long been touted as a solution to improve energy systems' resilience and sustainability. Consumers and prosumers (those who have energy generation resources), however, do not have the expertise to engage in repeated P2P trading, and the zero marginal costs of renewables present challenges in determining fair market prices. To address these issues, we propose a multi-agent reinforcement learning (MARL) framework to help automate consumers' bidding and management of their solar PV and energy storage resources, under a specific P2P clearing mechanism that utilizes the so-called supply-demand ratio. In addition, we show how the MARL framework can integrate physical network constraints to realize decentralized voltage control, hence ensuring physical feasibility of the P2P energy trading and paving the way for real-world implementations.

Submitted 28 December, 2022; originally announced December 2022.

Showing 1–50 of 135 results for author: Lu, A