Search | arXiv e-print repository

arXiv:2407.20181 [pdf, other]

Blockchain for Large Language Model Security and Safety: A Holistic Survey

Authors: Caleb Geren, Amanda Board, Gaby G. Dagher, Tim Andersen, Jun Zhuang

Abstract: With the advent of accessible interfaces for interacting with large language models, there has been an associated explosion in both their commercial and academic interest. Consequently, there has also been a sudden burst of novel attacks associated with large language models, jeopardizing user data on a massive scale. Situated at a comparable crossroads in its development, and equally prolific to LLMs in its rampant growth, blockchain has emerged in recent years as a disruptive technology with the potential to redefine how we approach data handling. In particular, and due to its strong guarantees about data immutability and irrefutability as well as inherent data provenance assurances, blockchain has attracted significant attention as a means to better defend against the array of attacks affecting LLMs and further improve the quality of their responses. In this survey, we holistically evaluate current research on how blockchains are being used to help protect against LLM vulnerabilities, as well as analyze how they may further be used in novel applications. To better serve these ends, we introduce a taxonomy of blockchain for large language models (BC4LLM) and also develop various definitions to precisely capture the nature of different bodies of research in these areas. Moreover, throughout the paper, we present frameworks to contextualize broader research efforts, and in order to motivate the field further, we identify future research goals as well as challenges present in the blockchain for large language model (BC4LLM) space.

Submitted 26 July, 2024; originally announced July 2024.

Comments: Submitted to SIGKDD Explorations

arXiv:2407.19375 [pdf]

Topological Phase Transition in Quasi-One-Dimensional Bismuth Iodide Bi4I4

Authors: W. X. Zhao, M. Yang, X. Du, Y. D. Li, K. Y. Zhai, Y. Q. Hu, J. F. Han, Y. Huang, Z. K. Liu, Y. G. Yao, J. C. Zhuang, Y. Du, J. J. Zhou, Y. L. Chen, L. X. Yang

Abstract: The exploration of topological quantum materials and topological phase transitions is at the forefront of modern condensed matter physics. Quasi-one-dimensional (quasi-1D) bismuth iodide Bi4I4 exhibits versatile topological phases of matter including weak topological insulator (WTI) and higher-order topological insulator (HOTI) phases with high tunability in response to external parameters. In this work, performing laser-based angle-resolved photoemission spectroscopy with submicron spatial resolution (micro-ARPES), we comprehensively investigate the fine electronic structure and topological phase transition of Bi4I4. Our examination of the low-temperature α-phase reveals the presence of an energy gap on the (100) surface, providing spectroscopic evidence for the HOTI phase. Conversely, the high-temperature β-Bi4I4 harbors a gapless Dirac fermion on the (100) surface alongside gapped states on the (001) surface, thereby establishing a WTI phase. By tracking the temperature evolution of the (100) surface states, we unveil a thermal hysteresis of the surface gap in line with the α-β structural phase transition. Our findings elucidate the topological properties of Bi4I4 and directly evidence a temperature-induced topological phase transition from WTI to HOTI, which paves the way to potential applications based on the room-temperature topological phase transition in the quasi-1D topological quantum material.

Submitted 27 July, 2024; originally announced July 2024.

arXiv:2407.17706 [pdf, other]

Investigating and Mitigating Barren Plateaus in Variational Quantum Circuits: A Survey

Authors: Jack Cunningham, Jun Zhuang

Abstract: In recent years, variational quantum circuits (VQCs) have been widely explored to advance quantum circuits against classic models in various domains, such as quantum chemistry and quantum machine learning. Similar to classic machine-learning models, VQCs can be optimized through gradient-based approaches. However, the gradient variance of VQCs may dramatically vanish as the number of qubits or layers increases. This issue, a.k.a. Barren Plateaus (BPs), seriously hinders the scaling of VQCs on large datasets. To mitigate the exponential gradient vanishing, extensive efforts have been devoted to tackling this issue through diverse strategies. In this survey, we conduct a systematic literature review of recent works from both investigation and mitigation perspectives. In addition, we propose a new taxonomy to categorize most existing mitigation strategies. Finally, we provide an insightful discussion of future directions for BPs.

Submitted 24 July, 2024; originally announced July 2024.

Comments: preprint, under review. Please feel free to reach out if your work fits within our scope

arXiv:2407.15613 [pdf, other]

Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning

Authors: Xiangyan Qu, Jing Yu, Keke Gai, Jiamin Zhuang, Yuanmin Tang, Gang Xiong, Gaopeng Gou, Qi Wu

Abstract: Recent work shows that documents from encyclopedias serve as helpful auxiliary information for zero-shot learning. Existing methods align the entire semantics of a document with corresponding images to transfer knowledge. However, they disregard that semantic information is not equivalent between them, resulting in a suboptimal alignment. In this work, we propose a novel network to extract multi-view semantic concepts from documents and images and align the matching rather than entire concepts. Specifically, we propose a semantic decomposition module to generate multi-view semantic embeddings from visual and textual sides, providing the basic concepts for partial alignment. To alleviate the issue of information redundancy among embeddings, we propose the local-to-semantic variance loss to capture distinct local details and multiple semantic diversity loss to enforce orthogonality among embeddings. Subsequently, two losses are introduced to partially align visual-semantic embedding pairs according to their semantic relevance at the view and word-to-patch levels. Consequently, we consistently outperform state-of-the-art methods under two document sources in three standard benchmarks for document-based zero-shot learning. Qualitatively, we show that our model learns the interpretable partial association.

Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: Accepted to ACM International Conference on Multimedia (MM) 2024

arXiv:2407.13139 [pdf, other]

Image Inpainting Models are Effective Tools for Instruction-guided Image Editing

Authors: Xuan Ju, Junhao Zhuang, Zhaoyang Zhang, Yuxuan Bian, Qiang Xu, Ying Shan

Abstract: This is the technical report for the winning solution of the CVPR2024 GenAI Media Generation Challenge Workshop's Instruction-guided Image Editing track. Instruction-guided image editing has been extensively studied in recent years. The most advanced methods, such as SmartEdit and MGIE, usually combine large language models with diffusion models through joint training, where the former provides text understanding ability, and the latter provides image generation ability. However, in our experiments, we find that simply connecting large language models and image generation models through intermediary guidance such as masks, instead of joint fine-tuning, leads to better editing performance and a higher success rate. We use a 4-step process, IIIE (Inpainting-based Instruction-guided Image Editing): editing category classification, main editing object identification, editing mask acquisition, and image inpainting. Results show that through proper combinations of language models and image inpainting models, our pipeline can reach a high success rate with satisfying visual quality.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.12940 [pdf, other]

KiGRAS: Kinematic-Driven Generative Model for Realistic Agent Simulation

Authors: Jianbo Zhao, Jiaheng Zhuang, Qibin Zhou, Taiyu Ban, Ziyao Xu, Hangning Zhou, Junhe Wang, Guoan Wang, Zhiheng Li, Bin Li

Abstract: Trajectory generation is a pivotal task in autonomous driving. Recent studies have introduced the autoregressive paradigm, leveraging the state transition model to approximate future trajectory distributions. This paradigm closely mirrors the real-world trajectory generation process and has achieved notable success. However, its potential is limited by the ineffective representation of realistic trajectories within the redundant state space. To address this limitation, we propose the Kinematic-Driven Generative Model for Realistic Agent Simulation (KiGRAS). Instead of modeling in the state space, KiGRAS factorizes the driving scene into action probability distributions at each time step, providing a compact space to represent realistic driving patterns. By establishing physical causality from actions (cause) to trajectories (effect) through the kinematic model, KiGRAS eliminates massive redundant trajectories. All states derived from actions in the cause space are constrained to be physically feasible. Furthermore, redundant trajectories representing identical action sequences are mapped to the same representation, reflecting their underlying actions. This approach significantly reduces task complexity and ensures physical feasibility. KiGRAS achieves state-of-the-art performance in Waymo's SimAgents Challenge, ranking first on the WOMD leaderboard with significantly fewer parameters than other models. The video documentation is available at https://kigras-mach.github.io/KiGRAS/.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.08839 [pdf, other]

A Survey on the Application of Generative Adversarial Networks in Cybersecurity: Prospective, Direction and Open Research Scopes

Authors: Md Mashrur Arifin, Md Shoaib Ahmed, Tanmai Kumar Ghosh, Jun Zhuang, Jyh-haw Yeh

Abstract: With the proliferation of Artificial Intelligence, there has been a massive increase in the amount of data required to be accumulated and disseminated digitally. As the data are available online in digital landscapes with complex and sophisticated infrastructures, it is crucial to implement various defense mechanisms based on cybersecurity. Generative Adversarial Networks (GANs), which are deep learning models, have emerged as powerful solutions for addressing the constantly changing security issues. This survey studies the significance of deep learning models, specifically GANs, in strengthening cybersecurity defenses. Our survey aims to explore the various works completed on GANs in areas such as Intrusion Detection Systems (IDS), Mobile and Network Trespass, BotNet Detection, and Malware Detection. The focus is to examine how GANs can be influential tools to strengthen cybersecurity defenses in these domains. Further, the paper discusses the challenges and constraints of using GANs in these areas and suggests future research directions. Overall, the paper highlights the potential of GANs in enhancing cybersecurity measures and addresses the need for further exploration in this field.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.05578 [pdf, other]

FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance

Authors: Jiedong Zhuang, Jiaqi Hu, Lianrui Mu, Rui Hu, Xiaoyu Liang, Jiangnan Ye, Haoji Hu

Abstract: CLIP has achieved impressive zero-shot performance after pre-training on a large-scale dataset consisting of paired image-text data. Previous works have utilized CLIP by incorporating manually designed visual prompts like colored circles and blur masks into the images to guide the model's attention, showing enhanced zero-shot performance in downstream tasks. Although these methods have achieved promising results, they inevitably alter the original information of the images, which can lead to failure in specific tasks. We propose a training-free method, Foveal-Attention CLIP (FALIP), which adjusts CLIP's attention by inserting foveal attention masks into the multi-head self-attention module. We demonstrate that FALIP effectively boosts CLIP zero-shot performance in tasks such as referring expression comprehension, image classification, and 3D point cloud recognition. Experimental results further show that FALIP outperforms existing methods on most metrics and can augment current methods to enhance their performance.

Submitted 7 July, 2024; originally announced July 2024.

Comments: accepted by ECCV2024

arXiv:2407.04064 [pdf, other]

Collision Avoidance for Multiple UAVs in Unknown Scenarios with Causal Representation Disentanglement

Authors: Jiafan Zhuang, Zihao Xia, Gaofei Han, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

Abstract: Deep reinforcement learning (DRL) has achieved remarkable progress in online path planning tasks for multi-UAV systems. However, existing DRL-based methods often suffer from performance degradation when tackling unseen scenarios, since the non-causal factors in visual representations adversely affect policy learning. To address this issue, we propose a novel representation learning approach, i.e., causal representation disentanglement, which can identify the causal and non-causal factors in representations. After that, we only pass causal factors for subsequent policy learning and thus explicitly eliminate the influence of non-causal factors, which effectively improves the generalization ability of DRL models. Experimental results show that our proposed method can achieve robust navigation performance and effective collision avoidance especially in unseen scenarios, which significantly outperforms existing SOTA algorithms.

Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.04056 [pdf, other]

Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection

Authors: Jiafan Zhuang, Gaofei Han, Zihao Xia, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

Abstract: In unseen and complex outdoor environments, collision avoidance navigation for unmanned aerial vehicle (UAV) swarms presents a challenging problem. It requires UAVs to navigate through various obstacles and complex backgrounds. Existing collision avoidance navigation methods based on deep reinforcement learning show promising performance but suffer from poor generalization abilities, resulting in performance degradation in unseen environments. To address this issue, we investigate the cause of weak generalization ability in DRL and propose a novel causal feature selection module. This module can be integrated into the policy network and effectively filters out non-causal factors in representations, thereby reducing the influence of spurious correlations between non-causal factors and action predictions. Experimental results demonstrate that our proposed method can achieve robust navigation performance and effective collision avoidance especially in scenarios with unseen backgrounds and obstacles, which significantly outperforms existing state-of-the-art algorithms.

Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03116 [pdf, other]

Hardware-efficient variational quantum algorithm in trapped-ion quantum computer

Authors: J. -Z. Zhuang, Y. -K. Wu, L. -M. Duan

Abstract: We study a hardware-efficient variational quantum algorithm ansatz tailored for the trapped-ion quantum simulator, HEA-TI. We leverage programmable single-qubit rotations and global spin-spin interactions among all ions, reducing the dependence on resource-intensive two-qubit gates in conventional gate-based methods. We apply HEA-TI to state engineering of cluster states and analyze the scaling of required quantum resources. We also apply HEA-TI to solve the ground state problem of chemical molecules $\mathrm{H_{2}}$, $\mathrm{LiH}$ and $\mathrm{F_{2}}$. We numerically analyze the quantum computing resources required to achieve chemical accuracy and examine the performance under realistic experimental noise and statistical fluctuation. The efficiency of this ansatz is shown to be comparable to other commonly used variational ansatzes like UCCSD, with the advantage of substantially easier implementation in the trapped-ion quantum simulator. This approach showcases the hardware-efficient ansatz as a powerful tool for the application of the near-term quantum computer.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.01599 [pdf, other]

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

Authors: Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang

Abstract: The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various technological domains. While these models enhance capabilities in natural language processing and visual interactive tasks, their growing adoption raises critical concerns regarding security and ethical alignment. This survey provides an extensive review of the emerging field of jailbreaking--deliberately circumventing the ethical and operational boundaries of LLMs and VLMs--and the consequent development of defense mechanisms. Our study categorizes jailbreaks into seven distinct types and elaborates on defense strategies that address these vulnerabilities. Through this comprehensive examination, we identify research gaps and propose directions for future studies to enhance the security frameworks of LLMs and VLMs. Our findings underscore the necessity for a unified perspective that integrates both jailbreak strategies and defensive solutions to foster a robust, secure, and reliable environment for the next generation of language models. More details can be found on our website: https://chonghan-chen.com/llm-jailbreak-zoo-survey/.

Submitted 24 July, 2024; v1 submitted 25 June, 2024; originally announced July 2024.

Comments: 45 pages

arXiv:2406.19844 [pdf, other]

StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction

Authors: Jiaheng Zhuang, Guoan Wang, Siyu Zhang, Xiyang Wang, Hangning Zhou, Ziyao Xu, Chi Zhang, Zhiheng Li

Abstract: 3D multi-object tracking and trajectory prediction are two crucial modules in autonomous driving systems. Generally, the two tasks are handled separately in traditional paradigms, and a few methods have recently started to explore modeling these two tasks in a joint manner. However, these approaches suffer from the limitations of single-frame training and inconsistent coordinate representations between tracking and prediction tasks. In this paper, we propose a streaming and unified framework for joint 3D Multi-Object Tracking and trajectory Prediction (StreamMOTP) to address the above challenges. Firstly, we construct the model in a streaming manner and exploit a memory bank to preserve and leverage the long-term latent features for tracked objects more effectively. Secondly, a relative spatio-temporal positional encoding strategy is introduced to bridge the gap of coordinate representations between the two tasks and maintain the pose-invariance for trajectory prediction. Thirdly, we further improve the quality and consistency of predicted trajectories with a dual-stream predictor. We conduct extensive experiments on the popular nuScenes dataset, and the experimental results demonstrate the effectiveness and superiority of StreamMOTP, which outperforms previous methods significantly on both tasks. Furthermore, we also show that the proposed framework has great potential and advantages in actual applications of autonomous driving.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.15054 [pdf]

doi 10.1021/acsami.4c02078

Dynamic Response of Ionic Current in Conical Nanopores

Authors: Zhe Liu, Long Ma, Hongwen Zhang, Jiakun Zhuang, Jia Man, Zuzanna S. Siwy, Yinghua Qiu

Abstract: Ionic current rectification (ICR) of charged conical nanopores has various applications in fields including nanofluidics, bio-sensing, and energy conversion, whose function is closely related to the dynamic response of nanopores. The occurrence of ICR originates from the ion enrichment and depletion in conical pores, whose formation is found to be affected by the scanning rate of voltages. Here, through time-dependent simulations, we investigate the variation of ion current under electric fields and the dynamic formation of ion enrichment and depletion, which can reflect the response time of conical nanopores. The response time of nanopores when ion enrichment forms, i.e., at the on state, is significantly longer than that with the formation of ion depletion, i.e., at the off state. Our simulation results reveal the regulation of response time by different nanopore parameters including the surface charge density, pore length, tip, and base radius, as well as the applied conditions such as the voltage and bulk concentration. The response time of nanopores is closely related to the surface charge density, pore length, voltage, and bulk concentration. Our uncovered dynamic response mechanism of the ionic current can guide the design of nanofluidic devices with conical nanopores, including memristors, ionic switches, and rectifiers.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 30 pages, 5 figures

Journal ref: ACS Appl. Mater. Interfaces 2024, 16 (23), 30496-30505

arXiv:2406.09053 [pdf, ps, other]

Joint Channel Estimation and Prediction for Massive MIMO with Frequency Hopping Sounding

Authors: Yiming Zhu, Jiawei Zhuang, Gangle Sun, Hongwei Hou, Li You, Wenjin Wang

Abstract: In massive multiple-input multiple-output (MIMO) systems, the downlink transmission performance heavily relies on accurate channel state information (CSI). Constrained by the transmitted power, user equipment always transmits sounding reference signals (SRSs) to the base station through frequency hopping, which will be leveraged to estimate uplink CSI and subsequently predict downlink CSI. This paper aims to investigate joint channel estimation and prediction (JCEP) for massive MIMO with frequency hopping sounding (FHS). Specifically, we present a multiple-subband (MS) delay-angle-Doppler (DAD) domain channel model with off-grid basis to tackle the energy leakage problem. Furthermore, we formulate the JCEP problem with FHS as a multiple measurement vector (MMV) problem, facilitating the sharing of common CSI across different subbands. To solve this problem, we propose an efficient Off-Grid-MS hybrid message passing (HMP) algorithm under the constrained Bethe free energy (BFE) framework. Aiming to address the lack of prior CSI in practical scenarios, the proposed algorithm can adaptively learn the hyper-parameters of the channel by minimizing the corresponding terms in the BFE expression. To alleviate the complexity of channel hyper-parameter learning, we leverage the approximations of the off-grid matrices to simplify the off-grid hyper-parameter estimation. Numerical results illustrate that the proposed algorithm can effectively mitigate the energy leakage issue and exploit the common CSI across different subbands, acquiring more accurate CSI compared to state-of-the-art counterparts.

Submitted 13 June, 2024; originally announced June 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2406.08012 [pdf, other]

Interaction of an outflow with surrounding gaseous clouds as the origin of the late-time radio flares in TDEs

Authors: Jialun Zhuang, Rong-Feng Shen, Guobin Mou, Wenbin Lu

Abstract: A close encounter between a star and a supermassive black hole (SMBH) results in the tidal disruption of the star, known as a tidal disruption event (TDE). Recently, a few TDEs, e.g., ASASSN-15oi and AT2018hyz, have shown late-time (hundreds of days after their UV/optical peaks) radio flares with radio luminosities of $10^{38\sim39}$ erg/s. The super-Eddington fallback or accretion in a TDE may generate a mass outflow. Here we investigate a scenario in which the late-time radio flares come from the interaction of the outflow with the circum-nuclear gaseous clouds, in addition to the slow-evolving emission component due to the outflow-diffuse medium interaction. We calculate the associated radio temporal and spectral signatures and find that they reproduce the observations well. The outflows have an inferred velocity of $0.2\sim0.8$ c, a total mass of $10^{-3}\sim10^{-1}$ $\mathrm{M_{\odot}}$ and an ejection duration of a month to a year. The distances of the clouds to the SMBH are $0.1\sim1$ pc. This scenario has advantages in explaining the long delay, sharpness of the rise and the multiplicity of the late radio flares. Future observations may build up a much larger sample of late-time radio flares and enable their use as a probe of the TDE physics and the host circumnuclear environment.

Submitted 26 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 13 pages, 13 figures. Submitted to ApJ. A new version with some modifications. Comments are welcome

arXiv:2406.06959 [pdf, other]

Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems

Authors: Jiawei Zhang, Jiaxin Zhuang, Cheng Jin, Gen Li, Yuantao Gu

Abstract: The recent emergence of diffusion models has significantly advanced the precision of learnable priors, presenting innovative avenues for addressing inverse problems. Since inverse problems inherently entail maximum a posteriori estimation, previous works have endeavored to integrate diffusion priors into the optimization frameworks. However, prevailing optimization-based inverse algorithms primarily exploit the prior information within the diffusion models while neglecting their denoising capability. To bridge this gap, this work leverages the diffusion process to reframe noisy inverse problems as a two-variable constrained optimization task by introducing an auxiliary optimization variable. By employing gradient truncation, the projection gradient descent method is efficiently utilized to solve the corresponding optimization problem. The proposed algorithm, termed ProjDiff, effectively harnesses the prior information and the denoising capability of a pre-trained diffusion model within the optimization framework. Extensive experiments on the image restoration tasks and source separation and partial generation tasks demonstrate that ProjDiff exhibits superior performance across various linear and nonlinear inverse problems, highlighting its potential for practical applications. Code is available at https://github.com/weigerzan/ProjDiff/.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.05392 [pdf, other]

Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas

Authors: Chengyuan Deng, Yiqun Duan, Xin Jin, Heng Chang, Yijun Tian, Han Liu, Henry Peng Zou, Yiqiao Jin, Yijia Xiao, Yichen Wang, Shenghao Wu, Zongxing Xie, Kuofeng Gao, Sihong He, Jun Zhuang, Lu Cheng, Haohan Wang

Abstract: Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, and data privacy, to emerging problems like truthfulness and social norms. We critically analyze existing research aimed at understanding, examining, and mitigating these ethical risks. Our survey underscores integrating ethical standards and societal values into the development of LLMs, thereby guiding the development of responsible and ethically aligned language models.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.03368 [pdf, other]

IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

Authors: David Ifeoluwa Adelani, Jessica Ojo, Israel Abebe Azime, Jian Yun Zhuang, Jesujoba O. Alabi, Xuanli He, Millicent Ochieng, Sara Hooker, Andiswa Bukula, En-Shiun Annie Lee, Chiamaka Chukwuneke, Happy Buzaaba, Blessing Sibanda, Godson Kalipe, Jonathan Mukiibi, Salomon Kabongo, Foutse Yuehgoh, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Rooweither Mabuya, Shamsuddeen Hassan Muhammad, Salomey Osei, Sokhar Samb, Tadesse Kebede Guge, et al. (1 additional author not shown)

Abstract: Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g. African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench, a human-translated benchmark dataset for 16 typologically-diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based QA (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and four proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We observe a significant performance gap between open and proprietary models, with the highest performing open model, Aya-101, reaching only 58% of the performance of the best-performing proprietary model, GPT-4o. Machine translating the test set to English before evaluation helped to close the gap for larger models that are English-centric, like LLaMa 3 70B. These findings suggest that more efforts are needed to develop and adapt LLMs for African languages.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.03097 [pdf, other]

Enhancing the Resilience of Graph Neural Networks to Topological Perturbations in Sparse Graphs

Authors: Shuqi He, Jun Zhuang, Ding Wang, Luyao Peng, Jun Song

Abstract: Graph neural networks (GNNs) have been extensively employed in node classification. Nevertheless, recent studies indicate that GNNs are vulnerable to topological perturbations, such as adversarial attacks and edge disruptions. Considerable efforts have been devoted to mitigating these challenges. For example, pioneering Bayesian methodologies, including GraphSS and LlnDT, incorporate Bayesian label transitions and topology-based label sampling to strengthen the robustness of GNNs. However, GraphSS is hindered by slow convergence, while LlnDT faces challenges in sparse graphs. To overcome these limitations, we propose a novel label inference framework, TraTopo, which combines topology-driven label propagation, Bayesian label transitions, and link analysis via random walks. TraTopo significantly surpasses its predecessors on sparse graphs by utilizing random walk sampling, specifically targeting isolated nodes for link prediction, thus enhancing its effectiveness in topological sampling contexts. Additionally, TraTopo employs a shortest-path strategy to refine link prediction, thereby reducing predictive overhead and improving label inference accuracy. Empirical evaluations highlight TraTopo's superiority in node classification, significantly exceeding contemporary GCN models in accuracy.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.01264 [pdf, other]

FreeTumor: Advance Tumor Segmentation via Large-Scale Tumor Synthesis

Authors: Linshan Wu, Jiaxin Zhuang, Xuefeng Ni, Hao Chen

Abstract: AI-driven tumor analysis has garnered increasing attention in healthcare. However, its progress is significantly hindered by the lack of annotated tumor cases, which requires radiologists to invest a lot of effort in collection and annotation. In this paper, we introduce a highly practical solution for robust tumor synthesis and segmentation, termed FreeTumor, which refers to annotation-free synthetic tumors and our desire to free patients suffering from tumors. Instead of pursuing sophisticated technical synthesis modules, we aim to design a simple yet effective tumor synthesis paradigm to unleash the power of large-scale data. Specifically, FreeTumor advances existing methods mainly from three aspects: (1) Existing methods only leverage small-scale labeled data for synthesis training, which limits their ability to generalize well on unseen data from different sources. To this end, we introduce the adversarial training strategy to leverage large-scale and diversified unlabeled data in synthesis training, significantly improving tumor synthesis. (2) Existing methods largely ignored the negative impact of low-quality synthetic tumors in segmentation training. Thus, we employ an adversarial-based discriminator to automatically filter out the low-quality synthetic tumors, which effectively alleviates their negative impact. (3) Existing methods only used hundreds of cases in tumor segmentation. In FreeTumor, we investigate the data scaling law in tumor segmentation by scaling up the dataset to 11k cases. Extensive experiments demonstrate the superiority of FreeTumor, e.g., on three tumor segmentation benchmarks, an average of $+8.9\%$ DSC over the baseline that uses only real tumors and $+6.6\%$ DSC over the state-of-the-art tumor synthesis method. Code will be available.

Submitted 3 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2405.19590 [pdf, other]

Weights Augmentation: it has never ever ever ever let her model down

Authors: Junbin Zhuang, Guiguang Din, Yunyi Yan

Abstract: Weights play an essential role in deep learning network models. Unlike network structure design, this article proposes the concept of weight augmentation, focusing on weight exploration. The core of Weight Augmentation Strategy (WAS) is to adopt random transformed weight coefficients training and transformed coefficients, named Shadow Weight (SW), for networks that can be used to calculate loss function to affect parameter updates. However, stochastic gradient descent is applied to Plain Weight (PW), which is referred to as the original weight of the network before the random transformation. During training, numerous SW collectively form high-dimensional space, while PW is directly learned from the distribution of SW instead of the data. The weight of the accuracy-oriented mode (AOM) relies on PW, which guarantees the network is highly robust and accurate. The desire-oriented mode (DOM) weight uses SW, which is determined by the network model's unique functions based on WAT's performance desires, such as lower computational complexity, lower sensitivity to particular data, etc. The two modes can be switched at any time if needed. WAT extends the augmentation technique from data augmentation to weight, and it is easy to understand and implement, but it can improve almost all networks amazingly. Our experimental results show that convolutional neural networks, such as VGG-16, ResNet-18, ResNet-34, GoogleNet, MobileNetV2, and EfficientNet-Lite, can benefit much at little or no cost. On the CIFAR100 and CIFAR10 datasets, model accuracy increases by 7.32% and 9.28%, respectively, with the highest values being 13.42% and 18.93%, respectively. In addition, DOM can reduce floating point operations (FLOPs) by up to 36.33%. The code is available at https://github.com/zlearh/Weight-Augmentation-Technology.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.11830 [pdf, other]

Fe2+ partitioning in Al-free pyrolite: consequences for seismic velocities and heterogeneities

Authors: Jingyi Zhuang, Renata Wentzcovitch

Abstract: Iron partitioning among the main lower mantle phases, bridgmanite (Bm) and ferropericlase (Fp), has non-monotonic behavior owing to the high-spin to low-spin crossover in ferrous iron (Fe2+) in Fp. Results of previous studies of the iron partitioning coefficient between these phases, $K_D$, still have considerable uncertainty. Here, we investigate the Fe2+ partitioning behavior using well-documented ab initio free energy results plus new updates. Although we focus on Fe2+ only, we describe the effect of this iron spin crossover (ISC) on $K_D$ and of the latter on compositions and seismic velocities in a pyrolitic aggregate. Our results suggest that its velocities are mainly affected by the ISC and less so by the Fe2+ partitioning. In contrast, iron partitioning manifests in thermally induced velocity heterogeneity ratios. Prediction of the seismological parameter $R_{S/P}$ ($\partial \ln V_S/\partial \ln V_P$) including iron partitioning effects resembles quantitatively $R_{S/P}$'s inferred from several tomographic studies down to 2,400 km depth.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 18 pages, 5 figures

arXiv:2405.01606 [pdf, other]

Improving Trainability of Variational Quantum Circuits via Regularization Strategies

Authors: Jun Zhuang, Jack Cunningham, Chaowen Guan

Abstract: In the era of noisy intermediate-scale quantum (NISQ), variational quantum circuits (VQCs) have been widely applied in various domains, advancing the superiority of quantum circuits against classic models. Similar to classic models, regular VQCs can be optimized by various gradient-based methods. However, the optimization may be initially trapped in barren plateaus or eventually entangled in saddle points during training. These gradient issues can significantly undermine the trainability of VQC. In this work, we propose a strategy that regularizes model parameters with prior knowledge of the train data and Gaussian noise diffusion. We conduct ablation studies to verify the effectiveness of our strategy across four public datasets and demonstrate that our method can improve the trainability of VQCs against the above-mentioned gradient issues.

Submitted 1 May, 2024; originally announced May 2024.

Comments: preprint, under review. TL;DR: we propose a regularization strategy to improve the trainability of VQCs

arXiv:2404.16374 [pdf]

Revisiting Seismicity Criticality: A New Framework for Bias Correction of Statistical Seismology Model Calibrations

Authors: Jiawei Li, Didier Sornette, Zhongliang Wu, Jiancang Zhuang, Changsheng Jiang

Abstract: The Epidemic-Type Aftershock Sequences (ETAS) model and its variants effectively capture the space-time clustering of seismicity, setting the standard for earthquake forecasting. Accurate unbiased ETAS calibration is thus crucial. But we identify three sources of bias, (i) boundary effects, (ii) finite-size effects, and (iii) censorship, which are often overlooked or misinterpreted, causing errors in seismic analysis and predictions. By employing an ETAS model variant with variable spatial background rates, we propose a method to correct for these biases, focusing on the branching ratio n, a key indicator of earthquake triggering potential. Our approach quantifies the variation in the apparent branching ratio (napp) with increased cut-off magnitude (Mco) above the optimal cut-off (Mcobest). The napp(Mco) function yields insights superior to traditional point estimates. We validate our method using synthetic earthquake catalogs, accurately recovering the true branching ratio (ntrue) after correcting biases with napp(Mco). Additionally, our method introduces a refined estimation of the minimum triggering magnitude (m0), a crucial parameter in the ETAS model. Applying our framework to the earthquake catalogs of California, New Zealand, and the China Seismic Experimental Site (CSES) in Sichuan and Yunnan provinces, we find that seismicity hovers away from the critical point, nc = 1, remaining distinctly subcritical, however with values tending to be larger than recent reports that do not consider the above biases. Interestingly, m0 is found to be around 4 for California, 3 for New Zealand and 2 for CSES, suggesting that many small triggered earthquakes may not be fertile. Understanding seismicity's critical state significantly enhances our comprehension of seismic patterns and aftershock predictability, and informs earthquake risk mitigation and management strategies.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 36 pages, 7 figures, 5 tables

arXiv:2404.15760 [pdf, other]

Debiasing Machine Unlearning with Counterfactual Examples

Authors: Ziheng Chen, Jia Wang, Jun Zhuang, Abbavaram Gowtham Reddy, Fabrizio Silvestri, Jin Huang, Kaushiki Nag, Kun Kuang, Xin Ning, Gabriele Tolomei

Abstract: The right to be forgotten (RTBF) seeks to safeguard individuals from the enduring effects of their historical actions by implementing machine-learning techniques. These techniques facilitate the deletion of previously acquired knowledge without requiring extensive model retraining. However, they often overlook a critical issue: unlearning processes bias. This bias emerges from two main sources: (1) data-level bias, characterized by uneven data removal, and (2) algorithm-level bias, which leads to the contamination of the remaining dataset, thereby degrading model accuracy. In this work, we analyze the causal factors behind the unlearning process and mitigate biases at both data and algorithmic levels. Typically, we introduce an intervention-based approach, where knowledge to forget is erased with a debiased dataset. Besides, we guide the forgetting procedure by leveraging counterfactual examples, as they maintain semantic data consistency without hurting performance on the remaining dataset. Experimental results demonstrate that our method outperforms existing machine unlearning baselines on evaluation metrics.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15580 [pdf, other]

MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis

Authors: Jiaxin Zhuang, Linshan Wu, Qiong Wang, Varut Vardhanabhuti, Lin Luo, Hao Chen

Abstract: The Vision Transformer (ViT) has demonstrated remarkable performance in Self-Supervised Learning (SSL) for 3D medical image analysis. Mask AutoEncoder (MAE) for feature pre-training can further unleash the potential of ViT on various medical vision tasks. However, due to large spatial sizes with much higher dimensions of 3D medical images, the lack of hierarchical design for MAE may hinder the performance of downstream tasks. In this paper, we propose a novel Mask in Mask (MiM) pre-training framework for 3D medical images, which aims to advance MAE by learning discriminative representation from hierarchical visual tokens across varying scales. We introduce multiple levels of granularity for masked inputs from the volume, which are then reconstructed simultaneously at both fine and coarse levels. Additionally, a cross-level alignment mechanism is applied to adjacent level volumes to enforce anatomical similarity hierarchically. Furthermore, we adopt a hybrid backbone to enhance the hierarchical representation learning efficiently during the pre-training. MiM was pre-trained on a large scale of available 3D volumetric images, i.e., Computed Tomography (CT) images containing various body parts. Extensive experiments on thirteen public datasets demonstrate the superiority of MiM over other SSL methods in organ/lesion/tumor segmentation and disease classification. We further scale up the MiM to large pre-training datasets with more than 10k volumes, showing that large-scale pre-training can further enhance the performance of downstream tasks. The improvement also suggests that the research community should pay more attention to the scale of the pre-training dataset towards the healthcare foundation model for 3D medical images.

Submitted 23 April, 2024; originally announced April 2024.

Comments: submitted to journal

arXiv:2404.08852 [pdf, other]

Complex variable solution on over-/under-break shallow tunnelling in gravitational geomaterial with reasonable far-field displacement

Authors: Luo-bin Lin, Fu-quan Chen, Jin-ping Zhuang

Abstract: Over-/under-break excavation is a common phenomenon in shallow tunnelling, which is nonetheless not generally considered in existing complex variable solutions. In this paper, a new equilibrium mechanical model on over-/under-break shallow tunnelling in gravitational geomaterial is established by fixing far-field ground surface to form a corresponding mixed boundary problem. With integration of a newly proposed bidirectional composite conformal mapping using Charge Simulation Method, a complex variable solution of infinite complex potential series is subsequently derived using analytic continuation to transform the mixed boundaries into a homogeneous Riemann-Hilbert problem, which is iteratively solved to obtain the stress and displacement in geomaterial. The infinite complex potential series of the complex variable solution are truncated to obtain numerical results, which are rectified by Lanczos filtering to reduce the oscillation of Gibbs phenomena. The bidirectional conformal mapping is discussed and validated via several numerical cases, and the subsequent complex variable solution is verified by examining the Lanczos filtering and solution convergence, and comparing with corresponding finite element solution and existing analytical solution. Further discussions are made to disclose possible defects of the proposed solution for objectivity.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.06039 [pdf, other]

Breathing New Life into Existing Visualizations: A Natural Language-Driven Manipulation Framework

Authors: Can Liu, Jiacheng Yu, Yuhan Guo, Jiayi Zhuang, Yuchu Luo, Xiaoru Yuan

Abstract: We propose an approach to manipulate existing interactive visualizations to answer users' natural language queries. We analyze the natural language tasks and propose a design space of a hierarchical task structure, which allows for a systematic decomposition of complex queries. We introduce a four-level visualization manipulation space to facilitate in-situ manipulations for visualizations, enabling a fine-grained control over the visualization elements. Our methods comprise two essential components: the natural language-to-task translator and the visualization manipulation parser. The natural language-to-task translator employs advanced NLP techniques to extract structured, hierarchical tasks from natural language queries, even those with varying degrees of ambiguity. The visualization manipulation parser leverages the hierarchical task structure to streamline these tasks into a sequence of atomic visualization manipulations. To illustrate the effectiveness of our approach, we provide real-world examples and experimental results. The evaluation highlights the precision of our natural language parsing capabilities and underscores the smooth transformation of visualization manipulations.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 21 pages

arXiv:2404.02065 [pdf, other]

doi 10.1109/TMM.2024.3374594

Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation

Authors: Hui Xiao, Yuting Hong, Li Dong, Diqun Yan, Jiayan Zhuang, Junjie Xiong, Dongtai Liang, Chengbin Peng

Abstract: Semi-supervised semantic segmentation relieves the reliance on large-scale labeled data by leveraging unlabeled data. Recent semi-supervised semantic segmentation approaches mainly resort to pseudo-labeling methods to exploit unlabeled data. However, unreliable pseudo-labeling can undermine the semi-supervision processes. In this paper, we propose an algorithm called Multi-Level Label Correction (MLLC), which aims to use graph neural networks to capture structural relationships in Semantic-Level Graphs (SLGs) and Class-Level Graphs (CLGs) to rectify erroneous pseudo-labels. Specifically, SLGs represent semantic affinities between pairs of pixel features, and CLGs describe classification consistencies between pairs of pixel labels. With the support of proximate pattern information from graphs, MLLC can rectify incorrectly predicted pseudo-labels and can facilitate discriminative feature representations. We design an end-to-end network to train and perform this effective label correction mechanism. Experiments demonstrate that MLLC can significantly improve supervised baselines and outperforms state-of-the-art approaches in different scenarios on Cityscapes and PASCAL VOC 2012 datasets. Specifically, MLLC improves the supervised baseline by at least 5% and 2% with DeepLabV2 and DeepLabV3+ respectively under different partition protocols.

Submitted 9 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 12 pages, 8 figures. IEEE Transactions on Multimedia, 2024

arXiv:2403.01976 [pdf, other]

SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis

Authors: Hengxing Cai, Xiaochen Cai, Junhan Chang, Sihang Li, Lin Yao, Changxin Wang, Zhifeng Gao, Hongshuai Wang, Yongge Li, Mujie Lin, Shuwen Yang, Jiankun Wang, Mingjun Xu, Jin Huang, Fang Xi, Jiaxi Zhuang, Yuqi Yin, Yaqi Li, Changhong Chen, Zheng Cheng, Zifeng Zhao, Linfeng Zhang, Guolin Ke

Abstract: Recent breakthroughs in Large Language Models (LLMs) have revolutionized natural language understanding and generation, sparking significant interest in applying them to scientific literature analysis. However, existing benchmarks fail to adequately evaluate the proficiency of LLMs in this domain, particularly in scenarios requiring higher-level abilities beyond mere memorization and the handling of multimodal data. In response to this gap, we introduce SciAssess, a benchmark specifically designed for the comprehensive evaluation of LLMs in scientific literature analysis. SciAssess aims to thoroughly assess the efficacy of LLMs by focusing on their capabilities in Memorization (L1), Comprehension (L2), and Analysis & Reasoning (L3). It encompasses a variety of tasks drawn from diverse scientific fields, including fundamental science, alloy materials, biomedicine, drug discovery, and organic materials. To ensure the reliability of SciAssess, rigorous quality control measures have been implemented, ensuring accuracy, anonymization, and compliance with copyright standards. SciAssess evaluates 11 LLMs, including GPT, Claude, and Gemini, highlighting their strengths and areas for improvement. This evaluation supports the ongoing development of LLM applications in the analysis of scientific literature. SciAssess and its resources are available at https://sci-assess.github.io/.

Submitted 18 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.17300 [pdf, other]

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Authors: Linshan Wu, Jiaxin Zhuang, Hao Chen

Abstract: Self-Supervised Learning (SSL) has demonstrated promising results in 3D medical image analysis. However, the lack of high-level semantics in pre-training still heavily hinders the performance of downstream tasks. We observe that 3D medical images contain relatively consistent contextual position information, i.e., consistent geometric relations between different organs, which offers a potential way to learn consistent semantic representations in pre-training. In this paper, we propose a simple-yet-effective Volume Contrast (VoCo) framework to leverage the contextual position priors for pre-training. Specifically, we first generate a group of base crops from different regions while enforcing feature discrepancy among them, and employ them as class assignments of different regions. Then, we randomly crop sub-volumes and predict which class they belong to (i.e., which region they are located in) by contrasting their similarity to different base crops, which can be seen as predicting the contextual positions of different sub-volumes. Through this pretext task, VoCo implicitly encodes the contextual position priors into model representations without the guidance of annotations, enabling us to effectively improve the performance of downstream tasks that require high-level semantics. Extensive experimental results on six downstream tasks demonstrate the superior effectiveness of VoCo. Code will be available at https://github.com/Luffy03/VoCo.

Submitted 17 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted by CVPR 2024. The camera-ready version will soon be available

arXiv:2402.10409 [pdf, other]

Understanding Survey Paper Taxonomy about Large Language Models via Graph Representation Learning

Authors: Jun Zhuang, Casey Kennington

Abstract: As new research on Large Language Models (LLMs) continues, it is difficult to keep up with new research and models. To help researchers synthesize the new research, many have written survey papers, but even those have become numerous. In this paper, we develop a method to automatically assign survey papers to a taxonomy. We collect the metadata of 144 LLM survey papers and explore three paradigms to classify papers within the taxonomy. Our work indicates that leveraging graph structure information on co-category graphs can significantly outperform the language models in two paradigms: fine-tuning of pre-trained language models and zero-shot/few-shot classification using LLMs. We find that our model surpasses an average human recognition level and that fine-tuning LLMs using weak labels generated by a smaller model, such as the GCN in this study, can be more effective than using ground-truth labels, revealing the potential of weak-to-strong generalization in the taxonomy classification task.

Submitted 15 February, 2024; originally announced February 2024.

Comments: TL;DR: We collected metadata about LLM surveys and developed a method for categorizing them into a taxonomy, indicating the superiority of graph representation learning over language models and revealing the efficacy of fine-tuning using weak labels

arXiv:2402.08054 [pdf, other]

doi 10.1103/PhysRevA.109.043320

Probing the interaction energy of two $^{85}$Rb atoms in an optical tweezer via spin-motion coupling

Authors: Jun Zhuang, Kun-Peng Wang, Peng-Xiang Wang, Ming-Rui Wei, Bahtiyar Mamat, Cheng Sheng, Peng Xu, Min Liu, Jin Wang, Xiao-Dong He, Ming-Sheng Zhan

Abstract: The inherent polarization gradients in tight optical tweezers can be used to couple the atomic spins to the two-body motion under the action of a microwave spin-flip transition, so that such a spin-motion coupling offers an important control knob on the motional states of two optically trapped colliding atoms. Here, after preparing two elastically scattering $^{85}$Rb atoms in the three-dimensional ground state in the optical tweezer, we employed this control in order to probe the colliding energies of elastic and inelastic channels. The combination of microwave spectra and the corresponding s-wave pseudopotential model allows us to infer the effect of the state-dependent trapping potentials on the elastic colliding energies, as well as to reveal how the presence of inelastic interactions affects the elastic part of the relative potential. Our work shows that the spin-motion coupling in a tight optical tweezer expands the experimental toolbox for fundamental studies of ultracold collisions in two-body systems with reactive collisions, and potentially for that of more complex interactions, such as optically trapped atom-molecule and molecule-molecule interactions.

Submitted 2 July, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: 8 pages, 5 figures

Journal ref: Phys. Rev. A 109, 043320 (2024)

arXiv:2402.04928 [pdf]

doi 10.1021/acs.langmuir.2c02060

Influences of Divalent Ions in Natural Seawater/River Water on Nanofluidic Osmotic Energy Generation

Authors: Fenhong Song, Xuan An, Long Ma, Jiakun Zhuang, Yinghua Qiu

Abstract: Besides the dominant NaCl, natural seawater/river water contains trace multivalent ions, which can provide effective screening to surface charges. Here, in both negatively and positively charged nanopores, influences from divalent ions as counterions and coions have been investigated on the performance of osmotic energy conversion (OEC) under natural salt gradients. As counterions, trace Ca2+ ions can suppress the electric power and conversion efficiency significantly. The reduced OEC performance is due to the bivalence and low diffusion coefficient of Ca2+ ions, instead of the uphill transport of divalent ions discovered in the previous work. Charged surfaces effectively screened by Ca2+ ions induce enhanced diffusion of Cl- ions, which simultaneously decreases the net ion penetration and ionic selectivity of the nanopore. As coions, Ca2+ ions have weak effects on the OEC performance. The promotion from charged exterior surfaces on OEC processes for ultra-short nanopores is also studied; its effective region is ~200 nm in width beyond pore boundaries, independent of the presence of Ca2+ ions. Our results shed light on the physical details of the nanofluidic OEC process under natural seawater/river water conditions, which can provide a useful guide for high-performance osmotic energy harvesting.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 24 pages, 5 figures

Journal ref: Langmuir 2022, 38 (42), 12935-12943

arXiv:2402.04920 [pdf]

doi 10.1002/elps.202200198

Characterization of the Surface Charge Property and Porosity of Track-etched Polymer Membranes

Authors: Jiakun Zhuang, Long Ma, Yinghua Qiu

Abstract: As an important property of porous membranes, the surface charge property determines many ionic behaviors of nanopores, such as ionic conductance and selectivity. Based on the dependence of electric double layers on bulk concentrations, ionic conductance through nanopores at high and low concentrations is governed by the bulk conductance and surface charge density, respectively. Here, through the investigation of ionic conductance inside track-etched single polyethylene terephthalate (PET) nanopores under various concentrations, the surface charge density of PET membranes is extracted as around 0.021 C per m2 at pH 10 over measurements with 40 PET nanopores. Simulations show that surface roughness can cause underestimation in surface charge density due to the inhibited electroosmotic flow. Then, the averaged pore size and porosity of track-etched multipore PET membranes are characterized by the developed ionic conductance method. Through coupled theoretical predictions in ionic conductance under high and low concentrations, the averaged pore size and porosity of porous membranes can be obtained simultaneously. Our method provides a simple and precise way to characterize the pore size and porosity of multipore membranes, especially for those with sub-100 nm pores and low porosities.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 25 pages, 4 figures

Journal ref: Electrophoresis 2022, 43 (23-24), 2428-2435

arXiv:2401.14828 [pdf, other]

TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts

Authors: Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan

Abstract: Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIP-Editor, that accepts both text and image prompts and a 3D bounding box to specify the editing region. With the image prompt, users can conveniently specify the detailed appearance/style of the target content in complement to the text description, enabling accurate control of the appearance. Specifically, TIP-Editor employs a stepwise 2D personalization strategy to better learn the representation of the existing scene and the reference image, in which a localization loss is proposed to encourage correct object placement as specified by the bounding box. Additionally, TIP-Editor utilizes explicit and flexible 3D Gaussian splatting as the 3D representation to facilitate local editing while keeping the background unchanged. Extensive experiments have demonstrated that TIP-Editor conducts accurate editing following the text and image prompts in the specified bounding box region, consistently outperforming the baselines in editing quality and alignment to the prompts, both qualitatively and quantitatively.

Submitted 25 April, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: Accepted by SIGGRAPH 2024 & ACM Transactions on Graphics

arXiv:2401.11261 [pdf, other]

Diffusion Model Conditioning on Gaussian Mixture Model and Negative Gaussian Mixture Gradient

Authors: Weiguo Lu, Xuan Wu, Deng Ding, Jinqiao Duan, Jirong Zhuang, Gangnan Yuan

Abstract: Diffusion models (DMs) are a type of generative model that has a huge impact on image synthesis and beyond. They achieve state-of-the-art generation results in various generative tasks. A great diversity of conditioning inputs, such as text or bounding boxes, are accessible to control the generation. In this work, we propose a conditioning mechanism utilizing Gaussian mixture models (GMMs) as feature conditioning to guide the denoising process. Based on set theory, we provide a comprehensive theoretical analysis that shows that conditional latent distribution based on features and classes is significantly different, so that conditional latent distribution on features produces fewer defect generations than conditioning on classes. Two diffusion models conditioned on the Gaussian mixture model are trained separately for comparison. Experiments support our findings. A novel gradient function called the negative Gaussian mixture gradient (NGMG) is proposed and applied in diffusion model training with an additional classifier. Training stability has improved. We also theoretically prove that NGMG shares the same benefit as the Earth Mover distance (Wasserstein) as a more sensible cost function when learning distributions supported by low-dimensional manifolds.

Submitted 1 February, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

arXiv:2401.10417 [pdf, other]

doi 10.1145/3626202.3637569

SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration

Authors: Jinming Zhuang, Zhuoping Yang, Shixin Ji, Heng Huang, Alex K. Jones, Jingtong Hu, Yiyu Shi, Peipei Zhou

Abstract: With the increase in the computation intensity of the chip, the mismatch between computation layer shapes and the available computation resource significantly limits the utilization of the chip. Driven by this observation, prior works discuss spatial accelerators or dataflow architecture to maximize the throughput. However, using spatial accelerators could potentially increase the execution latency. In this work, we first systematically investigate two execution models: (1) sequentially (temporally) launch one monolithic accelerator, and (2) spatially launch multiple accelerators. From the observations, we find that there is a latency throughput tradeoff between these two execution models, and combining these two strategies together can give us a more efficient latency throughput Pareto front. To achieve this, we propose spatial sequential architecture (SSR) and SSR design automation framework to explore both strategies together when deploying deep learning inference. We use the 7nm AMD Versal ACAP VCK190 board to implement SSR accelerators for four end-to-end transformer-based deep learning models. SSR achieves average throughput gains of 2.53x, 35.71x, and 14.20x under different batch sizes compared to the 8nm Nvidia GPU A10G, 16nm AMD FPGAs ZCU102, and U250. The average energy efficiency gains are 8.51x, 6.75x, and 21.22x, respectively. Compared with the sequential-only solution and spatial-only solution on VCK190, our spatial-sequential-hybrid solutions achieve higher throughput under the same latency requirement and lower latency under the same throughput requirement. We also use SSR analytical models to demonstrate how to use SSR to optimize solutions on other computing platforms, e.g., 14nm Intel Stratix 10 NX.

Submitted 18 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Journal ref: 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '24)

arXiv:2401.00695 [pdf, other]

Credible Teacher for Semi-Supervised Object Detection in Open Scene

Authors: Jingyu Zhuang, Kuo Wang, Liang Lin, Guanbin Li

Abstract: Semi-Supervised Object Detection (SSOD) has achieved resounding success by leveraging unlabeled data to improve detection performance. However, in Open Scene Semi-Supervised Object Detection (O-SSOD), unlabeled data may contain unknown objects not observed in the labeled data, which will increase uncertainty in the model's predictions for known objects. This is detrimental to current methods that mainly rely on self-training, as more uncertainty leads to lower localization and classification precision of pseudo labels. To this end, we propose Credible Teacher, an end-to-end framework. Credible Teacher adopts an interactive teaching mechanism using flexible labels to prevent uncertain pseudo labels from misleading the model and gradually reduces its uncertainty through the guidance of other credible pseudo labels. Empirical results have demonstrated that our method effectively restrains the adverse effect caused by O-SSOD and significantly outperforms existing counterparts.

Submitted 2 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2312.10903 [pdf, other]

Robust Node Representation Learning via Graph Variational Diffusion Networks

Authors: Jun Zhuang, Mohammad Al Hasan

Abstract: Node representation learning by using Graph Neural Networks (GNNs) has been widely explored. However, in recent years, compelling evidence has revealed that GNN-based node representation learning can be substantially deteriorated by delicately-crafted perturbations in a graph structure. To learn robust node representation in the presence of perturbations, various works have been proposed to safeguard GNNs. Within these existing works, Bayesian label transition has been proven to be more effective, but this method is extensively reliant on a well-built prior distribution. The variational inference could address this limitation by sampling the latent node embedding from a Gaussian prior distribution. Besides, leveraging the Gaussian distribution (noise) in hidden layers is an appealing strategy to strengthen the robustness of GNNs. However, our experiments indicate that such a strategy can cause over-smoothing issues during node aggregation. In this work, we propose the Graph Variational Diffusion Network (GVDN), a new node encoder that effectively manipulates Gaussian noise to safeguard robustness on perturbed graphs while alleviating over-smoothing issues through two mechanisms: Gaussian diffusion and node embedding propagation. Thanks to these two mechanisms, our model can generate robust node embeddings for recovery. Specifically, we design a retraining mechanism using the generated node embedding to recover the performance of node classifications in the presence of perturbations. The experiments verify the effectiveness of our proposed model across six public datasets.

Submitted 17 December, 2023; originally announced December 2023.

Comments: preprint, under review

arXiv:2312.03594 [pdf, other]

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

Authors: Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan, Kai Chen

Abstract: Advancing image inpainting is challenging as it requires filling user-specified regions for various intents, such as background filling and object synthesis. Existing approaches focus on either context-aware filling or object synthesis using text descriptions. However, achieving both tasks simultaneously is challenging due to differing training strategies. To overcome this challenge, we introduce PowerPaint, the first high-quality and versatile inpainting model that excels in multiple inpainting tasks. First, we introduce learnable task prompts along with tailored fine-tuning strategies to guide the model's focus on different inpainting targets explicitly. This enables PowerPaint to accomplish various inpainting tasks by utilizing different task prompts, resulting in state-of-the-art performance. Second, we demonstrate the versatility of the task prompt in PowerPaint by showcasing its effectiveness as a negative prompt for object removal. Moreover, we leverage prompt interpolation techniques to enable controllable shape-guided object inpainting, enhancing the model's applicability in shape-guided applications. Finally, we conduct extensive experiments and applications to verify the effectiveness of PowerPaint. We release our codes and models on our project page: https://powerpaint.github.io/.

Submitted 23 July, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Project page with code: https://powerpaint.github.io/

arXiv:2312.02991 [pdf, other]

REFRESH FPGAs: Sustainable FPGA Chiplet Architectures

Authors: Peipei Zhou, Jinming Zhuang, Stephen Cahoon, Yue Tang, Zhuoping Yang, Xingzhen Chen, Yiyu Shi, Jingtong Hu, Alex K. Jones

Abstract: There is a growing call for greater amounts of increasingly agile computational power for edge and cloud infrastructure to serve the computationally complex needs of ubiquitous computing devices. Thus, an important challenge is addressing the holistic environmental impacts of these next-generation computing systems. To accomplish this, a life-cycle view of sustainability for computing advancements is necessary to reduce environmental impacts such as greenhouse warming gas emissions from these computing choices. Unfortunately, decadal efforts to address operational energy efficiency in computing devices have ignored and in some cases exacerbated embodied impacts from manufacturing these edge and cloud systems, particularly their integrated circuits. During this time, FPGA architectures have not changed dramatically except to increase in size. Given this context, we propose REFRESH FPGAs to build new FPGA devices and architectures from recently retired FPGA dies using 2.5D integration. Building REFRESH FPGAs requires creative architectures that leverage existing chiplet pins with an inexpensive-to-manufacture interposer coupled with creative design automation. In this paper, we discuss how REFRESH FPGAs can leverage industry trends for renewable energy integration into data centers while providing an overall improvement for sustainability and amortizing their significant embodied cost investment over a much longer "first" lifetime.

Submitted 27 November, 2023; originally announced December 2023.

arXiv:2312.02597 [pdf, other]

Mitigating noise of residual electric fields for single Rydberg atoms with electron photodesorption

Authors: Bahtiyar Mamat, Cheng Sheng, Xiaodong He, Jiayi Hou, Peng Xu, Kunpeng Wang, Jun Zhuang, Mingrui Wei, Min Liu, Jin Wang, Mingsheng Zhan

Abstract: Rydberg atoms as versatile tools for quantum applications are extremely sensitive to electric fields. When utilizing these atoms, it becomes imperative to comprehensively characterize and mitigate any residual electric fields present in the environment. Particularly for single Rydberg atoms trapped in optical tweezers in a compact quartz vacuum cell, we have identified that a significant source of background electric fields originates from electrons bound to the cell surface. These electrons are generated by the 297-nm light used for single-photon Rydberg excitation. Furthermore, once the electrons are desorbed from the surface through exposure to ultraviolet light, the incoherent ground-Rydberg transition undergoes a transformation into coherent excitation, since the noise of residual electric fields is effectively mitigated. Our studies promote enhanced control and reliable performance of Rydberg atom-based systems, thereby paving the way for advancements in quantum information processing, the realization of high-fidelity quantum gates, and the development of precise quantum sensors.

Submitted 26 February, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.18420 [pdf, other]

TeG-DG: Textually Guided Domain Generalization for Face Anti-Spoofing

Authors: Lianrui Mu, Jianhong Bai, Xiaoxuan He, Jiangnan Ye, Xiaoyu Liang, Yuchen Yang, Jiedong Zhuang, Haoji Hu

Abstract: Enhancing the domain generalization performance of Face Anti-Spoofing (FAS) techniques has emerged as a research focus. Existing methods are dedicated to extracting domain-invariant features from various training domains. Despite the promising performance, the extracted features inevitably contain residual style feature bias (e.g., illumination, capture device), resulting in inferior generalization performance. In this paper, we propose an alternative and effective solution, the Textually Guided Domain Generalization (TeG-DG) framework, which can effectively leverage text information for cross-domain alignment. Our core insight is that text, as a more abstract and universal form of expression, can capture the commonalities and essential characteristics across various attacks, bridging the gap between different image domains. Contrary to existing vision-language models, the proposed framework is elaborately designed to enhance the domain generalization ability of the FAS task. Concretely, we first design a Hierarchical Attention Fusion (HAF) module to enable adaptive aggregation of visual features at different levels. Then, a Textual-Enhanced Visual Discriminator (TEVD) is proposed, not only to better align the two modalities but also to regularize the classifier with unbiased text features. TeG-DG significantly outperforms previous approaches, especially in situations with extremely limited source domain data (~14% and ~12% improvements on HTER and AUC respectively), showcasing impressive few-shot performance.

Submitted 30 January, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.16417 [pdf, other]

Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets

Authors: Zhuoping Yang, Shixin Ji, Xingzhen Chen, Jinming Zhuang, Weifeng Zhang, Dharmesh Jani, Peipei Zhou

Abstract: Fast-evolving artificial intelligence (AI) algorithms such as large language models have been driving the ever-increasing computing demands in today's data centers. Heterogeneous computing with domain-specific architectures (DSAs) brings many opportunities when scaling up and scaling out the computing system. In particular, heterogeneous chiplet architecture is favored to keep scaling up and scaling out the system as well as to reduce the design complexity and the cost stemming from the traditional monolithic chip design. However, how to interconnect computing resources and orchestrate heterogeneous chiplets is the key to success. In this paper, we first discuss the diversity and evolving demands of different AI workloads. We discuss how chiplets bring better cost efficiency and shorter time to market. Then we discuss the challenges in establishing chiplet interface standards, packaging, and security issues. We further discuss the software programming challenges in chiplet systems.

Submitted 4 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.07211 [pdf, other]

A Gaussian Process Based Method with Deep Kernel Learning for Pricing High-dimensional American Options

Authors: Jirong Zhuang, Deng Ding, Weiguo Lu, Xuan Wu, Gangnan Yuan

Abstract: In this work, we present a novel machine learning approach for pricing high-dimensional American options based on the modified Gaussian process regression (GPR). We incorporate deep kernel learning and sparse variational Gaussian processes to address the challenges traditionally associated with GPR. These challenges include its diminished reliability in high-dimensional scenarios and the excessive computational costs associated with processing extensive numbers of simulated paths. Our findings indicate that the proposed method surpasses the performance of the least squares Monte Carlo method in high-dimensional scenarios, particularly when the underlying assets are modeled by Merton's jump diffusion model. Moreover, our approach does not exhibit a significant increase in computational time as the number of dimensions grows. Consequently, this method emerges as a potential tool for alleviating the challenges posed by the curse of dimensionality.

Submitted 18 April, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: 21 pages, 8 figures

arXiv:2311.02893 [pdf]

Topological electronic structure and spin texture of quasi-one-dimensional higher-order topological insulator Bi4Br4

Authors: W. X. Zhao, M. Yang, R. Z. Xu, X. Du, Y. D. Li, K. Y. Zhai, C. Peng, D. Pei, H. Gao, Y. W. Li, L. X. Xu, J. F. Han, Y. Huang, Z. K. Liu, Y. G. Yao, J. C. Zhuang, Y. Du, J. J. Zhou, Y. L. Chen, L. X. Yang

Abstract: The notion of topological insulators (TIs), characterized by an insulating bulk and conducting topological surface states, can be extended to higher-order topological insulators (HOTIs) hosting gapless modes localized at the boundaries of two or more dimensions lower than the insulating bulk [1-5]. In this work, by performing high-resolution angle-resolved photoemission spectroscopy (ARPES) measurements with submicron spatial and spin resolutions, we systematically investigate the electronic structure and spin texture of quasi-one-dimensional (1D) HOTI candidate Bi4Br4. In contrast to the bulk-state-dominant spectra on the (001) surface, we observe gapped surface states on the (100) surface, whose dispersion and spin-polarization agree well with our ab initio calculations. Moreover, we reveal in-gap states connecting the surface valence and conduction bands, which is an explicit signature of the existence of hinge states inside the (100) surface gap. Our findings provide compelling evidence for the HOTI phase of Bi4Br4. The identification of the higher-order topological phase opens promising prospects for applications based on 1D spin-momentum locked current in electronic and spintronic devices.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.01159 [pdf, other]

Iterative Semi-Supervised Learning for Abdominal Organs and Tumor Segmentation

Authors: Jiaxin Zhuang, Luyang Luo, Zhixuan Chen, Linshan Wu

Abstract: Deep-learning (DL) based methods are playing an important role in the task of abdominal organ and tumor segmentation in CT scans. However, the need for large annotated datasets heavily limits their development. The FLARE23 challenge provides a large-scale dataset with both partially and fully annotated data, and focuses on both segmentation accuracy and computational efficiency. In this study, we propose to use the strategy of Semi-Supervised Learning (SSL) and iterative pseudo labeling to address FLARE23. Initially, a deep model (nn-UNet) trained on datasets with complete organ annotations (about 220 scans) generates pseudo labels for the whole dataset. These pseudo labels are then employed to train a more powerful segmentation model. Employing the FLARE23 dataset, our approach achieves an average DSC score of 89.63% for organs and 46.07% for tumors on the online validation leaderboard. For organ segmentation, we obtain 0.9007% DSC and 0.9493% NSD. For tumor segmentation, we obtain 0.3785% DSC and 0.2842% NSD. Our code is available at https://github.com/USTguy/Flare23.

Submitted 2 October, 2023; originally announced October 2023.

Comments: arXiv admin note: text overlap with arXiv:2309.05405

arXiv:2309.16137 [pdf, other]

Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

Authors: Yuanmin Tang, Jing Yu, Keke Gai, Jiamin Zhuang, Gang Xiong, Yue Hu, Qi Wu

Abstract: Unlike the Composed Image Retrieval task, which requires expensive labels for training task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks with a broad range of visual content manipulation intent that could be related to domain, scene, object, and attribute. The key challenge for ZS-CIR tasks is to learn a more accurate image representation that has adaptive attention to the reference image for various manipulation descriptions. In this paper, we propose a novel context-dependent mapping network, named Context-I2W, for adaptively converting description-relevant Image information into a pseudo-word token composed of the description for accurate ZS-CIR. Specifically, an Intent View Selector first dynamically learns a rotation rule to map the identical image to a task-specific manipulation view. Then a Visual Target Extractor further captures local information covering the main targets in ZS-CIR tasks under the guidance of multiple learnable queries. The two complementary modules work together to map an image to a context-dependent pseudo-word token without extra supervision. Our model shows strong generalization ability on four ZS-CIR tasks, including domain conversion, object composition, object manipulation, and attribute manipulation. It obtains consistent and significant performance boosts ranging from 1.88% to 3.60% over the best methods and achieves new state-of-the-art results on ZS-CIR. Our code is available at https://github.com/Pter61/context-i2w.

Submitted 15 December, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

Journal ref: AAAI 2024

Showing 1–50 of 473 results for author: Zhuang, J