Search | arXiv e-print repository

In-Context Imitation Learning via Next-Token Prediction

Authors: Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg

Abstract: We explore how to enhance next-token prediction models to perform in-context imitation learning on a real robot, where the robot executes new tasks by interpreting contextual information provided during the input phase, without updating its underlying policy parameters. We propose In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor trajectories without relying on any linguistic data or reward function. This formulation enables flexible and training-free execution of new tasks at test time, achieved by prompting the model with sensorimotor trajectories of the new task consisting of image observation, action, and state tuples, collected through human teleoperation. Experiments with a Franka Emika robot demonstrate that the ICRT can adapt to new tasks specified by prompts, even in environment configurations that differ from both the prompt and the training data. In a multitask environment setup, ICRT significantly outperforms current state-of-the-art next-token prediction models in robotics on generalizing to unseen tasks. Code, checkpoints and data are available on https://icrt.dev/

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.15925 [pdf, ps, other]

Explicit Folded Reed-Solomon and Multiplicity Codes Achieve Relaxed Generalized Singleton Bound

Authors: Yeyuan Chen, Zihan Zhang

Abstract: In this paper, we prove that any `appropriate' folded Reed-Solomon and univariate multiplicity codes achieve the relaxed generalized Singleton bound for list size $L\ge1$. More concretely, we show the following: (1) Any $(s,γ)$-folded RS code over the alphabet $\mathbb{F}_q^s$ of block length $n$ and rate $R$ with pair-wise distinct evaluation points $\{γ^iα_j\}_{(i,j)\in\left(\{0\}\sqcup[s-1],[n]\right)}\subset\mathbb{F}_q$ is $\left(\frac{L}{L+1}\left(1-\frac{sR}{s-L+1}\right),L\right)$ (average-radius) list-decodable for list size $L\in[s]$. (2) Any $s$-order univariate multiplicity code over the alphabet $\mathbb{F}_p^s$ ($p$ is a prime) of block length $n$ and rate $R$ with pair-wise distinct evaluation points $\{α_i\}_{i\in[n]}\subset\mathbb{F}_p$ is $\left(\frac{L}{L+1}\left(1-\frac{sR}{s-L+1}\right),L\right)$ (average-radius) list-decodable for list size $L\in[s]$. Choosing $s=Θ(1/ε^2)$ and $L=O(1/ε)$, our results imply that both explicit folded RS codes and explicit univariate multiplicity codes achieve list decoding capacity $1-R-ε$ with evidently optimal list size $O(1/ε)$, which exponentially improves the previous state-of-the-art $(1/ε)^{O(1/ε)}$ established by Kopparty, Ron-Zewi, Saraf, and Wootters (FOCS 2018 or SICOMP, 2023) and Tamo (IEEE TIT, 2024). In particular, our results on folded Reed-Solomon codes fully resolve a long-standing open problem originally proposed by Guruswami and Rudra (STOC 2006 or IEEE TIT, 2008).
Furthermore, our results imply the first explicit constructions of $(1-R-ε,O(1/ε))$ (average-radius) list-decodable codes of rate $R$ with polynomial-sized alphabets in the literature.
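
To get a feel for the radius expression quoted in the abstract, the following minimal sketch evaluates $\frac{L}{L+1}\left(1-\frac{sR}{s-L+1}\right)$ exactly with rational arithmetic. The function name `list_decoding_radius` and the sample parameters are illustrative assumptions, not taken from the paper; the sketch only evaluates the formula and says nothing about how the codes are constructed or decoded.

```python
from fractions import Fraction


def list_decoding_radius(L, s, R):
    """Evaluate (L/(L+1)) * (1 - s*R/(s-L+1)) for list size 1 <= L <= s.

    Hypothetical helper: it only evaluates the radius formula quoted in
    the abstract, using exact rational arithmetic.
    """
    if not 1 <= L <= s:
        raise ValueError("the formula is stated for list sizes 1 <= L <= s")
    L, s, R = Fraction(L), Fraction(s), Fraction(R)
    return L / (L + 1) * (1 - s * R / (s - L + 1))


rate = Fraction(1, 2)
# L = 1 and s = 1 recover the unique-decoding radius (1 - R) / 2.
unique_radius = list_decoding_radius(1, 1, rate)
# Larger folding and list size move the radius towards the capacity 1 - R.
larger_radius = list_decoding_radius(10, 100, rate)
```

With rate 1/2, the first call gives 1/4 and the second gives 410/1001 (about 0.41), compared with a capacity of 1 - R = 1/2.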

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.15609 [pdf, other]

Statistical QoS Provision in Business-Centric Networks

Authors: Chang Wu, Yuang Chen, Hancheng Lu

Abstract: More refined resource management and Quality of Service (QoS) provisioning is a critical goal of wireless communication technologies. In this paper, we propose a novel Business-Centric Network (BCN) aimed at enabling scalable QoS provisioning, based on a cross-layer framework that captures the relationship between application, transport parameters, and channels. We investigate both continuous flow and event-driven flow models, presenting key QoS metrics such as throughput, delay, and reliability. By jointly considering power and bandwidth allocation, transmission parameters, and AP network topology across layers, we optimize weighted resource efficiency with statistical QoS provisioning. To address the coupling among parameters, we propose a novel deep reinforcement learning (DRL) framework, which is Collaborative Optimization among Heterogeneous Actors with Experience Sharing (COHA-ES). Power and sub-channel (SC) Actors representing multiple APs are jointly optimized under the unified guidance of a common critic. Additionally, we introduce a novel multithreaded experience-sharing mechanism to accelerate training and enhance rewards. Extensive comparative experiments validate the effectiveness of our DRL framework in terms of convergence and efficiency. Moreover, comparative analyses demonstrate the comprehensive advantages of the BCN structure in enhancing both spectral and energy efficiency.

Submitted 28 August, 2024; originally announced August 2024.

Comments: 13 pages

arXiv:2408.15524 [pdf, other]

Ray-Distance Volume Rendering for Neural Scene Reconstruction

Authors: Ruihong Yin, Yunlu Chen, Sezer Karaoglu, Theo Gevers

Abstract: Existing methods in neural scene reconstruction utilize the Signed Distance Function (SDF) to model the density function. However, in indoor scenes, the density computed from the SDF for a sampled point may not consistently reflect its real importance in volume rendering, often due to the influence of neighboring objects. To tackle this issue, our work proposes a novel approach for indoor scene reconstruction, which instead parameterizes the density function with the Signed Ray Distance Function (SRDF). Firstly, the SRDF is predicted by the network and transformed to a ray-conditioned density function for volume rendering. We argue that the ray-specific SRDF only considers the surface along the camera ray, from which the derived density function is more consistent with the real occupancy than that from the SDF. Secondly, although SRDF and SDF represent different aspects of scene geometries, their values should share the same sign indicating the underlying spatial occupancy. Therefore, this work introduces an SRDF-SDF consistency loss to constrain the signs of the SRDF and SDF outputs. Thirdly, this work proposes a self-supervised visibility task, introducing the physical visibility geometry to the reconstruction task. The visibility task combines priors from predicted SRDF and SDF as pseudo labels, and contributes to generating more accurate 3D geometry. Our method implemented with different representations has been validated on indoor datasets, achieving improved performance in both reconstruction and view synthesis.
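
The abstract turns on how much each sampled point contributes to volume rendering given its density. As a rough illustration, the sketch below computes per-sample weights with the standard discrete volume-rendering quadrature used in NeRF-style methods. It is not the paper's SRDF-to-density transform: the densities are taken as given inputs, and the function name `render_weights` and the sample values are assumptions made for this example.

```python
import math


def render_weights(densities, deltas):
    """Per-sample weights w_i = T_i * (1 - exp(-sigma_i * delta_i)).

    T_i is the transmittance accumulated over the samples in front of
    sample i. Generic quadrature only; how sigma_i is derived from an
    SDF or SRDF is outside this sketch.
    """
    weights = []
    transmittance = 1.0
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # light remaining after this segment
    return weights


# Empty space first, then two dense samples at and just behind a surface.
weights = render_weights([0.0, 50.0, 50.0], [0.1, 0.1, 0.1])
```

In this toy ray the first dense sample receives almost all of the weight and the sample behind it very little, which is why the density assigned to each point matters so much for the rendered result.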

Submitted 28 August, 2024; originally announced August 2024.

Comments: Accepted by ECCV2024

arXiv:2408.15488 [pdf, other]

Legilimens: Practical and Unified Content Moderation for Large Language Model Services

Authors: Jialin Wu, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Jiayang Xu, Xinfeng Li, Wenyuan Xu

Abstract: Given the societal impact of unsafe content generated by large language models (LLMs), ensuring that LLM services comply with safety standards is a crucial concern for LLM service providers. Common content moderation methods are limited by an effectiveness-and-efficiency dilemma, where simple models are fragile while sophisticated models consume excessive computational resources. In this paper, we reveal for the first time that effective and efficient content moderation can be achieved by extracting conceptual features from chat-oriented LLMs, despite their initial fine-tuning for conversation rather than content moderation. We propose a practical and unified content moderation framework for LLM services, named Legilimens, which features both effectiveness and efficiency. Our red-team model-based data augmentation enhances the robustness of Legilimens against state-of-the-art jailbreaking. Additionally, we develop a framework to theoretically analyze the cost-effectiveness of Legilimens compared to other methods. We have conducted extensive experiments on five host LLMs, seventeen datasets, and nine jailbreaking methods to verify the effectiveness, efficiency, and robustness of Legilimens against normal and adaptive adversaries. A comparison of Legilimens with both commercial and academic baselines demonstrates the superior performance of Legilimens. Furthermore, we confirm that Legilimens can be applied to few-shot scenarios and extended to multi-label classification tasks.

Submitted 5 September, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

Comments: Accepted by ACM Conference on Computer and Communications Security (CCS) 2024

arXiv:2408.15242 [pdf, other]

Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty

Authors: Saining Zhang, Baijun Ye, Xiaoxue Chen, Yuantao Chen, Zongzheng Zhang, Cheng Peng, Yongliang Shi, Hao Zhao

Abstract: Robust and realistic rendering for large-scale road scenes is essential in autonomous driving simulation. Recently, 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering, but the general fidelity of large-scale road scene renderings is often limited by the input imagery, which usually has a narrow field of view and focuses mainly on the street-level local area. Intuitively, the data from the drone's perspective can provide a complementary viewpoint for the data from the ground vehicle's perspective, enhancing the completeness of scene reconstruction and rendering. However, training naively with aerial and ground images, which exhibit large view disparity, poses a significant convergence challenge for 3D-GS, and does not demonstrate remarkable improvements in performance on road views. In order to enhance the novel view synthesis of road views and to effectively use the aerial information, we design an uncertainty-aware training method that allows aerial images to assist in the synthesis of areas where ground images have poor learning outcomes, instead of weighting all pixels equally in 3D-GS training as prior work did. We are the first to introduce the cross-view uncertainty to 3D-GS by matching the car-view ensemble-based rendering uncertainty to aerial images, weighting the contribution of each pixel to the training process. Additionally, to systematically quantify evaluation metrics, we assemble a high-quality synthesized dataset comprising both aerial and ground images for road scenes.

Submitted 27 August, 2024; originally announced August 2024.

Comments: BMVC2024 Project Page: https://sainingzhang.github.io/project/uc-gs/ Code: https://github.com/SainingZhang/uc-gs/

arXiv:2408.15224 [pdf, other]

SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images

Authors: Zafer Yildiz, Yuwen Chen, Maciej A. Mazurowski

Abstract: Creating annotations for 3D medical data is time-consuming and often requires highly specialized expertise. Various tools have been implemented to aid this process. Segment Anything Model 2 (SAM 2) offers a general-purpose prompt-based segmentation algorithm designed to annotate videos. In this paper, we adapt this model to the annotation of 3D medical images and offer our implementation in the form of an extension to the popular annotation software: 3D Slicer. Our extension allows users to place point prompts on 2D slices to generate annotation masks and propagate these annotations across entire volumes in either single-directional or bi-directional manners. Our code is publicly available on https://github.com/mazurowski-lab/SlicerSegmentWithSAM and can be easily installed directly from the Extension Manager of 3D Slicer as well.

Submitted 27 August, 2024; originally announced August 2024.

Comments: Future work: support for box and mask inputs for the video predictor of SAM 2

arXiv:2408.15175 [pdf, other]

A Yebes W band Line Survey towards an Unshocked Molecular Cloud of Supernova Remnant 3C391: Evidence of Cosmic-Ray-Induced Chemistry

Authors: Tian-Yu Tu, Prathap Rayalacheruvu, Liton Majumdar, Yang Chen, Ping Zhou, Miguel Santander-García

Abstract: Cosmic rays (CRs) have strong influences on the chemistry of dense molecular clouds (MCs). To study the detailed chemistry induced by CRs, we conducted a Yebes W band line survey towards an unshocked MC (which we name 3C391:NML) associated with supernova remnant (SNR) 3C391. We detected emission lines of 18 molecular species in total and estimated their column densities with local thermodynamic equilibrium (LTE) and non-LTE analysis. Using the abundance ratio N(HCO+)/N(CO) and an upper limit of N(DCO+)/N(HCO+), we estimated with an analytic method that the CR ionization rate of 3C391:NML is $ζ\gtrsim 2.7\times 10^{-14}\rm \ s^{-1}$. However, we caution against adopting this value because chemical equilibrium, which is a prerequisite of using the equations, is not necessarily reached in 3C391:NML. We observed lower N(HCO+)/N(HOC+), higher N(HCS+)/N(CS), and higher X($l$-C3H+) by an order of magnitude in 3C391:NML than the typical values in quiescent dense MCs. We found that an enhanced CR ionization rate (of order $\sim 10^{-15}$ or $\sim 10^{-14}\rm \ s^{-1}$) is needed to reproduce the observation with a chemical model. This is higher than the values found in typical MCs by 2-3 orders of magnitude.

Submitted 27 August, 2024; originally announced August 2024.

Comments: 18 pages, 7 figures, accepted for publication in ApJ

arXiv:2408.15101 [pdf, other]

MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders

Authors: Baijiong Lin, Weisen Jiang, Pengguang Chen, Shu Liu, Ying-Cong Chen

Abstract: Multi-task dense scene understanding, which trains a model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba++, a novel architecture for multi-task scene understanding featuring a Mamba-based decoder. It contains two types of core blocks: self-task Mamba (STM) block and cross-task Mamba (CTM) block. STM handles long-range dependency by leveraging state-space models, while CTM explicitly models task interactions to facilitate information exchange across tasks. We design two types of CTM block, namely F-CTM and S-CTM, to enhance cross-task interaction from feature and semantic perspectives, respectively. Experiments on NYUDv2, PASCAL-Context, and Cityscapes datasets demonstrate the superior performance of MTMamba++ over CNN-based and Transformer-based methods. The code is available at https://github.com/EnVision-Research/MTMamba.

Submitted 27 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2407.02228

arXiv:2408.15033 [pdf, ps, other]

Stochastic dominance for super heavy-tailed random variables

Authors: Yuyu Chen, Seva Shneer

Abstract: We introduce a class of super heavy-tailed distributions and establish the inequality that any weighted average of independent and identically distributed super heavy-tailed random variables stochastically dominates one such random variable. We show that many commonly used extremely heavy-tailed (i.e., infinite-mean) distributions, such as the Pareto, Fréchet, and Burr distributions, belong to the class of super heavy-tailed distributions. The established stochastic dominance relation is further generalized to allow negatively dependent or non-identically distributed random variables. In particular, the weighted average of non-identically distributed random variables stochastically dominates their distribution mixtures. Applications of these results in portfolio diversification, goods bundling, and inventory management are discussed. Remarkably, in the presence of super heavy-tailedness, the results that hold for finite-mean models in these applications are flipped.
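
The dominance claim can be checked numerically at a single threshold. The sketch below is a small Monte Carlo experiment under assumptions made only for this example: Pareto variables with shape 0.5 on [1, ∞) (an infinite-mean case), equal weights, and threshold 10. The function name `tail_probabilities` is hypothetical, and one threshold only illustrates the inequality; it proves nothing.

```python
import random


def tail_probabilities(alpha, weights, threshold, n_samples, seed=0):
    """Estimate P(X_1 > t) and P(sum_i w_i * X_i > t) for i.i.d. Pareto(alpha).

    Illustrative experiment only: the distribution, weights, and threshold
    are assumptions of this example, not settings from the paper.
    """
    rng = random.Random(seed)
    single = 0
    averaged = 0
    for _ in range(n_samples):
        draws = [rng.paretovariate(alpha) for _ in weights]
        if draws[0] > threshold:
            single += 1
        if sum(w * x for w, x in zip(weights, draws)) > threshold:
            averaged += 1
    return single / n_samples, averaged / n_samples


p_single, p_average = tail_probabilities(0.5, [0.5, 0.5], 10.0, 20000)
```

For this Pareto law the exact single-variable tail is 10 ** -0.5 (about 0.316), and the estimate for the average comes out clearly larger. That matches the direction of the dominance relation and is the reverse of what diversification intuition from finite-mean models suggests.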

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.14851 [pdf, other]

Graph and Sequential Neural Networks in Session-based Recommendation: A Survey

Authors: Zihao Li, Chao Yang, Yakun Chen, Xianzhi Wang, Hongxu Chen, Guandong Xu, Lina Yao, Quan Z. Sheng

Abstract: Recent years have witnessed the remarkable success of recommendation systems (RSs) in alleviating the information overload problem. As a new paradigm of RSs, session-based recommendation (SR) specializes in users' short-term preference capture and aims to provide a more dynamic and timely recommendation based on the ongoing interacted actions. In this survey, we will give a comprehensive overview of the recent works on SR. First, we clarify the definitions of various SR tasks and introduce the characteristics of session-based recommendation against other recommendation tasks. Then, we summarize the existing methods in two categories: sequential neural network based methods and graph neural network (GNN) based methods. The standard frameworks and techniques are also introduced. Finally, we discuss the challenges of SR and new research directions in this area.

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.14821 [pdf, other]

Data-driven Effective Modeling of Multiscale Stochastic Dynamical Systems

Authors: Yuan Chen, Dongbin Xiu

Abstract: We present a numerical method for learning the dynamics of slow components of unknown multiscale stochastic dynamical systems. While the governing equations of the systems are unknown, bursts of observation data of the slow variables are available. By utilizing the observation data, our proposed method is capable of constructing a generative stochastic model that can accurately capture the effective dynamics of the slow variables in distribution. We present a comprehensive set of numerical examples to demonstrate the performance of the proposed method.

Submitted 27 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2406.15747

MSC Class: 60H10; 60H35; 62M45; 65C30

arXiv:2408.14721 [pdf, other]

PAT: Pruning-Aware Tuning for Large Language Models

Authors: Yijiang Liu, Huanrui Yang, Youxin Chen, Rongyu Zhang, Miao Wang, Yuan Du, Li Du

Abstract: Large language models (LLMs) excel in language tasks, especially with supervised fine-tuning after pre-training. However, their substantial memory and computational requirements hinder practical applications. Structural pruning, which reduces less significant weight dimensions, is one solution. Yet, traditional post-hoc pruning often leads to significant performance loss, with limited recovery from further fine-tuning due to reduced capacity. Since the model fine-tuning refines the general and chaotic knowledge in pre-trained models, we aim to incorporate structural pruning with the fine-tuning, and propose the Pruning-Aware Tuning (PAT) paradigm to eliminate model redundancy while preserving the model performance to the maximum extent. Specifically, we insert the innovative Hybrid Sparsification Modules (HSMs) between the Attention and FFN components to accordingly sparsify the upstream and downstream linear modules. The HSM comprises a lightweight operator and a globally shared trainable mask. The lightweight operator maintains a training overhead comparable to that of LoRA, while the trainable mask unifies the channels to be sparsified, ensuring structural pruning. Additionally, we propose the Identity Loss which decouples the transformation and scaling properties of the HSMs to enhance training robustness. Extensive experiments demonstrate that PAT excels in both performance and efficiency.
For example, our Llama2-7b model with a 25\% pruning ratio achieves 1.33$\times$ speedup while outperforming the LoRA-finetuned model by up to 1.26\% in accuracy with a similar training cost. Code: https://github.com/kriskrisliu/PAT_Pruning-Aware-Tuning

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.14647 [pdf, other]

doi 10.3847/1538-4357/ad70b6

KBSS-InCLOSE I: Design and First Results from the Inner CGM of QSO Line Of Sight Emitting Galaxies at z~2-3

Authors: Evan Haze Nunez, Charles C. Steidel, Evan N. Kirby, Gwen C. Rudie, Nikolaus Z. Prusinski, Yuguang Chen, Zhuyun Zhuang, Allison L. Strom, Dawn K. Erb, Max Pettini, Louise Welsh, Dave S. N. Rupke, Ryan J. Cooke

Abstract: We present the design and first results of the Inner Circumgalactic Medium (CGM) of QSO Line of Sight Emitting galaxies at $z\sim 2-3$, KBSS-InCLOSE. The survey will connect galaxy properties (e.g., stellar mass $M_*$, interstellar medium ISM metallicity) with the physical conditions of the inner CGM (e.g., kinematics, metallicity) to directly observe the galaxy-scale baryon cycle. We obtain deep Keck/KCWI optical IFU pointings of Keck Baryonic Structure Survey (KBSS) QSOs to discover new star-forming galaxies at small projected distances $b\lesssim12"$ (98 kpc, $\overline{z}=2.3$), then obtain follow-up Keck/MOSFIRE NIR spectra to confirm their redshifts. We leverage KBSS images and Keck/HIRES QSO spectra to model stellar populations and inner CGM absorption. In this paper, we analyze two QSO fields and discover more than 15 new galaxies with KCWI, then use MOSFIRE for two galaxies Q2343-G1 ($z=2.43$; G1) and Q2233-N1 ($z=3.15$; N1), which are both associated with Damped Lyman Alpha absorbers. We find that G1 has typical $M_*$, UV/optical emission properties. N1 has lower $M_*$ with very strong nebular emission. We jointly analyze neutral phase CGM and ionized ISM in N/O (for the first time at this $z$), dust extinction, and high-ionization CGM finding that: G1's CGM is metal poor and less evolved than its ISM, while N1's CGM and ISM abundances are comparable; their CGM shows $\sim1$ dex less dust extinction than the ISM; and G1's CGM has direct evidence of hot, metal-rich galactic outflow ejecta.
These findings support that metals and dust are driven into the CGM from outflows, but may also be e.g., stripped ISM gas or satellite enrichment. The full KBSS-InCLOSE sample will explore these scenarios.

Submitted 26 August, 2024; originally announced August 2024.

Comments: 37 pages (48 total), 14 figures (20 total), Accepted to ApJ

arXiv:2408.14610 [pdf, other]

The Impact of Group Discussion and Formation on Student Performance: An Experience Report in a Large CS1 Course

Authors: Tong Wu, Xiaohang Tang, Sam Wong, Xi Chen, Clifford A. Shaffer, Yan Chen

Abstract: Programming instructors often conduct collaborative learning activities, such as Peer Instruction (PI), to enhance student motivation, engagement, and learning gains. However, the impact of group discussion and formation mechanisms on student performance remains unclear. To investigate this, we conducted an 11-session experiment in a large, in-person CS1 course. We employed both random and expertise-balanced grouping methods to examine the efficacy of different group mechanisms and the impact of expert students' presence on collaborative learning. Our observations revealed complex dynamics within the collaborative learning environment. Among 255 groups, 146 actively engaged in discussions, with 96 of these groups demonstrating improvement for poor-performing students. Interestingly, our analysis revealed that different grouping methods (expertise-balanced or random) did not significantly influence discussion engagement or poor-performing students' improvement. In our deeper qualitative analysis, we found that struggling students often derived benefits from interactions with expert peers, but this positive effect was not consistent across all groups. We identified challenges that expert students face in peer instruction interactions, highlighting the complexity of leveraging expertise within group discussions.
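
The group counts in the abstract are easier to compare as proportions. The short calculation below uses only the three counts quoted above; the variable names are chosen for this example, and the percentages are derived here rather than reported by the authors.

```python
# Counts quoted in the abstract.
total_groups = 255
engaged_groups = 146   # groups that actively engaged in discussion
improved_groups = 96   # engaged groups where poor-performing students improved

# Derived proportions, as percentages rounded to one decimal place.
engaged_pct = round(100 * engaged_groups / total_groups, 1)
improved_of_engaged_pct = round(100 * improved_groups / engaged_groups, 1)
improved_of_all_pct = round(100 * improved_groups / total_groups, 1)
```

This gives roughly 57.3% of groups engaging, improvement in 65.8% of the engaged groups, and improvement in 37.6% of all groups.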

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.14156 [pdf, other]

Integrated Sensing, Communication, and Powering over Multi-antenna OFDM Systems

Authors: Yilong Chen, Chao Hu, Zixiang Ren, Han Hu, Jie Xu, Lexi Xu, Lei Liu, Shuguang Cui

Abstract: This paper considers a multi-functional orthogonal frequency division multiplexing (OFDM) system with integrated sensing, communication, and powering (ISCAP), in which a multi-antenna base station (BS) transmits OFDM signals to simultaneously deliver information to multiple information receivers (IRs), provide energy supply to multiple energy receivers (ERs), and sense potential targets based on the echo signals. To facilitate ISCAP, the BS employs the joint transmit beamforming design by sending dedicated sensing/energy beams jointly with information beams. Furthermore, we consider the beam scanning for sensing, in which the joint beams scan in different directions over time to sense potential targets. In order to ensure the sensing beam scanning performance and meet the communication and powering requirements, it is essential to properly schedule IRs and ERs and design the resource allocation over time, frequency, and space. More specifically, we optimize the joint transmit beamforming over multiple OFDM symbols and subcarriers, with the objective of minimizing the average beampattern matching error of beam scanning for sensing, subject to the constraints on the average communication rates at IRs and the average harvested power at ERs. We find converged high-quality solutions to the formulated problem by proposing efficient iterative algorithms based on advanced optimization techniques. We also develop various heuristic designs based on the principles of zero-forcing (ZF) beamforming, round-robin user scheduling, and time switching, respectively.
Numerical results show that our proposed algorithms adaptively generate information and sensing/energy beams at each time-frequency slot to match the scheduled IRs/ERs with the desired scanning beam, significantly outperforming the heuristic designs.

Submitted 26 August, 2024; originally announced August 2024.

Comments: 13 pages, 12 figures

arXiv:2408.14087 [pdf, other]

LSM-YOLO: A Compact and Effective ROI Detector for Medical Detection

Authors: Zhongwen Yu, Qiu Guan, Jianmin Yang, Zhiqiang Yang, Qianwei Zhou, Yang Chen, Feng Chen

Abstract: Existing medical Region of Interest (ROI) detection lacks an algorithm that simultaneously satisfies real-time performance and accuracy requirements, and so fails to meet the growing demand for automatic detection in medicine. Although the basic YOLO framework ensures real-time detection due to its fast speed, it still faces challenges in maintaining precision concurrently. To alleviate the above problems, we propose a novel model named Lightweight Shunt Matching-YOLO (LSM-YOLO), with Lightweight Adaptive Extraction (LAE) and Multipath Shunt Feature Matching (MSFM). Firstly, by using LAE to refine feature extraction, the model can obtain more contextual information and high-resolution details from multiscale feature maps, thereby extracting detailed features of ROI in medical images while reducing the influence of noise. Secondly, MSFM is utilized to further refine the fusion of high-level semantic features and low-level visual features, enabling better fusion between ROI features and neighboring features, thereby improving the detection rate for better diagnostic assistance. Experimental results demonstrate that LSM-YOLO achieves 48.6% AP on a private dataset of pancreatic tumors, 65.1% AP on the BCCD blood cell detection public dataset, and 73.0% AP on the Br35h brain tumor detection public dataset. Our model achieves state-of-the-art performance with minimal parameter cost on the above three datasets. The source code is available at: https://github.com/VincentYuuuuuu/LSM-YOLO.

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.13712 [pdf, other]

Riemann-based Multi-scale Attention Reasoning Network for Text-3D Retrieval

Authors: Wenrui Li, Wei Han, Yandu Chen, Yeyu Chai, Yidan Lu, Xingtao Wang, Xiaopeng Fan

Abstract: Due to the challenges in acquiring paired Text-3D data and the inherent irregularity of 3D data structures, combined representation learning of 3D point clouds and text remains unexplored. In this paper, we propose a novel Riemann-based Multi-scale Attention Reasoning Network (RMARN) for text-3D retrieval. Specifically, the extracted text and point cloud features are refined by their respective Adaptive Feature Refiner (AFR). Furthermore, we introduce the innovative Riemann Local Similarity (RLS) module and the Global Pooling Similarity (GPS) module. However, as 3D point cloud data and text data often possess complex geometric structures in high-dimensional space, the proposed RLS employs a novel Riemann Attention Mechanism to reflect the intrinsic geometric relationships of the data. Without explicitly defining the manifold, RMARN learns the manifold parameters to better represent the distances between text-point cloud samples. To address the challenges of lacking paired text-3D data, we have created the large-scale Text-3D Retrieval dataset T3DR-HIT, which comprises over 3,380 pairs of text and point cloud data. T3DR-HIT contains coarse-grained indoor 3D scenes and fine-grained Chinese artifact scenes, consisting of 1,380 and over 2,000 text-3D pairs, respectively. Experiments on our custom datasets demonstrate the superior performance of the proposed method. Our code and proposed datasets are available at https://github.com/liwrui/RMARN.

Submitted 24 August, 2024; originally announced August 2024.

arXiv:2408.13697 [pdf, other]

Guided and Fused: Efficient Frozen CLIP-ViT with Feature Guidance and Multi-Stage Feature Fusion for Generalizable Deepfake Detection

Authors: Yingjian Chen, Lei Zhang, Yakun Niu, Pei Chen, Lei Tan, Jing Zhou

Abstract: The rise of generative models has sparked concerns about image authenticity online, highlighting the urgent need for an effective and general detector. Recent methods leveraging the frozen pre-trained CLIP-ViT model have made great progress in deepfake detection. However, these models often rely on visual-general features directly extracted by the frozen network, which contain excessive information irrelevant to the task, resulting in limited detection performance. To address this limitation, in this paper, we propose an efficient Guided and Fused Frozen CLIP-ViT (GFF), which integrates two simple yet effective modules. The Deepfake-Specific Feature Guidance Module (DFGM) guides the frozen pre-trained model in extracting features specifically for deepfake detection, reducing irrelevant information while preserving its generalization capabilities. The Multi-Stage Fusion Module (FuseFormer) captures low-level and high-level information by fusing features extracted from each stage of the ViT. This dual-module approach significantly improves deepfake detection by fully leveraging CLIP-ViT's inherent advantages. Extensive experiments demonstrate the effectiveness and generalization ability of GFF, which achieves state-of-the-art performance with optimal results in only 5 training epochs. Even when trained on only 4 classes of ProGAN, GFF achieves nearly 99% accuracy on unseen GANs and maintains an impressive 97% accuracy on unseen diffusion models.

Submitted 24 August, 2024; originally announced August 2024.

arXiv:2408.13687 [pdf, other]

Quantum error correction below the surface code threshold

Authors: Rajeev Acharya, Laleh Aghababaie-Beni, Igor Aleiner, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Nikita Astrakhantsev, Juan Atalaya, Ryan Babbush, Dave Bacon, Brian Ballard, Joseph C. Bardin, Johannes Bausch, Andreas Bengtsson, Alexander Bilmes, Sam Blackwell, Sergio Boixo, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, David A. Browne, et al. (224 additional authors not shown)

Abstract: Quantum error correction provides a path to reach practical quantum computing by combining multiple physical qubits into a logical qubit, where the logical error rate is suppressed exponentially as more qubits are added. However, this exponential suppression only occurs if the physical error rate is below a critical threshold. In this work, we present two surface code memories operating below this threshold: a distance-7 code and a distance-5 code integrated with a real-time decoder. The logical error rate of our larger quantum memory is suppressed by a factor of $Λ$ = 2.14 $\pm$ 0.02 when increasing the code distance by two, culminating in a 101-qubit distance-7 code with 0.143% $\pm$ 0.003% error per cycle of error correction. This logical memory is also beyond break-even, exceeding its best physical qubit's lifetime by a factor of 2.4 $\pm$ 0.3. We maintain below-threshold performance when decoding in real time, achieving an average decoder latency of 63 $μ$s at distance-5 up to a million cycles, with a cycle time of 1.1 $μ$s. To probe the limits of our error-correction performance, we run repetition codes up to distance-29 and find that logical performance is limited by rare correlated error events occurring approximately once every hour, or 3 $\times$ 10$^9$ cycles. Our results present device performance that, if scaled, could realize the operational requirements of large scale fault-tolerant quantum algorithms.

Submitted 24 August, 2024; originally announced August 2024.

Comments: 10 pages, 4 figures, Supplementary Information

arXiv:2408.13432 [pdf, other]

Integrating Multi-Head Convolutional Encoders with Cross-Attention for Improved SPARQL Query Translation

Authors: Yi-Hui Chen, Eric Jui-Lin Lu, Kwan-Ho Cheng

Abstract: The main task of the KGQA system (Knowledge Graph Question Answering) is to convert user input questions into query syntax (such as SPARQL). With the rise of modern popular encoders and decoders like Transformer and ConvS2S, many scholars have shifted the research direction of SPARQL generation to the Neural Machine Translation (NMT) architecture or the generative AI field of Text-to-SPARQL. In NMT-based QA systems, the system treats knowledge base query syntax as a language. It uses NMT-based translation models to translate natural language questions into query syntax. Scholars use popular architectures equipped with cross-attention, such as Transformer, ConvS2S, and BiLSTM, to train translation models for query syntax. To achieve better query results, this paper improved the ConvS2S encoder and added multi-head attention from the Transformer, proposing a Multi-Head Conv encoder (MHC encoder) based on the n-gram language model. The principle is to use convolutional layers to capture local hidden features in the input sequence with different receptive fields, using multi-head attention to calculate dependencies between them. Ultimately, we found that the translation model based on the Multi-Head Conv encoder achieved better performance than other encoders, obtaining 76.52% and 83.37% BLEU-1 (BiLingual Evaluation Understudy) on the QALD-9 and LC-QuAD-1.0 datasets, respectively. Additionally, in the end-to-end system experiments on the QALD-9 and LC-QuAD-1.0 datasets, we achieved leading results over other KGQA systems, with Macro F1-measures reaching 52% and 66%, respectively. Moreover, the experimental results show that with limited computational resources, if one possesses an excellent encoder-decoder architecture and cross-attention, experts and scholars can achieve outstanding performance equivalent to large pre-trained models using only general embeddings.

Submitted 23 August, 2024; originally announced August 2024.

Comments: 24 pages, 20 figures, using the engrXiv template; the full version has been submitted to ACM Transactions on Information Systems and is currently under review. (2024)

arXiv:2408.13423 [pdf, other]

Training-free Long Video Generation with Chain of Diffusion Model Experts

Authors: Wenhao Li, Yichao Cao, Xiu Su, Xi Lin, Shan You, Mingkai Zheng, Yi Chen, Chang Xu

Abstract: Video generation models hold substantial potential in areas such as filmmaking. However, current video diffusion models incur high computational costs and produce suboptimal results due to the high complexity of the video generation task. In this paper, we propose ConFiner, an efficient high-quality video generation framework that decouples video generation into easier subtasks: structure control and spatial-temporal refinement. It can generate high-quality videos with a chain of off-the-shelf diffusion model experts, each expert responsible for a decoupled subtask. During the refinement, we introduce coordinated denoising, which can merge multiple diffusion experts' capabilities into a single sampling. Furthermore, we design the ConFiner-Long framework, which can generate long, coherent videos with three constraint strategies on ConFiner. Experimental results indicate that with only 10% of the inference cost, our ConFiner surpasses representative models like Lavie and Modelscope across all objective and subjective metrics. ConFiner-Long can generate high-quality and coherent videos with up to 600 frames.

Submitted 2 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.13413 [pdf, other]

TVG: A Training-free Transition Video Generation Method with Diffusion Models

Authors: Rui Zhang, Yaosen Chen, Yuegen Liu, Wei Wang, Xuming Wen, Hongxia Wang

Abstract: Transition videos play a crucial role in media production, enhancing the flow and coherence of visual narratives. Traditional methods like morphing often lack artistic appeal and require specialized skills, limiting their effectiveness. Recent advances in diffusion model-based video generation offer new possibilities for creating transitions but face challenges such as poor inter-frame relationship modeling and abrupt content changes. We propose a novel training-free Transition Video Generation (TVG) approach using video-level diffusion models that addresses these limitations without additional training. Our method leverages Gaussian Process Regression ($\mathcal{GPR}$) to model latent representations, ensuring smooth and dynamic transitions between frames. Additionally, we introduce interpolation-based conditional controls and a Frequency-aware Bidirectional Fusion (FBiF) architecture to enhance temporal control and transition reliability. Evaluations on benchmark datasets and custom image pairs demonstrate the effectiveness of our approach in generating high-quality smooth transition videos. The code is provided at https://sobeymil.github.io/tvg.com.

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.13115 [pdf, ps, other]

Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias

Authors: Yifan Chen, Xiaoou Cheng, Jonathan Niles-Weed, Jonathan Weare

Abstract: The unadjusted Langevin algorithm is commonly used to sample probability distributions in extremely high-dimensional settings. However, existing analyses of the algorithm for strongly log-concave distributions suggest that, as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error in the $W_2$ metric scales in proportion to $d$ or $\sqrt{d}$. In this paper, we argue that, despite this poor scaling of the $W_2$ error for the full set of variables, the behavior for a small number of variables can be significantly better: a number of iterations proportional to $K$, up to logarithmic terms in $d$, often suffices for the algorithm to converge to within a desired $W_2$ error for all $K$-marginals. We refer to this effect as delocalization of bias. We show that the delocalization effect does not hold universally and prove its validity for Gaussian distributions and strongly log-concave distributions with certain sparse interactions. Our analysis relies on a novel $W_{2,\ell^\infty}$ metric to measure convergence. A key technical challenge we address is the lack of a one-step contraction property in this metric. Finally, we use asymptotic arguments to explore potential generalizations of the delocalization effect beyond the Gaussian and sparse interactions setting.

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.13036 [pdf, other]

S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points

Authors: Bing He, Yunuo Chen, Guo Lu, Li Song, Wenjun Zhang

Abstract: Recently, dynamic scene reconstruction using Gaussians has garnered increased interest. Mainstream approaches typically employ a global deformation field to warp a 3D scene in the canonical space. However, the inherently low-frequency nature of implicit neural fields often leads to ineffective representations of complex motions. Moreover, their structural rigidity can hinder adaptation to scenes with varying resolutions and durations. To overcome these challenges, we introduce a novel approach utilizing discrete 3D control points. This method models local rays physically and establishes a motion-decoupling coordinate system, which effectively merges traditional graphics with learnable pipelines for a robust and efficient local 6-degrees-of-freedom (6-DoF) motion representation. Additionally, we have developed a generalized framework that incorporates our control points with Gaussians. Starting from an initial 3D reconstruction, our workflow decomposes the streaming 4D real-world reconstruction into four independent submodules: 3D segmentation, 3D control points generation, object-wise motion manipulation, and residual compensation. Our experiments demonstrate that this method outperforms existing state-of-the-art 4D Gaussian Splatting techniques on both the Neu3DV and CMU-Panoptic datasets. Our approach also significantly accelerates training, with the optimization of our 3D control points achievable within just 2 seconds per frame on a single NVIDIA 4070 GPU.

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.13019 [pdf]

VCEMO: Multi-Modal Emotion Recognition for Chinese Voiceprints

Authors: Jinghua Tang, Liyun Zhang, Yu Lu, Dian Ding, Lanqing Yang, YiChao Chen, Minjie Bian, Xiaoshan Li, Guangtao Xue

Abstract: Emotion recognition can enhance humanized machine responses to user commands, while voiceprint-based perception systems can be easily integrated into commonly used devices like smartphones and stereos. Although Chinese has the largest number of speakers, there is a noticeable absence of high-quality corpus datasets for emotion recognition using Chinese voiceprints. Hence, this paper introduces the VCEMO dataset to address this deficiency. The proposed dataset is constructed from everyday conversations and comprises over 100 users and 7,747 textual samples. Furthermore, this paper proposes a multimodal-based model as a benchmark, which effectively fuses speech, text, and external knowledge using a co-attention structure. The system employs contrastive learning-based regulation for the uneven distribution of the dataset and the diversity of emotional expressions. The experiments demonstrate the significant improvement of the proposed model over SOTA on the VCEMO and IEMOCAP datasets. Code and dataset will be released for research.

Submitted 23 August, 2024; originally announced August 2024.

Comments: 12 pages, 4 figures

arXiv:2408.12817 [pdf, other]

Data-Driven Parametrization of Molecular Mechanics Force Fields for Expansive Chemical Space Coverage

Authors: Tianze Zheng, Ailun Wang, Xu Han, Yu Xia, Xingyuan Xu, Jiawei Zhan, Yu Liu, Yang Chen, Zhi Wang, Xiaojie Wu, Sheng Gong, Wen Yan

Abstract: A force field is a critical component in molecular dynamics simulations for computational drug discovery. It must achieve high accuracy within the constraints of molecular mechanics' (MM) limited functional forms, which offer high computational efficiency. With the rapid expansion of synthetically accessible chemical space, traditional look-up table approaches face significant challenges. In this study, we address this issue using a modern data-driven approach, developing ByteFF, an Amber-compatible force field for drug-like molecules. To create ByteFF, we generated an expansive and highly diverse molecular dataset at the B3LYP-D3(BJ)/DZVP level of theory. This dataset includes 2.4 million optimized molecular fragment geometries with analytical Hessian matrices, along with 3.2 million torsion profiles. We then trained an edge-augmented, symmetry-preserving molecular graph neural network (GNN) on this dataset, employing a carefully optimized training strategy. Our model predicts all bonded and non-bonded MM force field parameters for drug-like molecules simultaneously across a broad chemical space. ByteFF demonstrates state-of-the-art performance on various benchmark datasets, excelling in predicting relaxed geometries, torsional energy profiles, and conformational energies and forces. Its exceptional accuracy and expansive chemical space coverage make ByteFF a valuable tool for multiple stages of computational drug discovery.

Submitted 22 August, 2024; originally announced August 2024.

Comments: ByteFF, a machine learning parametrized MMFF

arXiv:2408.12809 [pdf, other]

DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation

Authors: Xiaowei Mao, Yan Lin, Shengnan Guo, Yubin Chen, Xingyu Xian, Haomin Wen, Qisen Xu, Youfang Lin, Huaiyu Wan

Abstract: Uncertainty quantification in travel time estimation (TTE) aims to estimate the confidence interval for travel time, given the origin (O), destination (D), and departure time (T). Accurately quantifying this uncertainty requires generating the most likely path and assessing travel time uncertainty along the path. This involves two main challenges: 1) Predicting a path that aligns with the ground truth, and 2) modeling the impact of travel time in each segment on overall uncertainty under varying conditions. We propose DutyTTE to address these challenges. For the first challenge, we introduce a deep reinforcement learning method to improve alignment between the predicted path and the ground truth, providing more accurate travel time information from road segments to improve TTE. For the second challenge, we propose a mixture of experts guided uncertainty quantification mechanism to better capture travel time uncertainty for each segment under varying contexts. Additionally, we calibrate our results using Hoeffding's upper-confidence bound to provide statistical guarantees for the estimated confidence intervals. Extensive experiments on two real-world datasets demonstrate the superiority of our proposed method.

Submitted 22 August, 2024; originally announced August 2024.

Comments: 7 pages

arXiv:2408.12786 [pdf, other]

Broad-band X-ray spectral and timing properties of the accreting millisecond X-ray pulsar IGR J17498$-$2921 during the 2023 outburst

Authors: Zhaosheng Li, L. Kuiper, Y. Y. Pan, M. Falanga, J. Poutanen, Y. P. Chen, R. X. Xu, M. Y. Ge, Y. Huang, L. M. Song, S. Zhang, F. J. Lu, S. N. Zhang

Abstract: We report on the broadband spectral and timing properties of the accreting millisecond X-ray pulsar IGR J17498$-$2921 during its April 2023 outburst using data from NICER (1$-$10 keV), NuSTAR (3$-$79 keV), Insight-HXMT (2$-$150 keV), and INTEGRAL (30$-$150 keV). We detect significant 401 Hz pulsations across the 0.5$-$150 keV band. The pulse fraction increases from $\sim$2% at 1 keV to $\sim$13% at 66 keV. Five type-I X-ray bursts have been detected, including three photospheric radius expansion bursts, with a rise time of $\sim$2 s and an exponential decay time of $\sim$5 s. The recurrence time is $\sim$9.1 h, which can be explained by unstable thermonuclear burning of hydrogen-deficient material on the neutron star surface. The quasi-simultaneous 1$-$150 keV broadband spectra from NICER, NuSTAR, and INTEGRAL can be well fitted by an absorbed reflection model, relxillCp, and a Gaussian line of instrumental origin. The Comptonized emission from the hot corona is characterized by a photon index $Γ$ of $\sim$1.8 and an electron temperature $kT_{\rm e}$ of $\sim$40 keV. We obtain a low inclination angle $i\sim34^{\circ}$. The accretion disk shows properties of strong ionization, $\log(ξ/{\rm erg~cm~s^{-1}})\sim4.5$, over-solar abundance, $A_{\rm Fe}\sim 7.7$, and high density, $\log(n_{\rm e}/{\rm cm^{-3}})\sim 19.5$. However, a lower disk density with normal abundance and ionization could also be possible. From the inner disk radius $R_{\rm in}=1.67R_{\rm ISCO}$ and the long-term spin-down rate of $-3.1(2)\times10^{-15}~{\rm Hz~s^{-1}}$, we constrain the magnetic field of IGR J17498$-$2921 in the range of $(0.9-2.4)\times10^8$ G.

Submitted 22 August, 2024; originally announced August 2024.

Comments: 12 pages, 8 figures. This is a revised version resubmitted to A&A, incorporating the referee's comments

arXiv:2408.12725 [pdf, other]

DUNE Phase II: Scientific Opportunities, Detector Concepts, Technological Solutions

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, C. Andreopoulos, M. Andreotti, et al. (1347 additional authors not shown)

Abstract: The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy toward the implementation of this leading-edge, large-scale science project. The 2023 report of the US Particle Physics Project Prioritization Panel (P5) reaffirmed this vision and strongly endorsed DUNE Phase I and Phase II, as did the European Strategy for Particle Physics. While the construction of DUNE Phase I is well underway, this White Paper focuses on DUNE Phase II planning. DUNE Phase II consists of a third and fourth far detector (FD) module, an upgraded near detector complex, and an enhanced 2.1 MW beam. The fourth FD module is conceived as a "Module of Opportunity", aimed at expanding the physics opportunities, in addition to supporting the core DUNE science program, with more advanced technologies. This document highlights the increased science opportunities offered by the DUNE Phase II near and far detectors, including long-baseline neutrino oscillation physics, neutrino astrophysics, and physics beyond the standard model. It describes the DUNE Phase II near and far detector technologies and detector design concepts that are currently under consideration. A summary of key R&D goals and prototyping phases needed to realize the Phase II detector technical designs is also provided. DUNE's Phase II detectors, along with the increased beam power, will complete the full scope of DUNE, enabling a multi-decadal program of groundbreaking science with neutrinos.

Submitted 22 August, 2024; originally announced August 2024.

Report number: FERMILAB-TM-2833-LBNF

arXiv:2408.12641 [pdf, other]

Surrogate Constructed Scalable Circuits ADAPT-VQE in the Schwinger model

Authors: Erik Gustafson, Kyle Sherbert, Adrien Florio, Karunya Shirali, Yanzhu Chen, Henry Lamm, Semeon Valgushev, Andreas Weichselbaum, Sophia E. Economou, Robert D. Pisarski, Norm M. Tubman

Abstract: Inspired by recent advancements of simulating periodic systems on quantum computers, we develop a new approach, (SC)$^2$-ADAPT-VQE, to further advance the simulation of these systems. Our approach extends the scalable circuits ADAPT-VQE framework, which builds an ansatz from a pool of coordinate-invariant operators defined for arbitrarily large, though not arbitrarily small, volumes. Our method uses a classically tractable "Surrogate Constructed" method to remove irrelevant operators from the pool, reducing the minimum size for which the scalable circuits are defined. Bringing together the scalable circuits and the surrogate constructed approaches forms the core of the (SC)$^2$ methodology. Our approach allows for a wider set of classical computations, on small volumes, which can be used for a more robust extrapolation protocol. While developed in the context of lattice models, the surrogate construction portion is applicable to a wide variety of problems where information about the relative importance of operators in the pool is available. As an example, we use it to compute properties of the Schwinger model - quantum electrodynamics for a single, massive fermion in $1+1$ dimensions - and show that our method can be used to accurately extrapolate to the continuum limit.

Submitted 22 August, 2024; originally announced August 2024.

Comments: 9 pages, 6 figures

Report number: FERMILAB-PUB-24-0456-SQMS-T

arXiv:2408.12523 [pdf, ps, other]

Weighted Envy-Freeness in House Allocation

Authors: Sijia Dai, Yankai Chen, Xiaowei Wu, Yicheng Xu, Yong Zhang

Abstract: The classic house allocation problem involves assigning $m$ houses to $n$ agents based on their utility functions, ensuring each agent receives exactly one house. A key criterion in these problems is satisfying fairness constraints such as envy-freeness. We extend this problem by considering agents with arbitrary weights, focusing on the concept of weighted envy-freeness, which has been extensively studied in fair division. We present a polynomial-time algorithm to determine whether weighted envy-free allocations exist and, if so, to compute one. Since weighted envy-free allocations do not always exist, we also investigate the potential of achieving such allocations through the use of subsidies. We provide several characterizations for weighted envy-freeable allocations (allocations that can be turned weighted envy-free by introducing subsidies) and show that they do not always exist, which is different from the unweighted setting. Furthermore, we explore the existence of weighted envy-freeable allocations in specific scenarios and outline the conditions under which they exist.

Submitted 22 August, 2024; originally announced August 2024.

Comments: 17 pages, 4 figures

arXiv:2408.12222 [pdf]

Formation mechanism of the (2 x 1) reconstruction of calcite (104)

Authors: Haojun Zhou, Yingquan Chen, Mingyue Ding, Xiaoliang Zhong

Abstract: Calcite has recently attracted extensive research interest in fields ranging from geoscience to carbon dioxide removal. Although much effort has been made to study the (2x1) reconstruction of the most stable (104) surface, the origin of this reconstruction remains unclear. Here, we carefully investigate the atomic and electronic structures of calcite (104) via density functional theory methods with van der Waals corrections. The results unambiguously show that the driving force for this reconstruction is the intrinsic demands of surface atoms to increase the coordination numbers. On reconstructing, calcite (104) forms four additional Ca-O bonds per (2x1) unit cell. Besides, phonon spectrums indicate both unreconstructed and reconstructed surfaces are dynamically stable. Finally, by applying the climbing image nudged elastic band method, an energy barrier is predicted during the reconstructing. This work delivers a full picture for the formation of calcite (104)-(2x1) reconstruction and can greatly advance the understanding of surface science for calcite.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.12192 [pdf, other]

A framework for extracting the rates of photophysical processes from biexponentially decaying photon emission data

Authors: Jill M. Cleveland, Tory A. Welsch, Eric Y. Chen, D. Bruce Chase, Matthew F. Doty, Hanz Y. Ramírez-Gómez

Abstract: There is strong interest in designing and realizing optically-active semiconductor nanostructures of greater complexity for applications in fields ranging from biomedical engineering to quantum computing. While these increasingly complex nanostructures can implement progressively sophisticated optical functions, the presence of more material constituents and interfaces also leads to increasingly complex exciton dynamics. In particular, the rates of carrier trapping and detrapping in complex heterostructures are critically important for advanced optical functionality, but they can rarely be directly measured. In this work, we develop a model that includes trapping and release of carriers by optically inactive states. The model explains the widely observed biexponential decay of the photoluminescence signal from neutral excitons in low dimensional semiconductor emitters. The model also allows determination of likelihood intervals for all the transition rates involved in the emission dynamics, without the use of approximations. Furthermore, in cases for which the high temperature limit is suitable, the model leads to specific values of such rates, outperforming reduced models previously used to estimate those quantities. We demonstrate the value of this model by applying it to time resolved photoluminescence measurements of CdSeTe/CdS heterostructures. We obtain values not only for the radiative and nonradiative lifetimes, but also for the delayed photoluminescence originating in trapping and release.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.12133 [pdf, other]

Self-supervised Learning for Geospatial AI: A Survey

Authors: Yile Chen, Weiming Huang, Kaiqi Zhao, Yue Jiang, Gao Cong

Abstract: The proliferation of geospatial data in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across various urban applications. Given the vast yet inherently sparse labeled nature of geospatial data, there is a critical need for techniques that can effectively leverage such data without heavy reliance on labeled datasets. This requirement aligns with the principles of self-supervised learning (SSL), which has attracted increasing attention for its adoption in geospatial data. This paper conducts a comprehensive and up-to-date survey of SSL techniques applied to or developed for three primary data (geometric) types prevalent in geospatial vector data: points, polylines, and polygons. We systematically categorize various SSL techniques into predictive and contrastive methods, discussing their application with respect to each data type in enhancing generalization across various downstream tasks. Furthermore, we review the emerging trends of SSL for GeoAI, and several task-specific SSL techniques. Finally, we discuss several key challenges in the current research and outline promising directions for future investigation. By presenting a structured analysis of relevant studies, this paper aims to inspire continued advancements in the integration of SSL with GeoAI, encouraging innovative methods to harnessing the power of geospatial data.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.12102 [pdf, other]

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization

Authors: Luyao Cheng, Hui Wang, Siqi Zheng, Yafeng Chen, Rongjie Huang, Qinglin Zhang, Qian Chen, Xihao Li

Abstract: Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogenous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing speaker diarization systems rely exclusively on unimodal acoustic information, making the task particularly challenging due to the innate ambiguities of audio signals. Recent studies have made tremendous efforts towards audio-visual or audio-semantic modeling to enhance performance. However, even the incorporation of up to two modalities often falls short in addressing the complexities of spontaneous and unstructured conversations. To exploit more meaningful dialogue patterns, we propose a novel multimodal approach that jointly utilizes audio, visual, and semantic cues to enhance speaker diarization. Our method elegantly formulates the multimodal modeling as a constrained optimization problem. First, we build insights into the visual connections among active speakers and the semantic interactions within spoken content, thereby establishing abundant pairwise constraints. Then we introduce a joint pairwise constraint propagation algorithm to cluster speakers based on these visual and semantic constraints. This integration effectively leverages the complementary strengths of different modalities, refining the affinity estimation between individual speaker embeddings. Extensive experiments conducted on multiple multimodal datasets demonstrate that our approach consistently outperforms state-of-the-art speaker diarization methods.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.12063 [pdf, other]

A Deconfounding Approach to Climate Model Bias Correction

Authors: Wentao Gao, Jiuyong Li, Debo Cheng, Lin Liu, Jixue Liu, Thuc Duy Le, Xiaojing Du, Xiongren Chen, Yanchang Zhao, Yun Chen

Abstract: Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth systems. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglect unobserved confounders, leading to biased results. This paper proposes a novel bias correction approach to utilize both GCM and observational data to learn a factor model that captures multi-cause latent confounders. Inspired by recent advances in causality based time series deconfounding, our method first constructs a factor model to learn latent confounders from historical data and then applies them to enhance the bias correction process using advanced time series forecasting models. The experimental results demonstrate significant improvements in the accuracy of precipitation outputs. By addressing unobserved confounders, our approach offers a robust and theoretically grounded solution for climate model bias correction.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.12017 [pdf, other]

Radiation Hydrodynamic Simulations of Massive Stars in Gas-rich Environments: Accretion of AGN Stars Suppressed By Thermal Feedback

Authors: Yi-Xian Chen, Yan-Fei Jiang, Jeremy Goodman, Douglas N. C. Lin

Abstract: Massive stars may form in or be captured into AGN disks. Recent 1D studies employing stellar-evolution codes have demonstrated the potential for rapid growth of such stars through accretion up to a few hundred $M_\odot$. We perform 3D radiation hydrodynamic simulations of moderately massive stars' envelopes, in order to determine the rate and critical radius $R_{\rm crit}$ of their accretion process in an isotropic gas-rich environment in the absence of luminosity-driven mass loss. We find that in the ``fast-diffusion" regime where characteristic radiative diffusion speed $c/τ$ is faster than the gas sound speed $c_s$, the accretion rate is suppressed by feedback from gravitational and radiative advection energy flux, in addition to the stellar luminosity. Alternatively, in the ``slow-diffusion" regime where $c/τ<c_s$, due to adiabatic accretion, the stellar envelope expands quickly to become hydrostatic and further net accretion occurs on thermal timescales in the absence of self-gravity. When the radiation entropy of the medium is less than that of the star, however, this hydrostatic envelope can become more massive than the star itself. Within this sub-regime, self-gravity of the envelope excites runaway growth. Applying our results to realistic environments, moderately massive stars ($\lesssim 100M_\odot$) embedded in AGN disks typically accrete in the fast-diffusion regime, leading to reduction of steady-state accretion rate 1-2 orders of magnitudes lower than expected by previous 1D calculations and $R_{\rm crit}$ smaller than the disk scale height, except in the opacity window at temperature $T\sim 2000$K. Accretion in slow diffusion regime occurs in regions with very high density $ρ\gtrsim 10^{-9}$g/cm$^3$, and needs to be treated with caution in 1D long-term calculations.

Submitted 21 August, 2024; originally announced August 2024.

Comments: Accepted to ApJ, 20 pages, 16 figures

arXiv:2408.11998 [pdf, ps, other]

On the third kind periods for abelian $t$-modules

Authors: Yen-Tsung Chen, Changningphaabi Namoijam

Abstract: Inspired by the relations between periods of elliptic integrals of the third kind and the periods of the extensions of the corresponding elliptic curves by the multiplicative group, we introduce the notion of the third kind periods for abelian $t$-modules and establish an evaluation for these periods that is parallel to the classical setting. When we specialize our result to the case of Drinfeld modules, an explicit formula for these third kind periods is established. We also prove the algebraic independence of periods of the first, the second, and the third kind for Drinfeld modules of arbitrary rank. This generalizes prior results of Chang for rank $2$ Drinfeld modules.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 36 pages

MSC Class: 11J93 (Primary); 11G09; 33E50 (Secondary)

arXiv:2408.11749 [pdf, other]

Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks

Authors: Yiyi Chen, Russa Biswas, Heather Lent, Johannes Bjerva

Abstract: Large Language Models (LLMs) are susceptible to malicious influence by cyber attackers through intrusions such as adversarial, backdoor, and embedding inversion attacks. In response, the burgeoning field of LLM Security aims to study and defend against such threats. Thus far, the majority of works in this area have focused on monolingual English models, however, emerging research suggests that multilingual LLMs may be more vulnerable to various attacks than their monolingual counterparts. While previous work has investigated embedding inversion over a small subset of European languages, it is challenging to extrapolate these findings to languages from different linguistic families and with differing scripts. To this end, we explore the security of multilingual LLMs in the context of embedding inversion attacks and investigate cross-lingual and cross-script inversion across 20 languages, spanning over 8 language families and 12 scripts. Our findings indicate that languages written in Arabic script and Cyrillic script are particularly vulnerable to embedding inversion, as are languages within the Indo-Aryan language family. We further observe that inversion models tend to suffer from language confusion, sometimes greatly reducing the efficacy of an attack. Accordingly, we systematically explore this bottleneck for inversion models, uncovering predictable patterns which could be leveraged by attackers. Ultimately, this study aims to further the field's understanding of the outstanding security vulnerabilities facing multilingual LLMs and raise awareness for the languages most at risk of negative impact from these attacks.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 11 pages, 4 figures, 7 tables

arXiv:2408.11738 [pdf, ps, other]

Existence of complements for foliations

Authors: Yen-An Chen, Dongchen Jiao, Pascale Voegtli

Abstract: This paper demonstrates the existence of $\mathbb{Q}$-complements for algebraically integrable log-Fano foliations on klt ambient varieties. Additionally, we investigate properties of algebraically integrable Fano foliations such as a partial inversion of adjunction as well as a connectedness principle.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 12 pages. Comments are welcomed

arXiv:2408.11727 [pdf, other]

Efficient Detection of Toxic Prompts in Large Language Models

Authors: Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, Yang Liu

Abstract: Large language models (LLMs) like ChatGPT and Gemini have significantly advanced natural language processing, enabling various applications such as chatbots and automated content generation. However, these models can be exploited by malicious individuals who craft toxic prompts to elicit harmful or unethical responses. These individuals often employ jailbreaking techniques to bypass safety mechanisms, highlighting the need for robust toxic prompt detection methods. Existing detection techniques, both blackbox and whitebox, face challenges related to the diversity of toxic prompts, scalability, and computational efficiency. In response, we propose ToxicDetector, a lightweight greybox method designed to efficiently detect toxic prompts in LLMs. ToxicDetector leverages LLMs to create toxic concept prompts, uses embedding vectors to form feature vectors, and employs a Multi-Layer Perceptron (MLP) classifier for prompt classification. Our evaluation on various versions of the LLama models, Gemma-2, and multiple datasets demonstrates that ToxicDetector achieves a high accuracy of 96.39\% and a low false positive rate of 2.00\%, outperforming state-of-the-art methods. Additionally, ToxicDetector's processing time of 0.0780 seconds per prompt makes it highly suitable for real-time applications. ToxicDetector achieves high accuracy, efficiency, and scalability, making it a practical method for toxic prompt detection in LLMs.

Submitted 21 August, 2024; originally announced August 2024.

Comments: Accepted by the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)

arXiv:2408.11599 [pdf, other]

Cause-Aware Empathetic Response Generation via Chain-of-Thought Fine-Tuning

Authors: Xinhao Chen, Chong Yang, Man Lan, Li Cai, Yang Chen, Tu Hu, Xinlin Zhuang, Aimin Zhou

Abstract: Empathetic response generation endows agents with the capability to comprehend dialogue contexts and react to expressed emotions. Previous works predominantly focus on leveraging the speaker's emotional labels, but ignore the importance of emotion cause reasoning in empathetic response generation, which hinders the model's capacity for further affective understanding and cognitive inference. In this paper, we propose a cause-aware empathetic generation approach by integrating emotions and causes through a well-designed Chain-of-Thought (CoT) prompt on Large Language Models (LLMs). Our approach can greatly promote LLMs' performance of empathy by instruction tuning and enhancing the role awareness of an empathetic listener in the prompt. Additionally, we propose to incorporate cause-oriented external knowledge from COMET into the prompt, which improves the diversity of generation and alleviates conflicts between internal and external knowledge at the same time. Experimental results on the benchmark dataset demonstrate that our approach on LLaMA-7b achieves state-of-the-art performance in both automatic and human evaluations.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11568 [pdf, ps, other]

Ergodicity for Ginzburg-Landau equation with complex-valued space-time white noise on two-dimensional torus

Authors: Huiping Chen, Yong Chen, Yong Liu

Abstract: We investigate the ergodicity for the stochastic complex Ginzburg-Landau equation with a general non-linear term on the two-dimensional torus driven by a complex-valued space-time white noise. Due to the roughness of complex-valued space-time white noise, this equation is a singular stochastic partial differential equation and its solution is expected to be a distribution-valued stochastic process. For this reason, the non-linear term is ill-defined and needs to be renormalized. We first use the theory of complex multiple Wiener-Ito integral to renormalize this equation and then consider its global well-posedness. Further, we prove its ergodicity using an asymptotic coupling argument for a large dissipation coefficient.

Submitted 21 August, 2024; originally announced August 2024.

MSC Class: 60H17; 37A25

arXiv:2408.11539 [pdf]

Research on the Application of Large Language Models in Automatic Question Generation: A Case Study of ChatGLM in the Context of High School Information Technology Curriculum

Authors: Yanxin Chen, Ling He

Abstract: This study investigates the application effectiveness of the Large Language Model (LLM) ChatGLM in the automated generation of high school information technology exam questions. Through meticulously designed prompt engineering strategies, the model is guided to generate diverse questions, which are then comprehensively evaluated by domain experts. The evaluation dimensions include Hitting (the degree of alignment with teaching content), Fitting (the degree of embodiment of core competencies), Clarity (the explicitness of question descriptions), and Willing to use (the teacher's willingness to use the question in teaching). The results indicate that ChatGLM outperforms human-generated questions in terms of clarity and teachers' willingness to use, although there is no significant difference in hit rate and fit. This finding suggests that ChatGLM has the potential to enhance the efficiency of question generation and alleviate the burden on teachers, providing a new perspective for the future development of educational assessment systems. Future research could explore further optimizations to the ChatGLM model to maintain high fit and hit rates while improving the clarity of questions and teachers' willingness to use them.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11405 [pdf, other]

DDSP Guitar Amp: Interpretable Guitar Amplifier Modeling

Authors: Yen-Tung Yeh, Yu-Hua Chen, Yuan-Chiao Cheng, Jui-Te Wu, Jun-Jie Fu, Yi-Fan Yeh, Yi-Hsuan Yang

Abstract: Neural network models for guitar amplifier emulation, while being effective, often demand high computational cost and lack interpretability. Drawing ideas from physical amplifier design, this paper aims to address these issues with a new differentiable digital signal processing (DDSP)-based model, called ``DDSP guitar amp,'' that models the four components of a guitar amp (i.e., preamp, tone stack, power amp, and output transformer) using specific DSP-inspired designs. With a set of time- and frequency-domain metrics, we demonstrate that DDSP guitar amp achieves performance comparable with that of black-box baselines while requiring less than 10\% of the computational operations per audio sample, thereby holding greater potential for usages in real-time applications.

Submitted 21 August, 2024; originally announced August 2024.

Comments: Preprint paper

arXiv:2408.11334 [pdf, other]

BURExtract-Llama: An LLM for Clinical Concept Extraction in Breast Ultrasound Reports

Authors: Yuxuan Chen, Haoyan Yang, Hengkai Pan, Fardeen Siddiqui, Antonio Verdone, Qingyang Zhang, Sumit Chopra, Chen Zhao, Yiqiu Shen

Abstract: Breast ultrasound is essential for detecting and diagnosing abnormalities, with radiology reports summarizing key findings like lesion characteristics and malignancy assessments. Extracting this critical information is challenging due to the unstructured nature of these reports, with varied linguistic styles and inconsistent formatting. While proprietary LLMs like GPT-4 are effective, they are costly and raise privacy concerns when handling protected health information. This study presents a pipeline for developing an in-house LLM to extract clinical information from radiology reports. We first use GPT-4 to create a small labeled dataset, then fine-tune a Llama3-8B model on it. Evaluated on clinician-annotated reports, our model achieves an average F1 score of 84.6%, which is on par with GPT-4. Our findings demonstrate the feasibility of developing an in-house LLM that not only matches GPT-4's performance but also offers cost reductions and enhanced data privacy.

Submitted 21 August, 2024; originally announced August 2024.

Comments: This paper has been accepted as the oral paper for the HCHM workshop, ACM Multimedia 2024

arXiv:2408.11269 [pdf, other]

Real-time Hosting Capacity Assessment for Electric Vehicles: A Sequential Forecast-then-Optimize Method

Authors: Yingrui Zhuang, Lin Cheng, Ning Qi, Xinyi Wang, Yue Chen

Abstract: Hosting capacity (HC) assessment for electric vehicles (EVs) is of great significance to ensure the safe integration of EVs and the reliable operation of power systems. Existing HC assessment methods mainly focus on a long-term perspective (e.g., system planning), and model the EV charging demands as scalar values. However, HC estimated from a long-term perspective may be inaccurate and unreliable for real-time operation, since long-term peak estimation differs from the real-time stochasticity of EV charging demands. In this regard, this paper proposes a real-time HC assessment method for EVs through a three-step process of real-time probabilistic forecasting, real-time risk analysis and probabilistic optimization. Specifically, we first conduct real-time probabilistic forecasting through an adaptive spatio-temporal graph convolutional network to describe the stochasticity in EV charging demands across multiple charging stations. This model leverages adaptive spatial feature extraction, attention-based temporal feature extraction, and second-order graph representation to capture the spatio-temporal features of EV charging demands. Subsequently, based on the probabilistic forecasting, we propose a real-time risk analysis method, which is achieved by calculating the probabilistic power flow based on an improved Gaussian mixture model. Furthermore, we provide a real-time formulation of the HC of EVs and propose an optimization model to assess it. Numerical experiments on a real-world dataset demonstrate that the proposed forecasting model outperforms state-of-the-art forecasting methods by achieving the lowest RMSE value of 0.0442, the proposed real-time risk analysis method maintains over 96.6% accuracy with 99.99% reduction in computational complexity, and the real-time HC is improved by 66.3% compared to long-term assessment.

Submitted 20 August, 2024; originally announced August 2024.

Comments: This is a manuscript submitted to Applied Energy

arXiv:2408.11261 [pdf, other]

Towards Analyzing and Mitigating Sycophancy in Large Vision-Language Models

Authors: Yunpu Zhao, Rui Zhang, Junbin Xiao, Changxin Ke, Ruibo Hou, Yifan Hao, Qi Guo, Yunji Chen

Abstract: Large Vision-Language Models (LVLMs) have shown significant capability in vision-language understanding. However, one critical issue that persists in these models is sycophancy, in which models are unduly influenced by leading or deceptive prompts, resulting in biased outputs and hallucinations. Despite the progress in LVLMs, evaluating and mitigating sycophancy remains largely under-explored. In this work, we fill this gap by systematically analyzing sycophancy on various VL benchmarks with curated leading queries and further proposing a text contrastive decoding method for mitigation. While the specific sycophantic behavior varies significantly among models, our analysis reveals a severe deficiency of all LVLMs in resilience to sycophancy across various tasks. For improvement, we propose Leading Query Contrastive Decoding (LQCD), a model-agnostic method focusing on calibrating the LVLMs' over-reliance on leading cues by identifying and suppressing the probabilities of sycophancy tokens at the decoding stage. Extensive experiments show that LQCD effectively mitigates sycophancy, outperforming both prompt engineering methods and common methods for hallucination mitigation. We further demonstrate that LQCD does not hurt but even slightly improves LVLMs' responses to neutral queries, suggesting that it is a more effective strategy for general-purpose decoding and is not limited to sycophancy.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.11044 [pdf, other]

High-Throughput Search for Photostrictive Materials based on a Thermodynamic Descriptor

Authors: Zeyu Xiang, Yubi Chen, Yujie Quan, Bolin Liao

Abstract: Photostriction is a phenomenon that can potentially improve the precision of light-driven actuation, the sensitivity of photodetection, and the efficiency of optical energy harvesting. However, known materials with significant photostriction are limited, and effective guidelines to discover new photostrictive materials are lacking. In this study, we perform a high-throughput computational search for new photostrictive materials based on simple thermodynamic descriptors, namely the band gap pressure and stress coefficients. Using constrained density functional theory simulations, we establish that these descriptors can accurately predict intrinsic photostriction in a wide range of materials. Subsequently, we screen over 4770 stable semiconductors with a band gap below 2 eV from the Materials Project database to search for strongly photostrictive materials. This search identifies PtS$_2$ and Te$_2$I as the most promising ones, with photostriction exceeding 10$^{-4}$ with a moderate photocarrier concentration of 10$^{18}$ cm$^{-3}$. Furthermore, we provide a detailed analysis of factors contributing to strong photostriction, including bulk moduli and band-edge orbital interactions. Our results provide physical insights into photostriction of materials and demonstrate the effectiveness of using simple descriptors in high-throughput searches for new functional materials.

Submitted 20 August, 2024; originally announced August 2024.

Showing 101–150 of 14,219 results for author: Chen, Y