
A study on the $F$-distribution motivated by Chvátal's theorem

Authors: Qianqian Zhou, Peng Lu, Zechun Hu

Abstract: Let $X_{d_1, d_2}$ be an $F$-random variable with parameters $d_1$ and $d_2,$ and expectation $E[X_{d_1, d_2}]$. In this paper, for any $κ>0,$ we investigate the infimum value of the probability $P(X_{d_1, d_2}\leq κE[X_{d_1, d_2}])$. Our motivation comes from Chvátal's theorem on the binomial distribution.

Submitted 14 September, 2024; originally announced September 2024.

arXiv:2409.09300 [pdf, other]

ManiDext: Hand-Object Manipulation Synthesis via Continuous Correspondence Embeddings and Residual-Guided Diffusion

Authors: Jiajun Zhang, Yuxiang Zhang, Liang An, Mengcheng Li, Hongwen Zhang, Zonghai Hu, Yebin Liu

Abstract: Dynamic and dexterous manipulation of objects presents a complex challenge, requiring the synchronization of hand motions with the trajectories of objects to achieve seamless and physically plausible interactions. In this work, we introduce ManiDext, a unified hierarchical diffusion-based framework for generating hand manipulation and grasp poses based on 3D object trajectories. Our key insight is that accurately modeling the contact correspondences between objects and hands during interactions is crucial. Therefore, we propose a continuous correspondence embedding representation that specifies detailed hand correspondences at the vertex level between the object and the hand. This embedding is optimized directly on the hand mesh in a self-supervised manner, with the distance between embeddings reflecting the geodesic distance. Our framework first generates contact maps and correspondence embeddings on the object's surface. Based on these fine-grained correspondences, we introduce a novel approach that integrates the iterative refinement process into the diffusion process during the second stage of hand pose generation. At each step of the denoising process, we incorporate the current hand pose residual as a refinement target into the network, guiding the network to correct inaccurate hand poses. Introducing residuals into each denoising step inherently aligns with traditional optimization process, effectively merging generation and refinement into a single unified framework. Extensive experiments demonstrate that our approach can generate physically plausible and highly realistic motions for various tasks, including single and bimanual hand grasping as well as manipulating both rigid and articulated objects. Code will be available for research purposes.

Submitted 14 September, 2024; originally announced September 2024.

arXiv:2409.09292 [pdf, other]

StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads

Authors: Suzhen Wang, Yifeng Ma, Yu Ding, Zhipeng Hu, Changjie Fan, Tangjie Lv, Zhidong Deng, Xin Yu

Abstract: Individuals have unique facial expression and head pose styles that reflect their personalized speaking styles. Existing one-shot talking head methods cannot capture such personalized characteristics and therefore fail to produce diverse speaking styles in the final videos. To address this challenge, we propose a one-shot style-controllable talking face generation method that can obtain speaking styles from reference speaking videos and drive the one-shot portrait to speak with the reference speaking styles and another piece of audio. Our method aims to synthesize the style-controllable coefficients of a 3D Morphable Model (3DMM), including facial expressions and head movements, in a unified framework. Specifically, the proposed framework first leverages a style encoder to extract the desired speaking styles from the reference videos and transform them into style codes. Then, the framework uses a style-aware decoder to synthesize the coefficients of 3DMM from the audio input and style codes. During decoding, our framework adopts a two-branch architecture, which generates the stylized facial expression coefficients and stylized head movement coefficients, respectively. After obtaining the coefficients of 3DMM, an image renderer renders the expression coefficients into a specific person's talking-head video. Extensive experiments demonstrate that our method generates visually authentic talking head videos with diverse speaking styles from only one portrait image and an audio clip.

Submitted 13 September, 2024; originally announced September 2024.

Comments: TPAMI 2024. arXiv admin note: text overlap with arXiv:2301.01081

arXiv:2409.07911 [pdf, other]

Tera-SpaceCom: GNN-based Deep Reinforcement Learning for Joint Resource Allocation and Task Offloading in TeraHertz Band Space Networks

Authors: Zhifeng Hu, Chong Han, Wolfgang Gerstacker, Ian F. Akyildiz

Abstract: Terahertz (THz) space communications (Tera-SpaceCom) is envisioned as a promising technology to enable various space science and communication applications. Mainly, the realm of Tera-SpaceCom consists of THz sensing for space exploration, data centers in space providing cloud services for space exploration tasks, and a low earth orbit (LEO) mega-constellation relaying these tasks to ground stations (GSs) or data centers via THz links. Moreover, to reduce the computational burden on data centers as well as resource consumption and latency in the relaying process, the LEO mega-constellation provides satellite edge computing (SEC) services to directly compute space exploration tasks without relaying these tasks to data centers. The LEO satellites that receive space exploration tasks offload (i.e., distribute) partial tasks to their neighboring LEO satellites, to further reduce their computational burden. However, efficient joint communication resource allocation and computing task offloading for the Tera-SpaceCom SEC network is an NP-hard mixed-integer nonlinear programming problem (MINLP), due to the discrete nature of space exploration tasks and sub-arrays as well as the continuous nature of transmit power. To tackle this challenge, a graph neural network (GNN)-deep reinforcement learning (DRL)-based joint resource allocation and task offloading (GRANT) algorithm is proposed with the target of long-term resource efficiency (RE). Particularly, GNNs learn relationships among different satellites from their connectivity information. Furthermore, multi-agent and multi-task mechanisms cooperatively train task offloading and resource allocation. Compared with benchmark solutions, GRANT not only achieves the highest RE with relatively low latency, but realizes the fewest trainable parameters and the shortest running time.

Submitted 12 September, 2024; originally announced September 2024.

arXiv:2409.07068 [pdf, other]

doi 10.1002/qute.202400094

Fully-Optimized Quantum Metrology: Framework, Tools, and Applications

Authors: Qiushi Liu, Zihao Hu, Haidong Yuan, Yuxiang Yang

Abstract: This tutorial introduces a systematic approach for addressing the key question of quantum metrology: For a generic task of sensing an unknown parameter, what is the ultimate precision given a constrained set of admissible strategies. The approach outputs the maximal attainable precision (in terms of the maximum of quantum Fisher information) as a semidefinite program and optimal strategies as feasible solutions thereof. Remarkably, the approach can identify the optimal precision for different sets of strategies, including parallel, sequential, quantum SWITCH-enhanced, causally superposed, and generic indefinite-causal-order strategies. The tutorial consists of a pedagogic introduction to the background and mathematical tools of optimal quantum metrology, a detailed derivation of the main approach, and various concrete examples. As shown in the tutorial, applications of the approach include, but are not limited to, strict hierarchy of strategies in noisy quantum metrology, memory effect in non-Markovian metrology, and designing optimal strategies. Compared with traditional approaches, the approach here yields the exact value of the optimal precision, offering more accurate criteria for experiments and practical applications. It also allows for the comparison between conventional strategies and the recently discovered causally-indefinite strategies, serving as a powerful tool for exploring this new area of quantum metrology.

Submitted 11 September, 2024; originally announced September 2024.

Comments: Tutorial. 38 pages, 18 figures

Journal ref: Adv. Quantum Technol. 2024, 2400094 (2024)

arXiv:2409.06662 [pdf, other]

doi 10.1145/3680528.3687565

World-Grounded Human Motion Recovery via Gravity-View Coordinates

Authors: Zehong Shen, Huaijin Pi, Yan Xia, Zhi Cen, Sida Peng, Zechen Hu, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

Abstract: We present a novel method for recovering world-grounded human motion from monocular video. The main challenge lies in the ambiguity of defining the world coordinate system, which varies between sequences. Previous approaches attempt to alleviate this issue by predicting relative motion in an autoregressive manner, but are prone to accumulating errors. Instead, we propose estimating human poses in a novel Gravity-View (GV) coordinate system, which is defined by the world gravity and the camera view direction. The proposed GV system is naturally gravity-aligned and uniquely defined for each video frame, largely reducing the ambiguity of learning image-pose mapping. The estimated poses can be transformed back to the world coordinate system using camera rotations, forming a global motion sequence. Additionally, the per-frame estimation avoids error accumulation in the autoregressive methods. Experiments on in-the-wild benchmarks demonstrate that our method recovers more realistic motion in both the camera space and world-grounded settings, outperforming state-of-the-art methods in both accuracy and speed. The code is available at https://zju3dv.github.io/gvhmr/.

Submitted 10 September, 2024; originally announced September 2024.

Comments: Accepted at SIGGRAPH Asia 2024 (Conference Track). Project page: https://zju3dv.github.io/gvhmr/

arXiv:2409.06155 [pdf, other]

Exciton crystal melting and destruction by disorder in bilayer quantum hall system with total filling factor one

Authors: Zhengfei Hu, Kun Yang

Abstract: Bilayer quantum hall system with total filling factor 1 was studied in the regime of heavy layer imbalance in a recent transport experiment (Ref. 1), with intriguing new findings. We demonstrate in this paper that 1) the exciton Wigner crystal in this regime can melt into a superfluid phase, giving rise to re-entrant superfluid behavior; 2) in the presence of disorder, electron and hole Wigner crystals in the two layers go through a locking/decoupling transition as layer separation increases, resulting in a sudden change in the counter flow conductance. Comparison will be made with the findings of Ref. 1.

Submitted 9 September, 2024; originally announced September 2024.

Comments: 12 pages, 3 figures

arXiv:2409.06154 [pdf, other]

UniLearn: Enhancing Dynamic Facial Expression Recognition through Unified Pre-Training and Fine-Tuning on Images and Videos

Authors: Yin Chen, Jia Li, Yu Zhang, Zhenzhen Hu, Shiguang Shan, Meng Wang, Richang Hong

Abstract: Dynamic facial expression recognition (DFER) is essential for understanding human emotions and behavior. However, conventional DFER methods, which primarily use dynamic facial data, often underutilize static expression images and their labels, limiting their performance and robustness. To overcome this, we introduce UniLearn, a novel unified learning paradigm that integrates static facial expression recognition (SFER) data to enhance the DFER task. UniLearn employs a dual-modal self-supervised pre-training method, leveraging both facial expression images and videos to enhance a ViT model's spatiotemporal representation capability. Then, the pre-trained model is fine-tuned on both static and dynamic expression datasets using a joint fine-tuning strategy. To prevent negative transfer during joint fine-tuning, we introduce an innovative Mixture of Adapter Experts (MoAE) module that enables task-specific knowledge acquisition and effectively integrates information from both static and dynamic expression data. Extensive experiments demonstrate UniLearn's effectiveness in leveraging complementary information from static and dynamic facial data, leading to more accurate and robust DFER. UniLearn consistently achieves state-of-the-art performance on FERV39K, MAFW, and DFEW benchmarks, with weighted average recall (WAR) of 53.65%, 58.44%, and 76.68%, respectively. The source code and model weights will be publicly available at https://github.com/MSA-LMC/UniLearn.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.05865 [pdf, other]

Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments

Authors: Haritheja Etukuru, Norihito Naka, Zijin Hu, Seungjae Lee, Julian Mehu, Aaron Edsinger, Chris Paxton, Soumith Chintala, Lerrel Pinto, Nur Muhammad Mahi Shafiullah

Abstract: Robot models, particularly those trained with large amounts of data, have recently shown a plethora of real-world manipulation and navigation capabilities. Several independent efforts have shown that given sufficient training data in an environment, robot policies can generalize to demonstrated variations in that environment. However, needing to finetune robot models to every new environment stands in stark contrast to models in language or vision that can be deployed zero-shot for open-world problems. In this work, we present Robot Utility Models (RUMs), a framework for training and deploying zero-shot robot policies that can directly generalize to new environments without any finetuning. To create RUMs efficiently, we develop new tools to quickly collect data for mobile manipulation tasks, integrate such data into a policy with multi-modal imitation learning, and deploy policies on-device on Hello Robot Stretch, a cheap commodity robot, with an external mLLM verifier for retrying. We train five such utility models for opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects. Our system, on average, achieves 90% success rate in unseen, novel environments interacting with unseen objects. Moreover, the utility models can also succeed in different robot and camera set-ups with no further data, training, or fine-tuning. Primary among our lessons are the importance of training data over training algorithm and policy class, guidance about data scaling, necessity for diverse yet high-quality demonstrations, and a recipe for robot introspection and retrying to improve performance on individual environments. Our code, data, models, hardware designs, as well as our experiment and deployment videos are open sourced and can be found on our project website: https://robotutilitymodels.com

Submitted 9 September, 2024; originally announced September 2024.

Comments: Project website https://robotutilitymodels.com

arXiv:2409.05785 [pdf, other]

NeurLZ: On Enhancing Lossy Compression Performance based on Error-Controlled Neural Learning for Scientific Data

Authors: Wenqi Jia, Youyuan Liu, Zhewen Hu, Jinzhen Wang, Boyuan Zhang, Wei Niu, Junzhou Huang, Stavros Kalafatis, Sian Jin, Miao Yin

Abstract: Large-scale scientific simulations generate massive datasets that pose significant challenges for storage and I/O. While traditional lossy compression techniques can improve performance, balancing compression ratio, data quality, and throughput remains difficult. To address this, we propose NeurLZ, a novel cross-field learning-based and error-controlled compression framework for scientific data. By integrating skipping DNN models, cross-field learning, and error control, our framework aims to substantially enhance lossy compression performance. Our contributions are three-fold: (1) We design a lightweight skipping model to provide high-fidelity detail retention, further improving prediction accuracy. (2) We adopt a cross-field learning approach to significantly improve data prediction accuracy, resulting in a substantially improved compression ratio. (3) We develop an error control approach to provide strict error bounds according to user requirements. We evaluated NeurLZ on several real-world HPC application datasets, including Nyx (cosmological simulation), Miranda (large turbulence simulation), and Hurricane (weather simulation). Experiments demonstrate that our framework achieves up to a 90% relative reduction in bit rate under the same data distortion, compared to the best existing approach.

Submitted 9 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.05734 [pdf, other]

Structured Random Model for Fast and Robust Phase Retrieval

Authors: Zhiyuan Hu, Julián Tachella, Michael Unser, Jonathan Dong

Abstract: Phase retrieval, a nonlinear problem prevalent in imaging applications, has been extensively studied using random models, some of which with i.i.d. sensing matrix components. While these models offer robust reconstruction guarantees, they are computationally expensive and impractical for real-world scenarios. In contrast, Fourier-based models, common in applications such as ptychography and coded diffraction imaging, are computationally more efficient but lack the theoretical guarantees of random models. Here, we introduce structured random models for phase retrieval that combine the efficiency of fast Fourier transforms with the versatility of random diagonal matrices. These models emulate i.i.d. random matrices at a fraction of the computational cost. Our approach demonstrates robust reconstructions comparable to fully random models using gradient descent and spectral methods. Furthermore, we establish that a minimum of two structured layers is necessary to achieve these structured-random properties. The proposed method is suitable for optical implementation and offers an efficient and robust alternative for phase retrieval in practical imaging applications.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.05552 [pdf, other]

Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations

Authors: Xuesong Zhang, Jia Li, Yunbo Xu, Zhenzhen Hu, Richang Hong

Abstract: Autonomous navigation for an embodied agent guided by natural language instructions remains a formidable challenge in vision-and-language navigation (VLN). Despite remarkable recent progress in learning fine-grained and multifarious visual representations, the tendency to overfit to the training environments leads to unsatisfactory generalization performance. In this work, we present a versatile Multi-Branch Architecture (MBA) aimed at exploring and exploiting diverse visual inputs. Specifically, we introduce three distinct visual variants: ground-truth depth images, visual inputs integrated with incongruent views, and those infused with random noise to enrich the diversity of visual input representation and prevent overfitting to the original RGB observations. To adaptively fuse these varied inputs, the proposed MBA extends a base agent model into a multi-branch variant, where each branch processes a different visual input. Surprisingly, even random noise can further enhance navigation performance in unseen environments. Extensive experiments conducted on three VLN benchmarks (R2R, REVERIE, SOON) demonstrate that our proposed method equals or even surpasses state-of-the-art results. The source code will be publicly available.

Submitted 9 September, 2024; originally announced September 2024.

Comments: 5 pages, 2 figures, submitted to ICASSP 2025

arXiv:2409.05073 [pdf, ps, other]

Parahoric reduction theory of formal connections (or Higgs fields)

Authors: Zhi Hu, Pengfei Huang, Ruiran Sun, Runhong Zong

Abstract: In this paper, we establish the parahoric reduction theory of formal connections (or Higgs fields) on a formal principal bundle with parahoric structures, which generalizes Babbitt-Varadarajan's result for the case without parahoric structures [5] and Boalch's result for the case of regular singularity [9]. As applications, we prove the equivalence between extrinsic definition and intrinsic definition of regular singularity and provide a criterion of relative regularity for formal connections, and also demonstrate a parahoric version of Frenkel-Zhu's Borel reduction theorem of formal connections [23].

Submitted 8 September, 2024; originally announced September 2024.

Comments: 24 pages, comments are welcome!

arXiv:2409.04805 [pdf, other]

Neutron stars in the bumblebee theory of gravity

Authors: Peixiang Ji, Zhuhai Li, Lirui Yang, Rui Xu, Zexin Hu, Lijing Shao

Abstract: Recently, theoretical studies on the bumblebee gravity model, a nonminimally-coupled vector-tensor theory that violates the Lorentz symmetry, have flourished, with a simultaneous increase in the utilization of observations to impose constraints. The static spherical solutions of neutron stars (NSs) in the bumblebee theory are calculated comprehensively in this work. These solutions with different coupling constants reveal a rich theoretical landscape for NSs, including vectorized NSs and NSs with finite radii but divergent masses. With these solutions, preliminary constraints on the asymptotic vector field values are obtained through restrictions on the stellar radius.

Submitted 7 September, 2024; originally announced September 2024.

Comments: 15 pages, 9 figures

arXiv:2409.03605 [pdf, other]

SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing

Authors: Lingyu Xiong, Xize Cheng, Jintao Tan, Xianjia Wu, Xiandong Li, Lei Zhu, Fei Ma, Minglei Li, Huang Xu, Zhihu Hu

Abstract: Audio-driven talking face generation aims to synthesize video with lip movements synchronized to input audio. However, current generative techniques face challenges in preserving intricate regional textures (skin, teeth). To address the aforementioned challenges, we propose a novel framework called SegTalker to decouple lip movements and image textures by introducing segmentation as intermediate representation. Specifically, given the mask of image employed by a parsing network, we first leverage the speech to drive the mask and generate talking segmentation. Then we disentangle semantic regions of image into style codes using a mask-guided encoder. Ultimately, we inject the previously generated talking segmentation and style codes into a mask-guided StyleGAN to synthesize video frame. In this way, most of textures are fully preserved. Moreover, our approach can inherently achieve background separation and facilitate mask-guided facial local editing. In particular, by editing the mask and swapping the region textures from a given reference image (e.g. hair, lip, eyebrows), our approach enables facial editing seamlessly when generating talking face video. Experiments demonstrate that our proposed approach can effectively preserve texture details and generate temporally consistent video while remaining competitive in lip synchronization. Quantitative and qualitative results on the HDTF and MEAD datasets illustrate the superior performance of our method over existing methods.

Submitted 5 September, 2024; originally announced September 2024.

Comments: 10 pages, 7 figures, 3 tables

arXiv:2409.03341 [pdf, other]

doi 10.1103/PhysRevApplied.21.054041

Direct Readout of Nitrogen-Vacancy Hybrid-Spin Quantum Register in Diamond by Photon Arrival Time Analysis

Authors: Jingyan He, Yu Tian, Zhiyi Hu, Runchuan Ye, Xiangyu Wang, Dawei Lu, Nanyang Xu

Abstract: Quantum state readout plays a pivotal role in quantum technologies, spanning applications in sensing, computation, and secure communication. In this work, we introduce a new approach for efficiently reading populations of hybrid-spin states in the nitrogen-vacancy center of diamond using a single laser pulse, which utilizes the excited state level anti-crossing mechanism at around 500 Gs. Reading spin state populations through this approach achieves the same outcome as traditional quantum state diagonal tomography but significantly reduces the experimental time by an order of magnitude while maintaining fidelity. Moreover, this approach may be extended to encompass full-state tomography, thereby obviating the requirement for a sequence of spin manipulations and mitigating errors induced by decoherence throughout the procedure.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2409.03339 [pdf, other]

doi 10.1021/acs.nanolett.3c04822

Four-order power reduction in nanoscale electron-nuclear double resonance with a nitrogen-vacancy center in diamond

Authors: Zhiyi Hu, Fengjian Jiang, Jingyan He, Yulin Dai, Ya Wang, Nanyang Xu, Jiangfeng Du

Abstract: Detecting nuclear spins using single Nitrogen-Vacancy (NV) centers is of particular importance in nano-scale science and engineering, but often suffers from the heating effect of microwave fields for spin manipulation, especially under high magnetic fields. Here, we realize an energy-efficient nano-scale nuclear-spin detection using a phase-modulation electron-nuclear double resonance scheme. The microwave field can be reduced to 1/250 of previous requirements and the corresponding power is over four orders lower. Meanwhile, the microwave-induced broadening to the line-width of the spectroscopy is significantly canceled and we achieve a nuclear-spin spectrum with a resolution down to 2.1 kHz under a magnetic field at 1840 Gs. The spectral resolution can be further improved by upgrading the experimental control precision. This scheme can also be used in sensing microwave fields and extended to a wide range of applications in the future.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2409.03231 [pdf, other]

State-space models are accurate and efficient neural operators for dynamical systems

Authors: Zheyuan Hu, Nazanin Ahmadi Daryakenari, Qianli Shen, Kenji Kawaguchi, George Em Karniadakis

Abstract: Physics-informed machine learning (PIML) has emerged as a promising alternative to classical methods for predicting dynamical systems, offering faster and more generalizable solutions. However, existing models, including recurrent neural networks (RNNs), transformers, and neural operators, face challenges such as long-time integration, long-range dependencies, chaotic dynamics, and extrapolation, to name a few. To this end, this paper introduces state-space models implemented in Mamba for accurate and efficient dynamical system operator learning. Mamba addresses the limitations of existing architectures by dynamically capturing long-range dependencies and enhancing computational efficiency through reparameterization techniques. To extensively test Mamba and compare against another 11 baselines, we introduce several strict extrapolation testbeds that go beyond the standard interpolation benchmarks. We demonstrate Mamba's superior performance in both interpolation and challenging extrapolation tasks. Mamba consistently ranks among the top models while maintaining the lowest computational cost and exceptional extrapolation capabilities. Moreover, we demonstrate the good performance of Mamba for a real-world application in quantitative systems pharmacology for assessing the efficacy of drugs in tumor growth under limited data scenarios. Taken together, our findings highlight Mamba's potential as a powerful tool for advancing scientific machine learning in dynamical systems modeling. (The code will be available at https://github.com/zheyuanhu01/State_Space_Model_Neural_Operator upon acceptance.)

Submitted 4 September, 2024; originally announced September 2024.

Comments: 34 pages

ACM Class: F.2.2; I.2.7

arXiv:2409.02577 [pdf]

Interlayer coupling rotatable magnetic easy-axis in MnSe2 mono- and bi-layers

Authors: Zhongqin Zhang, Cong Wang, PengJie Guo, Linwei Zhou, Yuhao Pan, Zhixin Hu, Wei Ji

Abstract: Interlayer coupling plays a critical role in tuning the electronic structures and magnetic ground states of two-dimensional materials, influenced by the number of layers, interlayer distance, and stacking order. However, its effect on the orientation of the magnetic easy axis remains underexplored. In this study, we demonstrate that interlayer coupling can significantly alter the magnetic easy-axis orientation, as shown by the magnetic easy-axis of monolayer 1T-MnSe2 tilting 33° from the z-axis, while aligning with the z-axis in the bilayer. This change results from variations in orbital occupations near the Fermi level, particularly involving nonmetallic Se atoms. Contrary to the traditional focus on magnetic metal atoms, our findings reveal that Se orbitals play a key role in influencing the easy-axis orientation and topological Chern numbers. Furthermore, we show that the occupation of Se p-orbitals, and consequently the magnetic anisotropy, can be modulated by factors such as stacking order, charge doping, and external strain. Our results highlight the pivotal role of interlayer coupling in tuning the magnetic properties of layered materials, with important implications for spintronic applications.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2409.02076 [pdf, other]

LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs

Authors: Yuhao Wu, Ming Shan Hee, Zhiqing Hu, Roy Ka-Wei Lee

Abstract: In evaluating the long-context capabilities of large language models (LLMs), benchmarks such as "Needle-in-a-Haystack" (NIAH), Ruler, and Needlebench are commonly used. While these benchmarks measure how well models understand long-context input sequences, they do not effectively gauge the quality of long-form text generation--a critical aspect for applications such as design proposals and creative writing. To address this gap, we have introduced a new long-form text evaluation benchmark, LongGenBench, which tests models' ability to identify specific events within generated long text sequences. In this benchmark, we prompt long-context LMs to create long-form text that must include particular events or constraints and evaluate their ability to incorporate these elements. We evaluated ten long-context LMs across four distinct scenarios, three types of prompt instructions, and two different generation-length settings (16K and 32K). Although these models perform well on NIAH benchmarks, none demonstrated satisfactory performance on the LongGenBench, raising concerns about their ability to generate coherent long-form text that follows instructions. Additionally, as the length of the generated text increases, all models exhibit a significant drop in performance.

Submitted 15 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

Comments: work in progress; Github: https://github.com/mozhu621/LongGenBench/

arXiv:2409.01557 [pdf, other]

TASL-Net: Tri-Attention Selective Learning Network for Intelligent Diagnosis of Bimodal Ultrasound Video

Authors: Chengqian Zhao, Zhao Yao, Zhaoyu Hu, Yuanxin Xie, Yafang Zhang, Yuanyuan Wang, Shuo Li, Jianhua Zhou, Jianqiao Zhou, Yin Wang, Jinhua Yu

Abstract: In the intelligent diagnosis of bimodal (gray-scale and contrast-enhanced) ultrasound videos, medical domain knowledge such as the way sonographers browse videos, the particular areas they emphasize, and the features they pay special attention to, plays a decisive role in facilitating precise diagnosis. Embedding medical knowledge into the deep learning network can not only enhance performance but also boost clinical confidence and reliability of the network. However, it is an intractable challenge to automatically focus on these person- and disease-specific features in videos and to enable networks to encode bimodal information comprehensively and efficiently. This paper proposes a novel Tri-Attention Selective Learning Network (TASL-Net) to tackle this challenge and automatically embed three types of diagnostic attention of sonographers into a mutual transformer framework for intelligent diagnosis of bimodal ultrasound videos. Firstly, a time-intensity-curve-based video selector is designed to mimic the temporal attention of sonographers, thus removing a large amount of redundant information while improving computational efficiency of TASL-Net. Then, to introduce the spatial attention of the sonographers for contrast-enhanced video analysis, we propose the earliest-enhanced position detector based on structural similarity variation, on which the TASL-Net is made to focus on the differences of perfusion variation inside and outside the lesion. Finally, by proposing a mutual encoding strategy that combines convolution and transformer, TASL-Net possesses bimodal attention to structure features on gray-scale videos and to perfusion variations on contrast-enhanced videos. These modules work collaboratively and contribute to superior performance. We conduct a detailed experimental validation of TASL-Net's performance on three datasets, including lung, breast, and liver.

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2409.01240 [pdf, other]

DiffEyeSyn: Diffusion-based User-specific Eye Movement Synthesis

Authors: Chuhan Jiao, Guanhua Zhang, Zhiming Hu, Andreas Bulling

Abstract: High-frequency components in eye gaze data contain user-specific information promising for various applications, but existing gaze modelling methods focus on low frequencies of typically not more than 30 Hz. We present DiffEyeSyn -- the first computational method to synthesise high-frequency gaze data, including eye movement characteristics specific to individual users. The key idea is to consider the high-frequency, user-specific information as a special type of noise in eye movement data. This perspective reshapes eye movement synthesis into the task of injecting this user-specific noise into any given eye movement sequence. We formulate this injection task as a conditional diffusion process in which the synthesis is conditioned on user-specific embeddings extracted from the gaze data using pre-trained models for user authentication. We propose user identity guidance -- a novel loss function that allows our model to preserve user identity while generating human-like eye movements in the spatial domain. Experiment results on two public high-frequency eye movement biometric datasets show that our synthetic eye movements are indistinguishable from real human eye movements. Furthermore, we demonstrate that DiffEyeSyn can be used to synthesise eye gaze data at scale and for different downstream tasks, such as gaze data imputation and gaze data super-resolution. As such, our work lays the methodological foundations for personalised eye movement synthesis that has significant application potential, such as for character animation, eye movement biometrics, or gaze-based activity and context recognition.

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2409.00519 [pdf, ps, other]

Blow-up solutions for the steady state of the Keller-Segel system on Riemann surfaces

Authors: Zhengni Hu, Thomas Bartsch, Mohameden Ahmedou

Abstract: We study the following Neumann boundary problem related to the stationary solutions of the Keller-Segel system, a basic model of chemotaxis phenomena: \[ -Δ_g u +βu =λ\left(\frac{Ve^u}{\int_Σ Ve^u d v_g}-\frac{1}{|Σ|_g}\right) \text { in } \mathringΣ\] with $\partial_{ ν_g} u=0, \text { on } \partial Σ$, where $(Σ, g)$ is a compact Riemann surface with the interior $\mathringΣ$ and the smooth boundary $\partial Σ$. Here, $λ, β\geq 0$ are non-negative parameters, and $V$ is a smooth non-negative function with a finite zero set. For any given integers $m\geq k\geq 0$, we establish a sufficient condition on $V$ for the existence of a sequence of blow-up solutions as $λ$ approaches the critical values $4π(m+k)$. Moreover, the study expands to the corresponding singular problem.

Submitted 31 August, 2024; originally announced September 2024.

MSC Class: 35J57; 58J05

arXiv:2409.00509 [pdf, other]

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Authors: Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi

Abstract: Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences. Meanwhile, extending the context window in LLMs through post-pretraining is highly resource-intensive. To address this, we introduce LongRecipe, an efficient training strategy for extending the context window of LLMs, including impactful token analysis, position index transformation, and training optimization strategies. It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies. Experiments on three types of LLMs show that LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resources by over 85% compared to full sequence training. Furthermore, LongRecipe also preserves the original LLM's capabilities in general tasks. Ultimately, we can extend the effective context window of open-source LLMs from 8k to 128k, achieving performance close to GPT-4 with just one day of dedicated training using a single GPU with 80G memory. Our code is released at https://github.com/zhiyuanhubj/LongRecipe.

Submitted 4 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

Comments: Work in Progress

arXiv:2409.00402 [pdf, ps, other]

Generalized Orthogonal Chirp Division Multiplexing in Doubly Selective Channels

Authors: Yun Liu, Hao Zhao, Huazhen Yao, Zeng Hu, Yinming Cui, Dehuan Wan

Abstract: In recent years, orthogonal chirp division modulation (OCDM) has gained attention as a robust communication waveform due to its strong resistance to both time-domain and frequency-domain interference. However, similar to orthogonal frequency division multiplexing (OFDM), OCDM suffers from a high peak-to-average power ratio (PAPR), resulting in increased hardware costs and reduced energy efficiency of the transmitter's power amplifiers. In this work, we introduce a novel unitary transform called the Generalized Discrete Fresnel Transform (GDFnT) and propose a new waveform based on this transform, named Generalized Orthogonal Chirp Division Modulation (GOCDM). In GOCDM, data symbols from the constellation diagram are independently placed in the Generalized Fresnel (GF) domain. We derive the GF-domain channel matrix for the GOCDM system under time-frequency doubly selective channels and leverage the sparsity of the GF-domain channel matrix to design an iterative receiver based on the message-passing algorithm. Simulation results demonstrate that GOCDM achieves better PAPR performance than OCDM without compromising bit error rate (BER) performance.

Submitted 31 August, 2024; originally announced September 2024.

arXiv:2408.17372 [pdf, ps, other]

Partial Blow-up Phenomena in the $SU(3)$ Toda System on Riemann Surfaces

Authors: Zhengni Hu, Mohameden Ahmedou, Thomas Bartsch

Abstract: This work studies the partial blow-up phenomena for the $SU(3)$ Toda system on compact Riemann surfaces with smooth boundary. We consider the following coupled Liouville system with Neumann boundary conditions: $$ -Δ_g u_1 = 2ρ_1\left( \frac{V_1 e^{u_1}}{\int_Σ V_1 e^{u_1} \, dv_g} - \frac 1 {|Σ|_g}\right) - ρ_2\left( \frac{V_2 e^{u_2}}{\int_Σ V_2 e^{u_2} \, dv_g} - \frac{1}{|Σ|_g}\right) \text{in} \,\mathringΣ$$ and $$ -Δ_g u_2 = 2ρ_2\left( \frac{V_2 e^{u_2}}{\int_Σ V_2 e^{u_2} \, dv_g} - \frac{1}{|Σ|_g}\right) - ρ_1\left( \frac{V_1 e^{u_1}}{\int_Σ V_1 e^{u_1} \, dv_g} - \frac{1}{|Σ|_g}\right) \text{in} \,\mathringΣ$$ with boundary conditions $ \partial_{ν_g} u_1 = \partial_{ν_g} u_2 = 0 \text{ on} \, \partial Σ,$ where $(Σ, g)$ is a compact Riemann surface with the interior $\mathringΣ$ and smooth boundary $\partialΣ$, $ρ_i$ is a non-negative parameter and $V_i$ is a smooth positive function for $i=1,2$. We construct a family of blow-up solutions via the Lyapunov-Schmidt reduction and variational methods, wherein one component remains uniformly bounded from above, while the other exhibits partial blow-ups at a prescribed number of points, both in the interior and on the boundary. This construction is based on the existence of a non-degeneracy solution of a so-called shadow system. Moreover, we establish the existence of partial blow-up solutions in three cases: (i) for any $ρ_2>0$ sufficiently small; (ii) for generic $V_1, V_2$ and any $ρ_2\in (0,2π)$; (iii) for generic $V_1, V_2$, the Euler characteristic $χ(Σ)<1$ and any $ρ_2\in (2π,+\infty)\setminus 2π\mathbb{N}_+$.

Submitted 30 August, 2024; originally announced August 2024.

MSC Class: 35J57; 58J05

arXiv:2408.16917 [pdf, ps, other]

Blow-up solutions for mean field equations with Neumann boundary conditions on Riemann surfaces

Authors: Zhengni Hu, Thomas Bartsch, Mohameden Ahmedou

Abstract: On a compact Riemann surface $(Σ, g)$ with a smooth boundary $\partial Σ$, we consider the following mean field equations with Neumann boundary conditions: $$ -Δ_g u = λ\left(\frac{Ve^u}{\int_Σ Ve^u \, dv_g} - \frac{1}{|Σ|_g}\right) \text{ in } Σ\text{ with } \partial_{ν_g} u = 0 \text{ on } \partial Σ, $$ We find conditions on the potential function $V: Σ\to \mathbb{R}^+$ such that solutions exist for the parameter $λ$ when it is in a small right (or left) neighborhood of a critical value $4π(m+k)$ for $k \leq m \in \mathbb{N}_+$ and blow up as $λ$ approaches the critical parameter. The blow-up occurs exactly at $k$ points in the interior of $Σ$ and $(m-k)$ points on the boundary $\partial Σ$.

Submitted 29 August, 2024; originally announced August 2024.

MSC Class: 35B33 (Primary) 35J61; 35R01 (Secondary)

arXiv:2408.14979 [pdf, ps, other]

On Solutions for Singular Toda System on Riemann Surfaces with Boundary

Authors: Zhengni Hu

Abstract: This paper studies solutions to a singular $SU(3)$ Toda system with linear source terms on a compact Riemann surface $Σ$ with smooth boundaries $\partialΣ$. We establish the existence of solutions when the parameters are not critical, assuming that Euler characteristic $χ(Σ)<1$ via analyzing the sublevels. Furthermore, we find a sufficient condition that ensures multiple solutions for generic potentials by Morse inequalities and a transversality theorem.

Submitted 27 August, 2024; originally announced August 2024.

MSC Class: 35J50 (Primary) 35J61; 35R01; 58J32 (Secondary)

arXiv:2408.14843 [pdf, other]

Correntropy-Based Improper Likelihood Model for Robust Electrophysiological Source Imaging

Authors: Yuanhao Li, Badong Chen, Zhongxu Hu, Keita Suzuki, Wenjun Bai, Yasuharu Koike, Okito Yamashita

Abstract: Bayesian learning provides a unified skeleton to solve the electrophysiological source imaging task. From this perspective, existing source imaging algorithms utilize the Gaussian assumption for the observation noise to build the likelihood function for Bayesian inference. However, the electromagnetic measurements of brain activity are usually affected by miscellaneous artifacts, leading to a potentially non-Gaussian distribution for the observation noise. Hence the conventional Gaussian likelihood model is a suboptimal choice for the real-world source imaging task. In this study, we aim to solve this problem by proposing a new likelihood model which is robust with respect to non-Gaussian noises. Motivated by the robust maximum correntropy criterion, we propose a new improper distribution model concerning the noise assumption. This new noise distribution is leveraged to structure a robust likelihood function and integrated with hierarchical prior distributions to estimate source activities by variational inference. In particular, the score matching is adopted to determine the hyperparameters for the improper likelihood model. A comprehensive performance evaluation is performed to compare the proposed noise assumption to the conventional Gaussian model. Simulation results show that the proposed method can realize more precise source reconstruction by designing known ground-truth. The real-world dataset also demonstrates the superiority of our new method with the visual perception task. This study provides a new backbone for Bayesian source imaging, which would facilitate its application using real-world noisy brain signals.

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.14492 [pdf, other]

Evolvable Psychology Informed Neural Network for Memory Behavior Modeling

Authors: Xiaoxuan Shen, Zhihai Hu, Qirong Chen, Shengyingjie Liu, Ruxia Liang, Jianwen Sun

Abstract: Memory behavior modeling is a core issue in cognitive psychology and education. Classical psychological theories typically use memory equations to describe memory behavior, which exhibits insufficient accuracy and controversy, while data-driven memory modeling methods often require large amounts of training data and lack interpretability. Knowledge-informed neural network models have shown excellent performance in fields like physics, but there have been few attempts in the domain of behavior modeling. This paper proposes a psychology-theory-informed neural network for memory behavior modeling named PsyINN, which constructs a framework that combines a neural network with differentiating sparse regression, achieving joint optimization. Specifically, to address the controversies and ambiguity of descriptors in memory equations, a descriptor evolution method based on differentiating operators is proposed to achieve precise characterization of descriptors and the evolution of memory theoretical equations. Additionally, a buffering mechanism for the sparse regression and a multi-module alternating iterative optimization method are proposed, effectively mitigating gradient instability and local optima issues. On four large-scale real-world memory behavior datasets, the proposed method surpasses the state-of-the-art methods in prediction accuracy. Ablation study demonstrates the effectiveness of the proposed refinements, and application experiments showcase its potential in inspiring psychological research.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.13914 [pdf, ps, other]

Data-driven approximate output regulation of nonlinear systems

Authors: Zhongjie Hu, Claudio De Persis, Pietro Tesi

Abstract: The paper deals with the data-based design of controllers that solve the output regulation problem for nonlinear systems. Inspired by recent developments in model-based output regulation design techniques and in data-driven control design for nonlinear systems, we derive a data-dependent semidefinite program that, when solved, directly returns a controller that approximately regulates the tracking error to zero. When specialized to the case of linear systems, the result appears to improve upon existing work. Numerical results illustrate the findings.

Submitted 25 August, 2024; originally announced August 2024.

arXiv:2408.12615 [pdf, other]

Pediatric TSC-Related Epilepsy Classification from Clinical MR Images Using Quantum Neural Network

Authors: Ling Lin, Yihang Zhou, Zhanqi Hu, Dian Jiang, Congcong Liu, Shuo Zhou, Yanjie Zhu, Jianxiang Liao, Dong Liang, Hairong Zheng, Haifeng Wang

Abstract: Tuberous sclerosis complex (TSC) manifests as a multisystem disorder with significant neurological implications. This study addresses the critical need for robust classification models tailored to TSC in pediatric patients, introducing QResNet, a novel deep learning model seamlessly integrating conventional convolutional neural networks with quantum neural networks. The model incorporates a two-layer quantum layer (QL), comprising ZZFeatureMap and Ansatz layers, strategically designed for processing classical data within a quantum framework. A comprehensive evaluation demonstrates the superior performance of QResNet in TSC MRI image classification compared to conventional 3D-ResNet models. These compelling findings underscore the potential of quantum computing to revolutionize medical imaging and diagnostics. Remarkably, this method surpasses conventional CNNs in accuracy and Area Under the Curve (AUC) metrics with the current dataset. Future research endeavors may focus on exploring the scalability and practical implementation of quantum algorithms in real-world medical imaging scenarios.

Submitted 26 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

Comments: 5 pages, 4 figures, 2 tables, presented at ISBI 2024

arXiv:2408.12320 [pdf, other]

PolyRouter: A Multi-LLM Querying System

Authors: Dimitris Stripelis, Zijian Hu, Jipeng Zhang, Zhaozhuo Xu, Alay Dilipbhai Shah, Han Jin, Yuhang Yao, Salman Avestimehr, Chaoyang He

Abstract: With the rapid growth of Large Language Models (LLMs) across various domains, numerous new LLMs have emerged, each possessing domain-specific expertise. This proliferation has highlighted the need for quick, high-quality, and cost-effective LLM query response methods. Yet, no single LLM exists to efficiently balance this trilemma. Some models are powerful but extremely costly, while others are fast and inexpensive but qualitatively inferior. To address this challenge, we present PolyRouter, a non-monolithic LLM querying system that seamlessly integrates various LLM experts into a single query interface and dynamically routes incoming queries to the most high-performant expert based on the query's requirements. Through extensive experiments, we demonstrate that when compared to standalone expert models, PolyRouter improves query efficiency by up to 40%, and leads to significant cost reductions of up to 30%, while maintaining or enhancing model performance by up to 10%.

Submitted 26 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

Comments: 14 pages, 7 figures, 2 tables

ACM Class: I.2; I.5

arXiv:2408.10635 [pdf, other]

Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

Authors: Jonathan Light, Min Cai, Weiqin Chen, Guanzhi Wang, Xiusi Chen, Wei Cheng, Yisong Yue, Ziniu Hu

Abstract: In this paper, we propose a new method Strategist that utilizes LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers quality feedback through self-play simulations with Monte Carlo tree search and LLM-based reflection, which can then be used to learn high-level strategic skills such as how to evaluate states that guide the low-level execution. We showcase how our method can be used in both action planning and dialogue generation in the context of games, achieving good performance on both tasks. Specifically, we demonstrate that our method can help train agents with better performance than both traditional reinforcement learning-based approaches and other LLM-based skill learning approaches in games including the Game of Pure Strategy (GOPS) and The Resistance: Avalon.

Submitted 20 August, 2024; originally announced August 2024.

Comments: website: https://llm-strategist.github.io

arXiv:2408.09988 [pdf, other]

Chiral-odd gluon generalized parton distributions in the proton: A light-front quantization approach

Authors: Bolang Lin, Sreeraj Nair, Chandan Mondal, Siqi Xu, Zhi Hu, Pengxiang Zhang, Xingbo Zhao, James P. Vary

Abstract: Within the basis light-front quantization (BLFQ) framework, we evaluate the gluon chiral-odd generalized parton distributions (GPDs) inside the proton at zero skewness. We employ the light-front wave functions of the proton obtained from a light-front quantized Hamiltonian with quantum chromodynamics input using BLFQ. Our investigation encompasses both the valence Fock sector with three constituent quarks and an additional sector containing three quarks and a dynamical gluon. We analyze the gluon GPDs in the momentum space as well as in the transverse position space. We further present the gluon's generalized form factors derived from the Mellin moments of its chiral-odd GPDs. Using the proton transverse spin sum rule, we also present the $x$-dependence of the angular momentum carried by the polarized gluon and determine the relative contributions of quarks and the gluon to the transversity asymmetry.

Submitted 19 August, 2024; originally announced August 2024.

Comments: 11 pages, 12 figures

arXiv:2408.09088 [pdf]

Quantum encryption design overcomes Shannon's theorem to achieve perfect secrecy with reusable keys

Authors: Zixuan Hu, Zhenyu Li

Abstract: Shannon's perfect-secrecy theorem states that a perfect encryption system that yields zero information to the adversary must be a one-time pad (OTP) with the keys randomly generated and never reused. However, recently discovered exotic properties of quantum entanglement have motivated us to reconsider Shannon's theorem in the quantum regime. In this work we design a quantum encryption method that overcomes Shannon's theorem to achieve perfect secrecy with reusable keys. The mechanism used is fundamentally quantum, demonstrating subtle but critical differences in how information is processed in quantum versus classical systems.

Submitted 26 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

Comments: This revision added a worked example and a quantum circuit figure

arXiv:2408.07611 [pdf, other]

WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs

Authors: Weijian Xie, Xuefeng Liang, Yuhui Liu, Kaihua Ni, Hong Cheng, Zetian Hu

Abstract: Large Language Models (LLMs) have greatly contributed to the development of adaptive intelligent agents and are positioned as an important way to achieve Artificial General Intelligence (AGI). However, LLMs are prone to produce factually incorrect information and often produce "phantom" content that undermines their reliability, which poses a serious challenge for their deployment in real-world scenarios. Enhancing LLMs by combining external databases and information retrieval mechanisms is an effective path. To address the above challenges, we propose a new approach called WeKnow-RAG, which integrates Web search and Knowledge Graphs into a "Retrieval-Augmented Generation (RAG)" system. First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval. WeKnow-RAG then utilizes domain-specific knowledge graphs to satisfy a variety of queries and domains, thereby improving performance on factual information and complex reasoning tasks by employing multi-stage web page retrieval techniques using both sparse and dense retrieval methods. Our approach effectively balances the efficiency and accuracy of information retrieval, thus improving the overall retrieval process. Finally, we also integrate a self-assessment mechanism for the LLM to evaluate the trustworthiness of the answers it generates. Our approach proves its outstanding effectiveness in a wide range of offline experiments and online submissions.

Submitted 27 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

Comments: 8 pages, 2 figures, technical report for 3rd place in Task 3 of Meta KDD Cup 2024 CRAG Challenge

arXiv:2408.07482 [pdf, other]

Training Overhead Ratio: A Practical Reliability Metric for Large Language Model Training Systems

Authors: Ning Lu, Qian Xie, Hao Zhang, Wenyi Fang, Yang Zheng, Zheng Hu, Jiantao Ma

Abstract: Large Language Models (LLMs) are revolutionizing the AI industry with their superior capabilities. Training these models requires large-scale GPU clusters and significant computing time, leading to frequent failures that significantly increase training costs. Despite its significance, this field lacks a metric for evaluating reliability. In this work, we introduce a novel reliability metric called Training Overhead Ratio (TOR) to evaluate the reliability of fault-tolerant LLM training systems. TOR is defined as the ratio of optimal training time to the observed training time of a system, serving as a practical tool for users to estimate the actual time required to train an LLM on a given system. Furthermore, our investigation identifies the key factor for enhancing reliability and presents TOR equations for various types of failures encountered in practice.

Submitted 5 September, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

Comments: To be published in: IEEE International Symposium on Software Reliability Engineering (ISSRE2024) workshop

arXiv:2408.07444 [pdf, other]

Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

Authors: Senmao Wang, Haifan Gong, Runmeng Cui, Boyao Wan, Yicheng Liu, Zhonglin Hu, Haiqing Yang, Jingyang Zhou, Bo Pan, Lin Lin, Haiyue Jiang

Abstract: Costal cartilage segmentation is crucial to various medical applications, necessitating precise and reliable techniques due to its complex anatomy and the importance of accurate diagnosis and surgical planning. We propose a novel deep learning-based approach called topology-guided deformable Mamba (TGDM) for costal cartilage segmentation. The TGDM is tailored to capture the intricate long-range costal cartilage relationships. Our method leverages a deformable model that integrates topological priors to enhance the adaptability and accuracy of the segmentation process. Furthermore, we developed a comprehensive benchmark that contains 165 cases for costal cartilage segmentation. This benchmark sets a new standard for evaluating costal cartilage segmentation techniques and provides a valuable resource for future research. Extensive experiments conducted on both in-domain benchmarks and out-of-domain test sets demonstrate the superiority of our approach over existing methods, showing significant improvements in segmentation precision and robustness.

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2408.06000 [pdf, other]

An Analysis for Image-to-Image Translation and Style Transfer

Authors: Xiaoming Yu, Jie Tian, Zhenhua Hu

Abstract: With the development of generative technologies in deep learning, a large number of image-to-image translation and style transfer models have emerged at an explosive rate in recent years. These two technologies have made significant progress and can generate realistic images. However, many communities tend to confuse the two, because both generate the desired image based on the input image and both cover the two definitions of content and style. In fact, there are indeed significant differences between the two, and there is currently a lack of clear explanations to distinguish the two technologies, which is not conducive to the advancement of technology. We hope to serve the entire community by introducing the differences and connections between image-to-image translation and style transfer. The entire discussion process involves the concepts, forms, training modes, evaluation processes, and visualization results of the two technologies. Finally, we conclude that image-to-image translation divides images by domain, and the types of images in the domain are limited, and the scope involved is small, but the conversion ability is strong and can achieve strong semantic changes. Style transfer divides image types by single image, and the scope involved is large, but the transfer ability is limited, and it transfers more texture and color of the image.

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.05099 [pdf]

Lithography-free patterning of chalcogenide materials for integrated photonic devices

Authors: Zhen Hu, Yuru Li, Yan Li, Shunyu Yao, Hongfei Chen, Tao Zhang, Zhaohuan Ao, Zhaohui Li

Abstract: Chalcogenide material-based integrated photonic devices have garnered widespread attention due to their unique wideband transparency. Despite their recognized CMOS compatibility, the fabrication of these devices relies predominantly on lithography techniques. However, chalcogenide thin films are highly susceptible to oxidation, necessitating customized process flows and complex protective measures during lithography. These requirements are hardly compatible with current commercial CMOS manufacturing platforms designed for silicon photonics, significantly limiting the practical applications of chalcogenide photonic devices. In this work, we ingeniously exploit the ease of oxidation of chalcogenide materials, presenting a novel laser-induced localized oxidation technique for spatial patterning on chalcogenide thin films, enabling concise lithography-free fabrication of chalcogenide integrated photonic devices. Using Sb2S3 as an example, we experimentally demonstrate localized multi-level oxidation with a sizable overall refractive index contrast of 0.7 at near-infrared, featuring a high spatial resolution of 0.6 um. Based on this technique, multiple integrated photonic devices are demonstrated, showing versatile functionalities, including color printing at visible and metasurface-based spatial light modulation at near-infrared regions. Leveraging the inherent phase-change property of Sb2S3, an active Fresnel zone plate, enabling switchable beam focusing, is further demonstrated, indicating the feasibility of concise fabrication of active photonic devices. Our work offers a brand-new modulation dimension for chalcogenide materials and provides a significantly simplified approach for realizing chalcogenide-integrated photonic devices.

Submitted 9 August, 2024; originally announced August 2024.

arXiv:2408.04425 [pdf, other]

Effects from Dark Matter Halos on X-ray Pulsar Pulse Profiles

Authors: Yukun Liu, Hong-Bo Li, Yong Gao, Lijing Shao, Zexin Hu

Abstract: Neutron stars (NSs) can capture dark matter (DM) particles because of their deep gravitational potential and high density. The accumulated DM can affect the properties of NSs. In this work we use a general relativistic two-fluid formalism to solve the structure of DM-admixed NSs (DANSs) and the surrounding spacetime. Specifically, we pay attention to the situation where those DANSs possess DM halos. Due to the gravitational effect of the DM halo, the pulse profile of an X-ray pulsar is changed. Our study finds a universal relation between the peak flux deviation of the pulse profile and $M_{\rm halo}/R_{\rm BM}$, which is the ratio of the DM halo mass, $M_{\rm halo}$, to the baryonic matter (BM) core radius, $R_{\rm BM}$. Our results show that, when $M_{\rm halo}/R_{\rm BM}=0.292$ and the DM particle mass $m_f = 0.3\,$GeV, the maximum deviation of the profile can be larger than 100$\%$, which has implication in X-ray pulsar observation.

Submitted 8 August, 2024; originally announced August 2024.

Comments: 9 pages, 11 figures

arXiv:2408.04068 [pdf, other]

doi 10.24963/ijcai.2024/1031

Digital Avatars: Framework Development and Their Evaluation

Authors: Timothy Rupprecht, Sung-En Chang, Yushu Wu, Lei Lu, Enfu Nan, Chih-hsiang Li, Caiyue Lai, Zhimin Li, Zhijun Hu, Yumei He, David Kaeli, Yanzhi Wang

Abstract: We present a novel prompting strategy for artificial intelligence driven digital avatars. To better quantify how our prompting strategy affects anthropomorphic features like humor, authenticity, and favorability, we present Crowd Vote - an adaptation of Crowd Score that allows for judges to elect a large language model (LLM) candidate over competitors answering the same or similar prompts. To visualize the responses of our LLM and the effectiveness of our prompting strategy, we propose an end-to-end framework for creating high-fidelity artificial intelligence (AI) driven digital avatars. This pipeline effectively captures an individual's essence for interaction and our streaming algorithm delivers a high-quality digital avatar with real-time audio-video streaming from server to mobile device. Both our visualization tool and our Crowd Vote metrics demonstrate our AI driven digital avatars have state-of-the-art humor, authenticity, and favorability outperforming all competitors and baselines. In the case of our Donald Trump and Joe Biden avatars, their authenticity and favorability are rated higher than even their real-world equivalents.

Submitted 7 August, 2024; originally announced August 2024.

Comments: This work was presented during the IJCAI 2024 conference proceedings for demonstrations

MSC Class: 68

ACM Class: D.2.2; C.3

Journal ref: 2024 Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence Demo Track. Pages 8780-8783

arXiv:2408.03910 [pdf, other]

CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases

Authors: Xiangyan Liu, Bo Lan, Zhiyuan Hu, Yang Liu, Zhicheng Zhang, Fei Wang, Michael Shieh, Wenmeng Zhou

Abstract: Large Language Models (LLMs) excel in stand-alone code tasks like HumanEval and MBPP, but struggle with handling entire code repositories. This challenge has prompted research on enhancing LLM-codebase interaction at a repository scale. Current solutions rely on similarity-based retrieval or manual tools and APIs, each with notable drawbacks. Similarity-based retrieval often has low recall in complex tasks, while manual tools and APIs are typically task-specific and require expert knowledge, reducing their generalizability across diverse code tasks and real-world applications. To mitigate these limitations, we introduce CodexGraph, a system that integrates LLM agents with graph database interfaces extracted from code repositories. By leveraging the structural properties of graph databases and the flexibility of the graph query language, CodexGraph enables the LLM agent to construct and execute queries, allowing for precise, code structure-aware context retrieval and code navigation. We assess CodexGraph using three benchmarks: CrossCodeEval, SWE-bench, and EvoCodeBench. Additionally, we develop five real-world coding applications. With a unified graph database schema, CodexGraph demonstrates competitive performance and potential in both academic and real-world environments, showcasing its versatility and efficacy in software engineering. Our application demo: https://github.com/modelscope/modelscope-agent/tree/master/apps/codexgraph_agent.

Submitted 11 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

Comments: work in progress

arXiv:2408.01696 [pdf, other]

Generating High-quality Symbolic Music Using Fine-grained Discriminators

Authors: Zhedong Zhang, Liang Li, Jiehua Zhang, Zhenghui Hu, Hongkui Wang, Chenggang Yan, Jian Yang, Yuankai Qi

Abstract: Existing symbolic music generation methods usually utilize a discriminator to improve the quality of generated music via global perception of music. However, considering the complexity of information in music, such as rhythm and melody, a single discriminator cannot fully reflect the differences in these two primary dimensions of music. In this work, we propose to decouple the melody and rhythm from music, and design corresponding fine-grained discriminators to tackle the aforementioned issues. Specifically, equipped with a pitch augmentation strategy, the melody discriminator discerns the melody variations presented by the generated samples. By contrast, the rhythm discriminator, enhanced with bar-level relative positional encoding, focuses on the velocity of generated notes. Such a design allows the generator to be more explicitly aware of which aspects should be adjusted in the generated music, making it easier to mimic human-composed music. Experimental results on the POP909 benchmark demonstrate the favorable performance of the proposed method compared to several state-of-the-art methods in terms of both objective and subjective metrics.

Submitted 3 August, 2024; originally announced August 2024.

Comments: Accepted by ICPR2024

arXiv:2408.00491 [pdf, other]

doi 10.1145/3664647.3681656

GalleryGPT: Analyzing Paintings with Large Multimodal Models

Authors: Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

Abstract: Artwork analysis is an important and fundamental skill for art appreciation, which could enrich personal aesthetic sensibility and facilitate critical thinking ability. Understanding artworks is challenging due to their subjective nature, diverse interpretations, and complex visual elements, requiring expertise in art history, cultural background, and aesthetic theory. However, limited by the data collection and model ability, previous works for automatically analyzing artworks mainly focus on classification, retrieval, and other simple tasks, which is far from the goal of AI. To facilitate the research progress, in this paper, we step further to compose comprehensive analysis inspired by the remarkable perception and generation ability of large multimodal models. Specifically, we first propose a task of composing paragraph analysis for artworks, i.e., painting in this paper, only focusing on visual characteristics to formulate more comprehensive understanding of artworks. To support the research on formal analysis, we collect a large dataset PaintingForm, with about 19k painting images and 50k analysis paragraphs. We further introduce a superior large multimodal model for painting analysis composing, dubbed GalleryGPT, which is slightly modified and fine-tuned based on LLaVA architecture leveraging our collected data. We conduct formal analysis generation and zero-shot experiments across several datasets to assess the capacity of our model. The results show remarkable performance improvements compared with powerful baseline LMMs, demonstrating its superb ability of art analysis and generalization. The code and model are available at: https://github.com/steven640pixel/GalleryGPT.

Submitted 1 August, 2024; originally announced August 2024.

Comments: Accepted as Oral Presentation at ACM Multimedia 2024

arXiv:2408.00245 [pdf, other]

Measuring the Spin of the Galactic Center Supermassive Black Hole with Two Pulsars

Authors: Zexin Hu, Lijing Shao

Abstract: As a key science project of the Square Kilometre Array (SKA), the discovery and timing observations of radio pulsars in the Galactic Center would provide high-precision measurements of the spacetime around the supermassive black hole, Sagittarius A* (Sgr A*), and initiate novel tests of general relativity. The spin of Sgr A* could be measured with a relative error of $\lesssim 1\%$ by timing one pulsar with timing precision that is achievable for the SKA. However, the real measurements depend on the discovery of a pulsar in a very compact orbit, $P_b\lesssim0.5\,{\rm yr}$. Here for the first time we propose and investigate the possibility of probing the spin of Sgr A* with two or more pulsars that are in orbits with larger orbital periods, $P_b\sim 2- 5\,{\rm yr}$, which represents a more realistic situation from population estimates. We develop a novel method for directly determining the spin of Sgr A* from the timing observables of two pulsars and it can be readily extended for combining more pulsars. With extensive mock data simulations, we show that combining a second pulsar improves the spin measurement by $2-3$ orders of magnitude in some situations, which is comparable to timing a pulsar in a very tight orbit.

Submitted 31 July, 2024; originally announced August 2024.

Comments: 5 pages, 3 figures

arXiv:2408.00008 [pdf, other]

ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency

Authors: Yuhang Yao, Han Jin, Alay Dilipbhai Shah, Shanshan Han, Zijian Hu, Yide Ran, Dimitris Stripelis, Zhaozhuo Xu, Salman Avestimehr, Chaoyang He

Abstract: Large language models (LLMs) have surged in popularity and are extensively used in commercial applications, where the efficiency of model serving is crucial for the user experience. Most current research focuses on optimizing individual sub-procedures, e.g. local inference and communication; however, there is no comprehensive framework that provides a holistic system view for optimizing LLM serving in an end-to-end manner. In this work, we conduct a detailed analysis to identify major bottlenecks that impact end-to-end latency in LLM serving systems. Our analysis reveals that a comprehensive LLM serving endpoint must address a series of efficiency bottlenecks that extend beyond LLM inference. We then propose ScaleLLM, an optimized system for resource-efficient LLM serving. Our extensive experiments reveal that with 64 concurrent requests, ScaleLLM achieves a 4.3x speed up over vLLM and outperforms state-of-the-arts with 1.5x higher throughput.

Submitted 10 September, 2024; v1 submitted 23 July, 2024; originally announced August 2024.

arXiv:2408.00001 [pdf, other]

Replication in Visual Diffusion Models: A Survey and Outlook

Authors: Wenhao Wang, Yifan Sun, Zongxin Yang, Zhengdong Hu, Zhentao Tan, Yi Yang

Abstract: Visual diffusion models have revolutionized the field of creative AI, producing high-quality and diverse content. However, they inevitably memorize training images or videos, subsequently replicating their concepts, content, or styles during inference. This phenomenon raises significant concerns about privacy, security, and copyright within generated outputs. In this survey, we provide the first comprehensive review of replication in visual diffusion models, marking a novel contribution to the field by systematically categorizing the existing studies into unveiling, understanding, and mitigating this phenomenon. Specifically, unveiling mainly refers to the methods used to detect replication instances. Understanding involves analyzing the underlying mechanisms and factors that contribute to this phenomenon. Mitigation focuses on developing strategies to reduce or eliminate replication. Beyond these aspects, we also review papers focusing on its real-world influence. For instance, in the context of healthcare, replication is critically worrying due to privacy concerns related to patient data. Finally, the paper concludes with a discussion of the ongoing challenges, such as the difficulty in detecting and benchmarking replication, and outlines future directions including the development of more robust mitigation techniques. By synthesizing insights from diverse studies, this paper aims to equip researchers and practitioners with a deeper understanding at the intersection between AI technology and social good. We release this project at https://github.com/WangWenhao0716/Awesome-Diffusion-Replication.

Submitted 7 July, 2024; originally announced August 2024.

Comments: The first survey focuses on replication in visual diffusion models. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2407.20299 [pdf, other]

Dataset Distillation for Offline Reinforcement Learning

Authors: Jonathan Light, Yuanzhe Liu, Ziniu Hu

Abstract: Offline reinforcement learning often requires a quality dataset that we can train a policy on. However, in many situations, it is not possible to get such a dataset, nor is it easy to train a policy to perform well in the actual environment given the offline data. We propose using data distillation to train and distill a better dataset which can then be used for training a better policy model. We show that our method is able to synthesize a dataset where a model trained on it achieves similar performance to a model trained on the full dataset or a model trained using percentile behavioral cloning. Our project site is available at https://datasetdistillation4rl.github.io. We also provide our implementation at https://github.com/ggflow123/DDRL.

Submitted 31 July, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

Comments: ICML 2024 DMLR Workshop

Showing 1–50 of 1,655 results for author: Hu, Z