Search | arXiv e-print repository

Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling

Authors: Margaret Li, Weijia Shi, Artidoro Pagnoni, Peter West, Ari Holtzman

Abstract: RLHF-aligned LMs have shown unprecedented ability on both benchmarks and long-form text generation, yet they struggle with one foundational task: next-token prediction. As RLHF models become agent models aimed at interacting with humans, they seem to lose their world modeling -- the ability to predict what comes next in arbitrary documents, which is the foundational training objective of the Base LMs that RLHF adapts. Besides empirically demonstrating this trade-off, we propose a potential explanation: to perform coherent long-form generation, RLHF models restrict randomness via implicit blueprints. In particular, RLHF models concentrate probability on sets of anchor spans that co-occur across multiple generations for the same prompt, serving as textual scaffolding but also limiting a model's ability to generate documents that do not include these spans. We study this trade-off on the most effective current agent models, those aligned with RLHF, while exploring why this may remain a fundamental trade-off between models that act and those that predict, even as alignment techniques improve.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02022 [pdf, ps, other]

Smooth deformation limit of Moishezon manifolds is Moishezon

Authors: Mu-lin Li, Sheng Rao, Kai Wang, Meng-jiao Wang

Abstract: We prove the conjecture that the deformation limit of Moishezon manifolds under a smooth deformation over a unit disk in $\mathbb{C}$ is Moishezon.

Submitted 2 July, 2024; originally announced July 2024.

Comments: All comments are welcome

arXiv:2407.01891 [pdf, other]

Refined Motion Compensation with Soft Laser Manipulators using Data-Driven Surrogate Models

Authors: Yongjun Yan, Qingpeng Ding, Mingwu Li, Junyan Yan, Shing Shin Cheng

Abstract: Non-contact laser ablation, a precise thermal technique, simultaneously cuts and coagulates tissue without the insertion errors associated with rigid needles. Human organ motions, such as those in the liver, exhibit rhythmic components influenced by respiratory and cardiac cycles, making effective laser energy delivery to target lesions while compensating for tumor motion crucial. This research introduces a data-driven method to derive surrogate models of a soft manipulator. These low-dimensional models offer computational efficiency when integrated into the Model Predictive Control (MPC) framework, while still capturing the manipulator's dynamics with and without control input. Spectral Submanifolds (SSM) theory models the manipulator's autonomous dynamics, acknowledging its tendency to reach equilibrium when external forces are removed. Preliminary results show that the MPC controller using the surrogate model outperforms two other models within the same MPC framework. The data-driven MPC controller also supports a design-agnostic feature, allowing the interchangeability of different soft manipulators within the laser ablation surgery robot system.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01316 [pdf, other]

Evaluating Model Performance Under Worst-case Subpopulations

Authors: Mike Li, Hongseok Namkoong, Shangzhou Xia

Abstract: The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case performance of a model over all subpopulations of a given size, defined with respect to core attributes Z. This notion of robustness can consider arbitrary (continuous) attributes Z, and automatically accounts for complex intersectionality in disadvantaged groups. We develop a scalable yet principled two-stage estimation procedure that can evaluate the robustness of state-of-the-art models. We prove that our procedure enjoys several finite-sample convergence guarantees, including dimension-free convergence. Instead of overly conservative notions based on Rademacher complexities, our evaluation error depends on the dimension of Z only through the out-of-sample error in estimating the performance conditional on Z. On real datasets, we demonstrate that our method certifies the robustness of a model and prevents deployment of unreliable models.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Earlier version appeared in the proceedings of Advances in Neural Information Processing Systems 34 (NeurIPS 2021): https://proceedings.neurips.cc/paper_files/paper/2021/file/908075ea2c025c335f4865f7db427062-Paper.pdf

arXiv:2407.01281 [pdf, other]

Bridging Smoothness and Approximation: Theoretical Insights into Over-Smoothing in Graph Neural Networks

Authors: Guangrui Yang, Jianfei Li, Ming Li, Han Feng, Ding-Xuan Zhou

Abstract: In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for target functions using Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. Initially, we introduce the concept of a $K$-functional on graphs, establishing its equivalence to the modulus of smoothness. We then analyze a typical type of GCN to demonstrate how the high-frequency energy of the output decays, an indicator of over-smoothing. This analysis provides theoretical insights into the nature of over-smoothing within GCNs. Furthermore, we establish a lower bound for the approximation of target functions by GCNs, which is governed by the modulus of smoothness of these functions. This finding offers a new perspective on the approximation capabilities of GCNs. In our numerical experiments, we analyze several widely applied GCNs and observe the phenomenon of energy decay. These observations corroborate our theoretical results on exponential decay order.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00948 [pdf, other]

The House Always Wins: A Framework for Evaluating Strategic Deception in LLMs

Authors: Tanush Chopra, Michael Li

Abstract: We propose a framework for evaluating strategic deception in large language models (LLMs). In this framework, an LLM acts as a game master in two scenarios: one with random game mechanics and another where it can choose between random or deliberate actions. As an example, we use blackjack because neither the action space nor the strategies involve deception. We benchmark Llama3-70B, GPT-4-Turbo, and Mixtral in blackjack, comparing outcomes against expected distributions in fair play to determine if LLMs develop strategies favoring the "house." Our findings reveal that the LLMs exhibit significant deviations from fair play when given implicit randomness instructions, suggesting a tendency towards strategic manipulation in ambiguous scenarios. However, when presented with an explicit choice, the LLMs largely adhere to fair play, indicating that the framing of instructions plays a crucial role in eliciting or mitigating potentially deceptive behaviors in AI systems.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Research conducted at the Deception Detection Hackathon 2024 hosted by Apart & Apollo Research

arXiv:2407.00421 [pdf]

Multi-wavelength switchable single-frequency hyper Raman microlasers

Authors: Chuntao Li, Ni Yao, Jintian Lin, Renhong Gao, Jianglin Guan, Guanghui Zhao, Minghui Li, Min Wang, Lingling Qiao, Ya Cheng

Abstract: Multi-wavelength switchable single-frequency microlasers in a broad spectral range are highly desirable for integrated photonic applications due to their dynamic switching functionality, narrow linewidth, and high side-mode-suppression-ratio (SMSR). Here, a strategy based on highly efficient successive excitation of different stimulated multi-photon hyper-Raman scattering (SMPHRS) processes is proposed to generate multi-wavelength switchable single-frequency hyper-Raman microlasers. This is achieved through collective precise dispersion management for arranging excitation wavelengths to trigger different phase-matched SMPHRS processes in order, mode-hopping-free tuning of the pump wavelength within a wide range of 0.75 nm by leveraging strong thermo-optical broadening of ultra-high Q modes, and simultaneously suppressing harmonics generation in a lithium niobate microcavity with high second-order nonlinearity. As a result, under continuous-wave laser pump at a low level of only 3.9 mW, SMPHRS processes from two- to five-photons emerged step by step and almost depleted previously generated multi-photon Raman signal. Consequently, four-wavelength dynamically switchable single-mode lasing from near infrared (857 nm) to ultraviolet (350 nm) spanning beyond the record range (~500 nm) with high SMSRs >35 dB is reported.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 17 pages, 5 figures, and 1 table

arXiv:2407.00163 [pdf]

Pressure Tuning the Mixture of Eu$^{2+}$ and Eu$^{3+}$ in Eu$_4$Bi$_6$Se$_{13}$

Authors: Mingyu Xu, Jose L. Gonzalez Jimenez, Greeshma C. Jose, Artittaya Boonkird, Chengkun Xing, Chelsea Harrod, Xinle Li, Haidong Zhou, Alyssa Gaiser, Xianglin Ke, Wenli Bi, Mingda Li, Weiwei Xie

Abstract: The investigation of crystallographic, electronic, and magnetic characteristics, especially the mixed valences of Eu$^{2+}$ and Eu$^{3+}$ under pressure, of a novel europium-based bismuth selenide compound, Eu$_4$Bi$_6$Se$_{13}$, is presented. This new compound adopts a monoclinic crystal structure classified under the P$2_1$/m space group (#11). It exhibits distinctive structural features, including substantial Eu-Se coordination numbers, Bi-Se ladders, and linear chains of Eu atoms that propagate along the b-axis. Electronic resistivity assessments indicate that Eu$_{4}$Bi$_{6}$Se$_{13}$ exhibits weak metallic behaviors. Magnetic characterization reveals uniaxial magnetic anisotropy, with a notable spin transition at approximately 1.2 T when the magnetic field is oriented along the b-axis. This behavior, coupled with the specific Eu-Eu interatomic distances and the magnetic saturation observed at low fields, supports the identification of metamagnetic properties attributable to the flipping of europium spins. The Curie-Weiss analysis of the magnetic susceptibility measured both perpendicular and parallel to the b-axis and high-pressure partial fluorescence yield (PFY) results detected by X-ray absorption spectroscopy (XAS) reveal the tendency of the material to enter a mixed valent state where the trivalent state becomes more prominent with the pressure increase or temperature decrease.

Submitted 28 June, 2024; originally announced July 2024.

Comments: 22 pages, 8 figures

arXiv:2407.00136 [pdf, other]

Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, et al. (495 additional authors not shown)

Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.

Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

arXiv:2407.00046 [pdf, other]

Barrier-Augmented Lagrangian for GPU-based Elastodynamic Contact

Authors: Dewen Guo, Minchen Li, Yin Yang, Guoping Wang, Sheng Li

Abstract: We propose a GPU-based iterative method for accelerated elastodynamic simulation with the log-barrier-based contact model. While Newton's method is a conventional choice for solving the interior-point system, the presence of ill-conditioned log barriers often necessitates a direct solution at each linearized substep and costs substantial storage and computational overhead. Moreover, constraint sets that vary in each iteration present additional challenges in algorithm convergence. Our method employs a novel barrier-augmented Lagrangian method to improve system conditioning and solver efficiency by adaptively updating an augmentation constraint set. This enables the utilization of a scalable, inexact Newton-PCG solver with sparse GPU storage, eliminating the need for direct factorization. We further enhance PCG convergence speed with a domain-decomposed warm start strategy based on an eigenvalue spectrum approximated through our in-time assembly. Demonstrating significant scalability improvements, our method makes simulations previously impractical on 128 GB of CPU memory feasible with only 8 GB of GPU memory and orders-of-magnitude faster. Additionally, our method adeptly handles stiff problems, surpassing the capabilities of existing GPU-based interior-point methods. Our results, validated across various complex collision scenarios involving intricate geometries and large deformations, highlight the exceptional performance of our approach.

Submitted 4 June, 2024; originally announced July 2024.

Comments: 17 pages, 30 figures

arXiv:2406.19987 [pdf, other]

doi 10.1109/VIS54172.2023.00053

Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs

Authors: Sangwon Jeong, Mingwei Li, Matthew Berger, Shusen Liu

Abstract: As applications of generative AI become mainstream, it is important to understand what generative models are capable of producing, and the extent to which one can predictably control their outputs. In this paper, we propose a visualization design, named Concept Lens, for jointly navigating the data distribution of a generative model, and concept manipulations supported by the model. Our work is focused on modern vision-based generative adversarial networks (GAN), and their learned latent spaces, wherein concept discovery has gained significant interest as a means of image manipulation. Concept Lens is designed to support users in understanding the diversity of a provided set of concepts, the relationship between concepts, and the suitability of concepts to give semantic controls for image generation. Key to our approach is the hierarchical grouping of concepts, generated images, and the associated joint exploration. We show how Concept Lens can reveal consistent semantic manipulations for editing images, while also serving as a diagnostic tool for studying the limitations and trade-offs of concept discovery methods.

Submitted 28 June, 2024; originally announced June 2024.

Journal ref: 2023 IEEE Visualization and Visual Analytics (VIS), Melbourne, Australia, 2023, pp. 221-225

arXiv:2406.19756 [pdf, other]

Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train

Authors: Haojun Jiang, Meng Li, Zhenguo Sun, Ning Jia, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

Abstract: The complex structure of the heart leads to significant challenges in echocardiography, especially in acquiring cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on the two-dimensional plane and the spatial relationships between planes in three-dimensional space. In this paper, we innovatively propose a large-scale self-supervised pre-training method to acquire a cardiac structure-aware world model. The core innovation lies in constructing a self-supervised task that requires structural inference by predicting masked structures on a 2D plane and imagining another plane based on pose transformation in 3D space. To support large-scale pre-training, we collected over 1.36 million echocardiograms from ten standard views, along with their 3D spatial poses. In the downstream probe guidance task, we demonstrate that our pre-trained model consistently reduces guidance errors across the ten most common standard views on the test set with 0.29 million samples from 74 routine clinical scans, indicating that structure-aware pre-training benefits the scanning.

Submitted 28 June, 2024; originally announced June 2024.

Comments: Technical report

arXiv:2406.19236 [pdf, other]

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

Authors: Minghan Li, Heng Li, Zhi-Qi Cheng, Yifei Dong, Yuxuan Zhou, Jun-Yan He, Qi Dai, Teruko Mitamura, Alexander G. Hauptmann

Abstract: Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions. However, current VLN frameworks often rely on static environments and optimal expert supervision, limiting their real-world applicability. To address this, we introduce Human-Aware Vision-and-Language Navigation (HA-VLN), extending traditional VLN by incorporating dynamic human activities and relaxing key assumptions. We propose the Human-Aware 3D (HA3D) simulator, which combines dynamic human activities with the Matterport3D dataset, and the Human-Aware Room-to-Room (HA-R2R) dataset, extending R2R with human activity descriptions. To tackle HA-VLN challenges, we present the Expert-Supervised Cross-Modal (VLN-CM) and Non-Expert-Supervised Decision Transformer (VLN-DT) agents, utilizing cross-modal fusion and diverse training strategies for effective navigation in dynamic human environments. A comprehensive evaluation, including metrics considering human activities, and systematic analysis of HA-VLN's unique challenges, underscores the need for further research to enhance HA-VLN agents' real-world robustness and adaptability. Ultimately, this work provides benchmarks and insights for future research on embodied AI and Sim2Real transfer, paving the way for more realistic and applicable VLN systems in human-populated environments.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 30 pages, 18 figures, Project Page: https://lpercc.github.io/HA3D_simulator/

arXiv:2406.19190 [pdf, ps, other]

Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

Abstract: Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 13 pages, 6 figures

arXiv:2406.18870 [pdf, ps, other]

Exact results on traces of sets

Authors: Mingze Li, Jie Ma, Mingyuan Rong

Abstract: For non-negative integers $n$, $m$, $a$ and $b$, we write $\left( n,m \right) \rightarrow \left( a,b \right)$ if for every family $\mathcal{F}\subseteq 2^{[n]}$ with $|\mathcal{F}|\geqslant m$ there is an $a$-element set $T\subseteq [n]$ such that $\left| \mathcal{F}_{\mid T} \right| \geqslant b$, where $\mathcal{F}_{\mid T}=\{ F \cap T : F \in \mathcal{F} \}$. A longstanding problem in extremal set theory asks to determine $m(s)=\lim_{n\rightarrow +\infty}\frac{m(n,s)}{n}$, where $m(n,s)$ denotes the maximum integer $m$ such that $\left( n,m \right) \rightarrow \left( n-1,m-s \right)$ holds for non-negative integers $n$ and $s$. In this paper, we establish the exact value of $m(2^{d-1}-c)$ for all $1\leqslant c\leqslant d$ whenever $d\geqslant 50$, thereby solving an open problem posed by Piga and Schülke. To be precise, we show that $$m(n,2^{d-1}-c)=\frac{2^{d}-c}{d}n \mbox{ for } 1\leq c\leq d-1 \mbox{ and } d\mid n, \mbox{ and } m(n,2^{d-1}-d)=\frac{2^{d}-d-0.5}{d}n \mbox{ for } 2d\mid n $$ holds for $d\geq 50$. Furthermore, we provide a proof that confirms a conjecture of Frankl and Watanabe from 1994, demonstrating that $m(11)=5.3$.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18588 [pdf, other]

Varying Manifolds in Diffusion: From Time-varying Geometries to Visual Saliency

Authors: Junhao Chen, Manyi Li, Zherong Pan, Xifeng Gao, Changhe Tu

Abstract: Deep generative models learn the data distribution, which is concentrated on a low-dimensional manifold. The geometric analysis of distribution transformation provides a better understanding of data structure and enables a variety of applications. In this paper, we study the geometric properties of the diffusion model, whose forward diffusion process and reverse generation process construct a series of distributions on manifolds which vary over time. Our key contribution is the introduction of generation rate, which corresponds to the local deformation of manifold over time around an image component. We show that the generation rate is highly correlated with intuitive visual properties, such as visual saliency, of the image component. Further, we propose an efficient and differentiable scheme to estimate the generation rate for a given image component over time, giving rise to a generation curve. The differentiable nature of our scheme allows us to control the shape of the generation curve via optimization. Using different loss functions, our generation curve matching algorithm provides a unified framework for a range of image manipulation tasks, including semantic transfer, object removal, saliency manipulation, image blending, etc. We conduct comprehensive analytical evaluations to support our findings and evaluate our framework on various manipulation tasks. The results show that our method consistently leads to better manipulation results, compared to recent baselines.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.18546 [pdf]

Application of Multimodal Fusion Deep Learning Model in Disease Recognition

Authors: Xiaoyi Liu, Hongjie Qiu, Muqing Li, Zhou Yu, Yutian Yang, Yafeng Yan

Abstract: This paper introduces an innovative multi-modal fusion deep learning approach to overcome the drawbacks of traditional single-modal recognition techniques. These drawbacks include incomplete information and limited diagnostic accuracy. During the feature extraction stage, cutting-edge deep learning models including convolutional neural networks (CNN), recurrent neural networks (RNN), and transformers are applied to distill advanced features from image-based, temporal, and structured data sources. The fusion strategy component seeks to determine the optimal fusion mode tailored to the specific disease recognition task. In the experimental section, a comparison is made between the performance of the proposed multi-mode fusion model and existing single-mode recognition methods. The findings demonstrate significant advantages of the multimodal fusion model across multiple evaluation metrics.

Submitted 22 May, 2024; originally announced June 2024.

arXiv:2406.18311 [pdf, other]

Online Learning of Multiple Tasks and Their Relationships: Testing on Spam Email Data and EEG Signals Recorded in Construction Fields

Authors: Yixin Jin, Wenjing Zhou, Meiqi Wang, Meng Li, Xintao Li, Tianyu Hu, Xingyuan Bu

Abstract: This paper examines an online multi-task learning (OMTL) method, which processes data sequentially to predict labels across related tasks. The framework learns task weights and their relatedness concurrently. Unlike previous models that assumed static task relatedness, our approach treats tasks as initially independent, updating their relatedness iteratively using newly calculated weight vectors. We introduced three rules to update the task relatedness matrix: OMTLCOV, OMTLLOG, and OMTLVON, and compared them against a conventional method (CMTL) that uses a fixed relatedness value. Performance evaluations on three datasets (a spam dataset and two EEG datasets from construction workers under varying conditions) demonstrated that our OMTL methods outperform CMTL, improving accuracy by 1\% to 3\% on EEG data, and maintaining low error rates around 12\% on the spam dataset.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18183 [pdf, other]

Measurement of the cross sections of $e^+e^-\to K^{-}\bar{Ξ}^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\bar{Ξ}^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\bar{Ξ}^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\bar{Ξ}^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\bar{Ξ}^{+}Λ/Σ^{0}$ are determined.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 26 pages, 5 tables, 4 figures

arXiv:2406.18169 [pdf, ps, other]

Timing and Scintillation Studies of Pulsars in Globular Cluster M3 (NGC 5272) with FAST

Authors: Baoda Li, Li-yun Zhang, Jumei Yao, Dejiang Yin, Ralph P. Eatough, Minghui Li, Yifeng Li, Yujie Lian, Yu Pan, Yinfeng Dai, Yaowei Li, Xingnan Zhang, Tianhao Su, Yuxiao Wu, Tong Liu, Kuo Liu, Lin Wang, Lei Qian, Zhichen Pan

Abstract: We present phase-connected timing solutions for five pulsars in the globular cluster (GC) M3 (NGC 5272), namely PSRs M3A to F (PSRs J1342+2822A to F) with the exception of PSR M3C, from FAST archival data. Of these, the timing solutions of PSRs M3E and F are obtained for the first time. We find that PSRs M3E and F have low-mass companions and are in circular orbits with periods of 7.1 and 3.0 days, respectively. PSR M3C was not detected in any of the 41 observations. We found no X-ray counterparts for these pulsars in archival Chandra images in the band of 0.2-20 keV. We noticed that the pulsars in M3 seem to be native. From the Auto-Correlation Function (ACF) analysis of the dynamic spectra of M3A and M3B, the scintillation timescale ranges from $7.0\pm0.3$ min to $60.0\pm0.6$ min, and the scintillation bandwidth ranges from $4.6\pm0.2$ MHz to $57.1\pm1.1$ MHz. The measured scintillation bandwidths from the dynamic spectra indicate strong scintillation, and the scattering medium is anisotropic. From the secondary spectra, we captured a scintillation arc only for PSR M3B, with a curvature of $649\pm23 {\rm m}^{-1} {\rm mHz}^{-2}$.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures, accepted for publication in The Astrophysical Journal

arXiv:2406.18083 [pdf, other]

Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (643 additional authors not shown)

Abstract: Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 19 pages, 2 figures

arXiv:2406.18063 [pdf, other]

Data-driven imaging geometric recovery of ultrahigh resolution robotic micro-CT for in-vivo and other applications

Authors: Mengzhou Li, Guibin Zan, Wenbin Yun, Josef Uher, John Wen, Ge Wang

Abstract: We introduce an ultrahigh-resolution (50 μm) robotic micro-CT design for localized imaging of carotid plaques using robotic arms, a cutting-edge detector, and machine learning technologies. To combat geometric error-induced artifacts in interior CT scans, we propose a data-driven geometry estimation method that maximizes the consistency between projection data and the reprojection counterparts of a reconstructed volume. In particular, we use a normalized cross-correlation metric to overcome the projection truncation effect. Our approach is validated on a robotic CT scan of a sacrificed mouse and a micro-CT phantom scan, both producing sharper images with finer details than those prior to correction.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 4-page paper for 8th International Conference on Computational and Mathematical Biomedical Engineering

arXiv:2406.17452 [pdf, ps, other]

Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (649 additional authors not shown)

Abstract: We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17252 [pdf, other]

Resource-Optimized Grouping Shadow for Efficient Energy Estimation

Authors: Min Li, Mao Lin, Matthew J. S. Beach

Abstract: The accurate and efficient energy estimation of quantum Hamiltonians consisting of Pauli observables is an essential task in modern quantum computing. We introduce a Resource-Optimized Grouping Shadow (ROGS) algorithm, which optimally allocates measurement resources by minimizing the estimation error bound through a novel overlapped grouping strategy and convex optimization. Our numerical experiments demonstrate that ROGS requires significantly fewer unique quantum circuits for accurate estimation than existing methods given a fixed measurement budget, addressing a major cost factor for compiling and executing circuits on quantum computers.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 22 pages, 5 figures

arXiv:2406.17218 [pdf, ps, other]

MIMO-OFDM ISAC Waveform Design for Range-Doppler Sidelobe Suppression

Authors: Peishi Li, Ming Li, Rang Liu, Qian Liu, A. Lee Swindlehurst

Abstract: Integrated sensing and communication (ISAC) is a key enabling technique for future wireless networks owing to its efficient hardware and spectrum utilization. In this paper, we focus on dual-functional waveform design for a multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) ISAC system, which is considered to be a promising solution for practical deployment. Since the dual-functional waveform carries communication information, its random nature leads to high range-Doppler sidelobes in the ambiguity function, which in turn degrades radar sensing performance. To suppress range-Doppler sidelobes, we propose a novel symbol-level precoding (SLP) based waveform design for MIMO-OFDM ISAC systems by fully exploiting the temporal degrees of freedom (DoFs). Our goal is to minimize the range-Doppler integrated sidelobe level (ISL) while satisfying the constraints of target illumination power, multi-user communication quality of service (QoS), and constant-modulus transmission. To solve the resulting non-convex waveform design problem, we develop an efficient algorithm using the majorization-minimization (MM) and alternating direction method of multipliers (ADMM) methods. Simulation results show that the proposed waveform has significantly reduced range-Doppler sidelobes compared with signals designed only for communications and other baselines. In addition, the proposed waveform design achieves target detection and estimation performance close to that achievable by waveforms designed only for radar, which demonstrates the superiority of the proposed SLP-based ISAC approach.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 13 pages, 9 figures, submitted to IEEE TWC

arXiv:2406.16982 [pdf]

Research on Disease Prediction Model Construction Based on Computer AI deep Learning Technology

Authors: Yang Lin, Muqing Li, Ziyi Zhu, Yinqiu Feng, Lingxi Xiao, Zexi Chen

Abstract: The prediction of disease risk factors can screen vulnerable groups for effective prevention and treatment, so as to reduce their morbidity and mortality. Machine learning has a great demand for high-quality labeling information, and labeling noise in medical big data poses a great challenge to efficient disease risk warning methods. Therefore, this project intends to study the robust learning algorithm and apply it to the early warning of infectious disease risk. A dynamic truncated loss model is proposed, which combines the traditional mutual entropy implicit weight feature with the mean variation feature. It is robust to label noise. A lower bound on training loss is constructed, and a method based on sampling rate is proposed to reduce the gradient of suspected samples to reduce the influence of noise on training results. The effectiveness of this method under different types of noise was verified by using a stroke screening data set as an example. This method enables robust learning of data containing label noise.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.16710 [pdf, other]

Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image

Authors: Jinkun Hao, Junshu Tang, Jiangning Zhang, Ran Yi, Yijia Hong, Moran Li, Weijian Cao, Yating Wang, Lizhuang Ma

Abstract: While recent works have achieved great success on one-shot 3D common object generation, high quality and fidelity 3D head generation from a single image remains a great challenge. Previous text-based methods for generating 3D heads were limited by text descriptions and image-based methods struggled to produce high-quality head geometry. To handle this challenging problem, we propose a novel framework, Portrait3D, to generate high-quality 3D heads while preserving their identities. Our work incorporates the identity information of the portrait image into three parts: 1) geometry initialization, 2) geometry sculpting, and 3) texture generation stages. Given a reference portrait image, we first align the identity features with text features to realize ID-aware guidance enhancement, which contains the control signals representing the face information. We then use the canny map, ID features of the portrait image, and a pre-trained text-to-normal/depth diffusion model to generate ID-aware geometry supervision, and 3D-GAN inversion is employed to generate ID-aware geometry initialization. Furthermore, with the ability to inject identity information into 3D head generation, we use ID-aware guidance to calculate ID-aware Score Distillation (ISD) for geometry sculpting. For texture generation, we adopt the ID Consistent Texture Inpainting and Refinement which progressively expands the view for texture inpainting to obtain an initialization UV texture map. We then use the ID-aware guidance to provide image-level supervision for noisy multi-view images to obtain a refined texture map. Extensive experiments demonstrate that we can generate high-quality 3D heads with accurate geometry and texture from single in-the-wild portrait images. The project page is at https://jinkun-hao.github.io/Portrait3D/.

Submitted 24 June, 2024; originally announced June 2024.

Comments: https://jinkun-hao.github.io/Portrait3D/

arXiv:2406.16654 [pdf, other]

Ensemble-Embedding Graph Neural Network for Direct Prediction of Optical Spectra from Crystal Structure

Authors: Nguyen Tuan Hung, Ryotaro Okabe, Abhijatmedhi Chotrattanapituk, Mingda Li

Abstract: Optical properties in solids, such as refractive index and absorption, hold vast applications ranging from solar panels to sensors, photodetectors, and transparent displays. However, first-principles computation of optical properties from crystal structures is a complex task due to the high convergence criteria and computational cost. Recent progress in machine learning shows promise in predicting material properties, yet predicting optical properties from crystal structures remains challenging due to the lack of efficient atomic embeddings. Here, we introduce GNNOpt, an equivariant graph-neural-network architecture featuring automatic embedding optimization. This enables high-quality optical predictions with a dataset of only 944 materials. GNNOpt predicts all optical properties based on the Kramers-Kronig relations, including absorption coefficient, complex dielectric function, complex refractive index, and reflectance. We apply the trained model to screen photovoltaic materials based on spectroscopic limited maximum efficiency and search for quantum materials based on quantum weight. First-principles calculations validate the efficacy of the GNNOpt model, demonstrating excellent agreement in predicting the optical spectra of unseen materials. The discovery of new quantum materials with high predicted quantum weight, such as SiOs which hosts exotic quasiparticles, demonstrates GNNOpt's potential in predicting optical properties across a broad range of materials and applications.

Submitted 24 June, 2024; originally announced June 2024.

Comments: (i) Completely Rewritten Manuscript, including 5 main figures and 1 table. (ii) Supplementary Information, including 15 supplementary figures and 2 tables

arXiv:2406.16604 [pdf, ps, other]

Volume of algebraically integrable foliations and locally stable families

Authors: Jingjun Han, Junpeng Jiao, Mengchu Li, Jihao Liu

Abstract: In this paper, we study the volume of algebraically integrable foliations and locally stable families. We show that, for any canonical algebraically integrable foliation, its volume belongs to a discrete set depending only on its rank and the volume of its general leaves. In particular, if the foliation is of general type, then its volume has a positive lower bound depending only on its rank and the volume of its general leaves. This implies some special cases of a question posed by Cascini, Hacon, and Langer. As a consequence, we show that the relative volume of a stable family with a normal generic fiber belongs to a discrete set if the dimension and the volume of its general fibers are bounded. Log versions of the aforementioned theorems are also provided and proved.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 24 pages

MSC Class: 14E30; 37F75

arXiv:2406.16358 [pdf, other]

Approximate DCT and Quantization Techniques for Energy-Constrained Image Sensors

Authors: Ming-Che Li, Archisman Ghosh, Shreyas Sen

Abstract: Recent expansions in multimedia devices gather enormous amounts of real-time images for processing and inference. The images are first compressed using compression schemes, like JPEG, to reduce storage costs and power for transmitting the captured data. Due to inherent error resilience and imperceptibility in images, JPEG can be approximated to reduce the required computation power and area. This work demonstrates the first end-to-end approximation computing-based optimization of JPEG hardware using i) an approximate division realized using bit-shift operators to reduce the complexity of the quantization block, ii) loop perforation, and iii) precision scaling on top of a multiplier-less fast DCT architecture to achieve an extremely energy-efficient JPEG compression unit which will be a perfect fit for power/bandwidth-limited scenarios. Furthermore, a gradient descent-based heuristic composed of two conventional approximation strategies, i.e., Precision Scaling and Loop Perforation, is implemented for tuning the degree of approximation to trade off energy consumption with the quality degradation of the decoded image. The entire RTL design is coded in Verilog HDL, synthesized, mapped to TSMC 65nm CMOS technology, and simulated using Cadence Spectre Simulator under 25$^{\circ}$C, TT corner. The approximate division approach achieved around 28% reduction in the active design area. The heuristic-based approximation technique combined with accelerator optimization achieves a significant energy reduction of 36% for a minimal image quality degradation of 2% SAD. Simulation results also show that the proposed architecture consumes 15 μW at the DCT and quantization stages to compress a colored 480p image at 6 fps.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16289 [pdf, other]

doi 10.1109/TITS.2024.3415394

Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction

Authors: Tong Qin, Changze Li, Haoyang Ye, Shaowei Wan, Minzhen Li, Hongwei Liu, Ming Yang

Abstract: Recently, Neural Radiance Fields (NeRF) achieved impressive results in novel view synthesis. Block-NeRF showed the capability of leveraging NeRF to build large city-scale models. For large-scale modeling, a mass of image data is necessary. Collecting images from specially designed data-collection vehicles cannot support large-scale applications. How to acquire massive high-quality data remains an open problem. Noting that the automotive industry has a huge amount of image data, crowd-sourcing is a convenient way for large-scale data collection. In this paper, we present a crowd-sourced framework, which utilizes substantial data captured by production vehicles to reconstruct the scene with the NeRF model. This approach solves the key problem of large-scale reconstruction, that is, where the data comes from and how to use it. Firstly, the crowd-sourced massive data is filtered to remove redundancy and keep a balanced distribution in terms of time and space. Then a structure-from-motion module is performed to refine camera poses. Finally, images, as well as poses, are used to train the NeRF model in a certain block. We highlight that we present a comprehensive framework that integrates multiple modules, including data selection, sparse 3D reconstruction, sequence appearance embedding, depth supervision of ground surface, and occlusion completion. The complete system is capable of effectively processing and reconstructing high-quality 3D scenes from crowd-sourced data. Extensive quantitative and qualitative experiments were conducted to validate the performance of our system. Moreover, we proposed an application, named first-view navigation, which leveraged the NeRF model to generate 3D street view and guide the driver with a synthesized video.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.16272 [pdf, other]

Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement

Authors: Zhiyuan Chang, Mingyang Li, Junjie Wang, Yi Liu, Qing Wang, Yang Liu

Abstract: Text-to-Image Diffusion Models (T2I DMs) have garnered significant attention for their ability to generate high-quality images from textual descriptions. However, these models often produce images that do not fully align with the input prompts, resulting in semantic inconsistencies. The most prominent issue among these semantic inconsistencies is catastrophic-neglect, where the images generated by T2I DMs miss key objects mentioned in the prompt. We first conduct an empirical study on this issue, exploring the prevalence of catastrophic-neglect, potential mitigation strategies with feature enhancement, and the insights gained. Guided by the empirical findings, we propose an automated repair approach named Patcher to address catastrophic-neglect in T2I DMs. Specifically, Patcher first determines whether there are any neglected objects in the prompt, and then applies attention-guided feature enhancement to these neglected objects, resulting in a repaired prompt. Experimental results on three versions of Stable Diffusion demonstrate that Patcher effectively repairs the issue of catastrophic-neglect, achieving 10.1%-16.3% higher Correct Rate in image generation compared to baselines.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 11 pages, 3 figures

arXiv:2406.16271 [pdf, other]

Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation

Authors: Xueyu Liu, Guangze Shi, Rui Wang, Yexin Lai, Jianan Zhang, Lele Sun, Quan Yang, Yongfei Wu, Ming Li, Weixia Han, Wen Zheng

Abstract: Assessment of the glomerular basement membrane (GBM) in transmission electron microscopy (TEM) is crucial for diagnosing chronic kidney disease (CKD). The lack of domain-independent automatic segmentation tools for the GBM necessitates an AI-based solution to automate the process. In this study, we introduce GBMSeg, a training-free framework designed to automatically segment the GBM in TEM images guided only by a one-shot annotated reference. Specifically, GBMSeg first exploits the robust feature matching capabilities of the pretrained foundation model to generate initial prompt points, then introduces a series of novel automatic prompt engineering techniques across the feature and physical space to optimize the prompt scheme. Finally, GBMSeg employs a class-agnostic foundation segmentation model with the generated prompt scheme to obtain accurate segmentation results. Experimental results on our collected 2538 TEM images confirm that GBMSeg achieves superior segmentation performance with a Dice similarity coefficient (DSC) of 87.27% using only one labeled reference image in a training-free manner, outperforming recently proposed one-shot or few-shot methods. In summary, GBMSeg introduces a distinctive automatic prompt framework that facilitates robust domain-independent segmentation performance without training, particularly advancing the automatic prompting of foundation segmentation models for medical images. Future work involves automating the thickness measurement of segmented GBM and quantifying pathological indicators, holding significant potential for advancing pathology assessments in clinical applications. The source code is available on https://github.com/SnowRain510/GBMSeg

Submitted 23 June, 2024; originally announced June 2024.

Comments: Accepted for MICCAI2024

arXiv:2406.16116 [pdf, ps, other]

A First Running Time Analysis of the Strength Pareto Evolutionary Algorithm 2 (SPEA2)

Authors: Shengjie Ren, Chao Bian, Miqing Li, Chao Qian

Abstract: Evolutionary algorithms (EAs) have emerged as a predominant approach for addressing multi-objective optimization problems. However, the theoretical foundation of multi-objective EAs (MOEAs), particularly the fundamental aspects like running time analysis, remains largely underexplored. Existing theoretical studies mainly focus on basic MOEAs, with little attention given to practical MOEAs. In this paper, we present a running time analysis of strength Pareto evolutionary algorithm 2 (SPEA2) for the first time. Specifically, we prove that the expected running time of SPEA2 for solving three commonly used multi-objective problems, i.e., $m$OneMinMax, $m$LeadingOnesTrailingZeroes, and $m$-OneJumpZeroJump, is $O(μn\cdot \min\{m\log n, n\})$, $O(μn^2)$, and $O(μn^k \cdot \min\{mn, 3^{m/2}\})$, respectively. Here $m$ denotes the number of objectives, and the population size $μ$ is required to be at least $(2n/m+1)^{m/2}$, $(2n/m+1)^{m-1}$ and $(2n/m-2k+3)^{m/2}$, respectively. The proofs are accomplished through general theorems which are also applicable for analyzing the expected running time of other MOEAs on these problems, and thus can be helpful for future theoretical analysis of MOEAs.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.15938 [pdf, other]

RuleR: Improving LLM Controllability by Rule-based Data Recycling

Authors: Ming Li, Han Chen, Chenguang Wang, Dang Nguyen, Dianqi Li, Tianyi Zhou

Abstract: Large language models (LLMs) still lack delicate controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which requires additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method incorporating multiple constraints into the original data samples according to predefined rules, which creates new training tasks to consolidate the controllability of LLMs. Instead of creating new data from scratch, RuleR "recycles" existing data by simply applying rule-based edits to their responses and appending the rule-instructions in their original instructions. Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities. The code will be released on https://github.com/MingLiiii/RuleR.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.15769 [pdf, other]

Humas: A Heterogeneity- and Upgrade-aware Microservice Auto-scaling Framework in Large-scale Data Centers

Authors: Qin Hua, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue, Minglu Li

Abstract: An effective auto-scaling framework is essential for microservices to ensure performance stability and resource efficiency under dynamic workloads. As revealed by many prior studies, the key to efficient auto-scaling lies in accurately learning performance patterns, i.e., the relationship between performance metrics and workloads in data-driven schemes. However, we notice that there are two significant challenges in characterizing performance patterns for large-scale microservices. Firstly, diverse microservices demonstrate varying sensitivities to heterogeneous machines, causing difficulty in quantifying the performance difference in a fixed manner. Secondly, frequent version upgrades of microservices result in uncertain changes in performance patterns, known as pattern drifts, leading to imprecise resource capacity estimation issues. To address these challenges, we propose Humas, a heterogeneity- and upgrade-aware auto-scaling framework for large-scale microservices. Firstly, Humas quantifies the difference in resource efficiency among heterogeneous machines for various microservices online and normalizes their resources in standard units. Additionally, Humas develops a least squares density-difference (LSDD) based algorithm to identify pattern drifts caused by upgrades. Lastly, Humas generates capacity adjustment plans for microservices based on the latest performance patterns and predicted workloads. Experimental results on 50 real microservices with over 11,000 containers demonstrate that Humas improves resource efficiency and performance stability by approximately 30.4% and 48.0%, respectively, compared to state-of-the-art approaches.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 14 pages; 27 figures

arXiv:2406.15305 [pdf, other]

PID: Prompt-Independent Data Protection Against Latent Diffusion Models

Authors: Ang Li, Yichuan Mo, Mingjie Li, Yisen Wang

Abstract: The few-shot fine-tuning of Latent Diffusion Models (LDMs) has enabled them to grasp new concepts from a limited number of images. However, given the vast amount of personal images accessible online, this capability raises critical concerns about civil privacy. While several previous defense methods have been developed to prevent such misuse of LDMs, they typically assume that the textual prompts used by data protectors exactly match those employed by data exploiters. In this paper, we first empirically demonstrate that breaking this assumption, i.e., in cases where discrepancies exist between the textual conditions used by protectors and exploiters, could substantially reduce the effectiveness of these defenses. Furthermore, considering the visual encoder's independence from textual prompts, we delve into the visual encoder and thoroughly investigate how manipulating the visual encoder affects the few-shot fine-tuning process of LDMs. Drawing on these insights, we propose a simple yet effective method called Prompt-Independent Defense (PID) to safeguard privacy against LDMs. We show that PID can act as a strong privacy shield on its own while requiring significantly less computational power. We believe our studies, along with the comprehensive understanding and new defense method, provide a notable advance toward reliable data protection against LDMs.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 27 pages, ICML 2024 poster

arXiv:2406.15209 [pdf, other]

Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding

Authors: Mohan Li, Simon Keizer, Rama Doddipatla

Abstract: Zero-shot spoken language understanding (SLU) enables systems to comprehend user utterances in new domains without prior exposure to training data. Recent studies often rely on large language models (LLMs), leading to excessive footprints and complexity. This paper proposes the use of Whisper, a standalone speech processing model, for zero-shot end-to-end (E2E) SLU. To handle unseen semantic labels, SLU tasks are integrated into a question-answering (QA) framework, which prompts the Whisper decoder for semantics deduction. The system is efficiently trained with prefix-tuning, optimising a minimal set of parameters rather than the entire Whisper model. We show that the proposed system achieves a 40.7% absolute gain for slot filling (SLU-F1) on SLURP compared to a recently introduced zero-shot benchmark. Furthermore, it performs comparably to a Whisper-GPT-2 modular system under both in-corpus and cross-corpus evaluation settings, but with a relative 34.8% reduction in model parameters.

Submitted 21 June, 2024; originally announced June 2024.

Comments: Accepted to Interspeech 2024

arXiv:2406.15138 [pdf, other]

On the equivalence of Noether charge and Hilbert action boundary term formulae for the black hole entropy in F(Riemann) gravity theory

Authors: Wei Guo, Xiyao Guo, Mingfeng Li, Zili Mou, Hongbao Zhang

Abstract: By working with the covariant phase space formalism, we have shown that not only can the Hamiltonian conjugate to a Killing vector field ξ be expressed as the sum of the associated Noether charge and ξ contracted with the Hilbert action boundary term for F(Riemann) gravity, but also be written as its contraction with another ξ independent tensor field. With this, we have proven the equivalence of Noether charge and Hilbert action boundary term formulae for the stationary black hole entropy in F(Riemann) gravity, which is further substantiated by our explicit computation using both formulae.

Submitted 21 June, 2024; originally announced June 2024.

Comments: PRD style, 7 pages, 1 figure

arXiv:2406.15030 [pdf, ps, other]

Search for the $e^+e^- \to \phi\chi_{c1}(3872)$ process at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to \phi\chi_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90% confidence level on the product of the Born cross section $\sigma(e^+e^- \to \phi\chi_{c1}(3872))$ and the branching fraction $\mathcal{B}[\chi_{c1}(3872)\to\pi^+\pi^- J/\psi]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information for the production of the $\chi_{c1}(3872)$ at $e^+e^-$ collider and deepen our understanding about the nature of this particle.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 11 pages, 3 figures

arXiv:2406.14777 [pdf, other]

Learning to Cover: Online Learning and Optimization with Irreversible Decisions

Authors: Alexandre Jacquillat, Michael Lingzhi Li

Abstract: We define an online learning and optimization problem with irreversible decisions contributing toward a coverage target. At each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a machine learning model to guide future decisions. The goal is to minimize costs across a finite horizon under a chance constraint reflecting the coverage target. We derive an optimal algorithm and a tight lower bound in an asymptotic regime characterized by a large target number of facilities $m\to\infty$ but a finite horizon $T\in\mathbb{Z}_+$. We find that the regret grows sub-linearly at a rate $\Theta\left(m^{\frac{1}{2}\cdot\frac{1}{1-2^{-T}}}\right)$, thus converging exponentially fast to $\Theta(\sqrt{m})$. We establish the robustness of this result to the learning environment; we also extend it to a more complicated facility location setting in a bipartite facility-customer graph with a target on customer coverage. Throughout, constructive proofs identify a policy featuring limited exploration initially for learning purposes, and fast exploitation later on for optimization purposes once uncertainty gets mitigated. These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.

Submitted 20 June, 2024; originally announced June 2024.
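
The regret rate quoted in the abstract above can be made concrete with a short calculation. The sketch below only evaluates the exponent of $m$ in the stated rate for a few horizons $T$ and prints its gap to 1/2; the function name `regret_exponent` and the chosen horizons are illustrative assumptions, not taken from the paper.

```python
def regret_exponent(horizon):
    """Exponent of m in the rate quoted in the abstract: (1/2) * 1 / (1 - 2^(-T)).

    `horizon` is the number of periods T (a positive integer).
    """
    if horizon < 1:
        raise ValueError("horizon T must be a positive integer")
    return 0.5 / (1.0 - 2.0 ** (-horizon))


# Illustrative horizons (hypothetical choice): the gap to 1/2 roughly halves
# with each extra period, which is the exponential convergence to sqrt(m).
for horizon in (1, 2, 3, 5, 10):
    exponent = regret_exponent(horizon)
    print(f"T={horizon:2d}  exponent={exponent:.6f}  gap to 1/2={exponent - 0.5:.6f}")
```

The gap equals $\frac{1}{2}\cdot\frac{2^{-T}}{1-2^{-T}}$, so it shrinks geometrically in $T$, consistent with the abstract's statement that a few rounds of learning already bring the rate close to $\Theta(\sqrt{m})$.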

arXiv:2406.14482 [pdf, other]

Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

Authors: Xinyi Ying, Chao Xiao, Ruojing Li, Xu He, Boyang Li, Zhaoxu Li, Yingqian Wang, Mingyuan Hu, Qingyu Xu, Zaiping Lin, Miao Li, Shilin Zhou, Wei An, Weidong Sheng, Li Liu

Abstract: Small object detection (SOD) has been a longstanding yet challenging task for decades, with numerous datasets and algorithms being developed. However, they mainly focus on either visible or thermal modality, while visible-thermal (RGBT) bimodality is rarely explored. Although some RGBT datasets have been developed recently, the insufficient quantity, limited category, misaligned images and large target size cannot provide an impartial benchmark to evaluate multi-category visible-thermal small object detection (RGBT SOD) algorithms. In this paper, we build the first large-scale benchmark with high diversity for RGBT SOD (namely RGBT-Tiny), including 115 paired sequences, 93K frames and 1.2M manual annotations. RGBT-Tiny contains abundant targets (7 categories) and high-diversity scenes (8 types that cover different illumination and density variations). Note that, over 81% of targets are smaller than 16x16, and we provide paired bounding box annotations with tracking ID to offer an extremely challenging benchmark with wide-range applications, such as RGBT fusion, detection and tracking. In addition, we propose a scale adaptive fitness (SAFit) measure that exhibits high robustness on both small and large targets. The proposed SAFit can provide reasonable performance evaluation and promote detection performance. Based on the proposed RGBT-Tiny dataset and SAFit measure, extensive evaluations have been conducted, including 23 recent state-of-the-art algorithms that cover four different types (i.e., visible generic detection, visible SOD, thermal SOD and RGBT object detection). Project is available at https://github.com/XinyiYing24/RGBT-Tiny.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14422 [pdf, other]

FutureNet-LOF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding

Authors: Mingkun Wang, Xiaoguang Ren, Ruochun Jin, Minglong Li, Xiaochuan Zhang, Changqian Yu, Mingxu Wang, Wenjing Yang

Abstract: Most prior motion prediction endeavors in autonomous driving have inadequately encoded future scenarios, leading to predictions that may fail to accurately capture the diverse movements of agents (e.g., vehicles or pedestrians). To address this, we propose FutureNet, which explicitly integrates initially predicted trajectories into the future scenario and further encodes these future contexts to enhance subsequent forecasting. Additionally, most previous motion forecasting works have focused on predicting independent futures for each agent. However, safe and smooth autonomous driving requires accurately predicting the diverse future behaviors of numerous surrounding agents jointly in complex dynamic environments. Given that all agents occupy certain potential travel spaces and possess lane driving priority, we propose Lane Occupancy Field (LOF), a new representation with lane semantics for motion forecasting in autonomous driving. LOF can simultaneously capture the joint probability distribution of all road participants' future spatial-temporal positions. Due to the high compatibility between lane occupancy field prediction and trajectory prediction, we propose a novel network with future context encoding for the joint prediction of these two tasks. Our approach ranks 1st on two large-scale motion forecasting benchmarks: Argoverse 1 and Argoverse 2.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 10 pages

arXiv:2406.14180 [pdf, other]

RTFormer: Re-parameter TSBN Spiking Transformer

Authors: Hongzhi Wang, Xiubo Liang, Mengjian Li, Tao Zhang

Abstract: The Spiking Neural Networks (SNNs), renowned for their bio-inspired operational mechanism and energy efficiency, mirror the human brain's neural activity. Yet, SNNs face challenges in balancing energy efficiency with the computational demands of advanced tasks. Our research introduces the RTFormer, a novel architecture that embeds Re-parameterized Temporal Sliding Batch Normalization (TSBN) within the Spiking Transformer framework. This innovation optimizes energy usage during inference while ensuring robust computational performance. The crux of RTFormer lies in its integration of reparameterized convolutions and TSBN, achieving an equilibrium between computational prowess and energy conservation.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14171 [pdf, other]

Ranking LLMs by compression

Authors: Peijia Guo, Ziguang Li, Haibo Hu, Chao Huang, Ming Li, Rui Zhang

Abstract: We conceptualize the process of understanding as information compression, and propose a method for ranking large language models (LLMs) based on lossless data compression. We demonstrate the equivalence of compression length under arithmetic coding with cumulative negative log probabilities when using a large language model as a prior, that is, the pre-training phase of the model is essentially the process of learning the optimal coding length. At the same time, the evaluation metric compression ratio can be obtained without actual compression, which greatly saves overhead. In this paper, we use five large language models as priors for compression, then compare their performance on challenging natural language processing tasks, including sentence completion, question answering, and coreference resolution. Experimental results show that compression ratio and model performance are positively correlated, so it can be used as a general metric to evaluate large language models.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 7 pages, 4 tables
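
The abstract of "Ranking LLMs by compression" above says the compression ratio can be obtained without running a compressor, because the arithmetic-coding length matches the cumulative negative log probability. A minimal sketch of that bookkeeping follows. It assumes the per-token probabilities the model assigned to the actual text are already available; the helper names, the example probabilities, and the ratio convention (ideal compressed bits divided by raw bits) are illustrative assumptions rather than the paper's exact definitions.

```python
import math


def code_length_bits(token_probs):
    """Ideal code length in bits: the sum of -log2 p over the observed tokens.

    A real arithmetic coder would come close to this total, with a small
    constant overhead that this sketch ignores.
    """
    return sum(-math.log2(p) for p in token_probs)


def compression_ratio(token_probs, raw_num_bytes):
    """Assumed convention: ideal compressed size over raw size, both in bits."""
    return code_length_bits(token_probs) / (8 * raw_num_bytes)


# Hypothetical probabilities a model assigned to each actual next token
# of a 5-byte text; these numbers are made up for illustration.
example_probs = [0.5, 0.25, 0.25]
bits = code_length_bits(example_probs)
ratio = compression_ratio(example_probs, raw_num_bytes=5)
print(f"ideal length: {bits} bits, ratio: {ratio}")
```

Under this convention a lower ratio means the model predicts the text better. Whether the paper reports the ratio in this direction or its inverse is not stated in the abstract.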

arXiv:2406.14076 [pdf, ps, other]

The limits of Kahler manifolds under holomorphic deformations

Authors: Mu-Lin Li, Wanmin Liu

Abstract: With some mild assumptions on metric and topology of the central fiber, we prove that the limit of Kahler manifolds under holomorphic deformation is still Kahler.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13940 [pdf, other]

AutoCAP: Towards Automatic Cross-lingual Alignment Planning for Zero-shot Chain-of-Thought

Authors: Yongheng Zhang, Qiguang Chen, Min Li, Wanxiang Che, Libo Qin

Abstract: Cross-lingual chain-of-thought can effectively complete reasoning tasks across languages, which gains increasing attention. Recently, dominant approaches in the literature improve cross-lingual alignment capabilities by integrating reasoning knowledge from different languages. Despite achieving excellent performance, current methods still have two main challenges: (1) Manual language specification: They still highly rely on manually selecting the languages to integrate, severely affecting their generalizability; (2) Static weight allocation: Current methods simply integrate all languages equally. In fact, different language reasoning paths should have different weights to achieve better complementation and integration. Motivated by this, we introduce an Automatic Cross-lingual Alignment Planning (AutoCAP) for zero-shot chain-of-thought to address the above challenges. The core of AutoCAP consists of two components: (1) Automatic Language Selection Prompting to guide LLMs to select appropriate languages and (2) Automatic Weight Allocation Prompting to automatically allocate alignment weight scores to each reasoning path. Extensive experiments on several benchmarks reveal that AutoCAP achieves state-of-the-art performance, surpassing previous methods that required manual effort.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted by ACL2024 Findings

arXiv:2406.13778 [pdf, other]

Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN

Authors: Pablo Moriano, Steven C. Hespeler, Mingyan Li, Robert A. Bridges

Abstract: Vehicular controller area networks (CANs) are susceptible to masquerade attacks by malicious adversaries. In masquerade attacks, adversaries silence a targeted ID and then send malicious frames with forged content at the expected timing of benign frames. As masquerade attacks could seriously harm vehicle functionality and are the stealthiest attacks to detect in CAN, recent work has devoted attention to compare frameworks for detecting masquerade attacks in CAN. However, most existing works report offline evaluations using CAN logs already collected using simulations that do not comply with domain's real-time constraints. Here we contribute to advance the state of the art by introducing a benchmark study of four different non-deep learning (DL)-based unsupervised online intrusion detection systems (IDS) for masquerade attacks in CAN. Our approach differs from existing benchmarks in that we analyze the effect of controlling streaming data conditions in a sliding window setting. In doing so, we use realistic masquerade attacks being replayed from the ROAD dataset. We show that although benchmarked IDS are not effective at detecting every attack type, the method that relies on detecting changes at the hierarchical structure of clusters of time series produces the best results at the expense of higher computational overhead. We discuss limitations, open challenges, and how the benchmarked methods can be used for practical unsupervised online CAN IDS for masquerade attacks.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 15 pages, 9 figures, 3 tables

arXiv:2406.13555 [pdf, other]

BiLD: Bi-directional Logits Difference Loss for Large Language Model Distillation

Authors: Minchong Li, Feng Zhou, Xiaohui Song

Abstract: In recent years, large language models (LLMs) have shown exceptional capabilities across various natural language processing (NLP) tasks. However, such impressive performance often comes with the trade-off of an increased parameter size, posing significant challenges for widespread deployment. Knowledge distillation (KD) provides a solution by transferring knowledge from a large teacher model to a smaller student model. In this paper, we explore the task-specific distillation of LLMs at the logit level. Our investigation reveals that the logits of fine-tuned LLMs exhibit a more extreme long-tail distribution than those from vision models, with hidden "noise" in the long tail affecting distillation performance. Furthermore, existing logits distillation methods often struggle to effectively utilize the internal ranking information from the logits. To address these, we propose the Bi-directional Logits Difference (BiLD) loss. The BiLD loss filters out the long-tail noise by utilizing only top-$k$ teacher and student logits, and leverages the internal logits ranking information by constructing logits differences. To evaluate BiLD loss, we conduct comprehensive experiments on 13 datasets using two types of LLMs. Our results show that the BiLD loss, with only the top-8 logits, outperforms supervised fine-tuning (SFT), vanilla KL loss, and five other distillation methods from both NLP and CV fields.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Submitted to ARR June (for EMNLP 2024)

arXiv:2406.13443 [pdf, other]

Dual-Phase Accelerated Prompt Optimization

Authors: Muchen Yang, Moxin Li, Yongle Li, Zijun Chen, Chongming Gao, Junqi Zhang, Yangyang Li, Fuli Feng

Abstract: Gradient-free prompt optimization methods have made significant strides in enhancing the performance of closed-source Large Language Models (LLMs) across a wide range of tasks. However, existing approaches make light of the importance of high-quality prompt initialization and the identification of effective optimization directions, thus resulting in substantial optimization steps to obtain satisfactory performance. In this light, we aim to accelerate prompt optimization process to tackle the challenge of low convergence rate. We propose a dual-phase approach which starts with generating high-quality initial prompts by adopting a well-designed meta-instruction to delve into task-specific information, and iteratively optimize the prompts at the sentence level, leveraging previous tuning experience to expand prompt candidates and accept effective ones. Extensive experiments on eight datasets demonstrate the effectiveness of our proposed method, achieving a consistent accuracy gain over baselines with less than five optimization steps.

Submitted 19 June, 2024; originally announced June 2024.

Showing 1–50 of 4,625 results for author: Li, M