Search | arXiv e-print repository

Search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, V. Batozskaya, D. Becker, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, J. Bloms, A. Bortone, I. Boyko, et al. (559 additional authors not shown)

Abstract: We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32~fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ are set to be $1.1 \times 10^{-5}$ and $4.3 \times 10^{-6}$ at 90\% confidence level, respectively.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 14 pages, 7 figures
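To make "upper limit at 90% confidence level" concrete, here is a minimal textbook sketch: a classical Poisson counting limit with zero expected background, converted to a branching-fraction limit by dividing by a normalization. This is not the BESIII procedure (which uses likelihood fits and folds in systematic uncertainties); the names `n_norm` and `efficiency` are hypothetical stand-ins for whatever normalization the analysis uses.

```python
import math

def poisson_cdf(n: int, mu: float) -> float:
    """P(N <= n) for a Poisson variable with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def poisson_upper_limit(n_obs: int, cl: float = 0.90, tol: float = 1e-10) -> float:
    """Classical upper limit on a Poisson mean with no background:
    the mu at which observing n_obs or fewer events has probability 1 - cl."""
    lo, hi = 0.0, 10.0 + 10.0 * n_obs
    # poisson_cdf falls monotonically as mu grows, so bisection converges.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def branching_fraction_limit(n_obs: int, n_norm: float, efficiency: float,
                             cl: float = 0.90) -> float:
    """Hypothetical conversion: B_UL = N_UL / (n_norm * efficiency)."""
    return poisson_upper_limit(n_obs, cl) / (n_norm * efficiency)

print(round(poisson_upper_limit(0), 4))  # 2.3026, i.e. ln(10)
```

With zero observed events the limit reduces to $\ln 10 \approx 2.30$ signal events; a smaller limit on the branching fraction then requires a larger normalization or higher efficiency.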

arXiv:2405.08320 [pdf, ps, other]

Strain-induced long-range charge-density wave order in the optimally doped Bi$_2$Sr$_{2-x}$La$_x$CuO$_{6}$ superconductor

Authors: Shinji Kawasaki, Nao Tsukuda, Chengtian Lin, Guo-qing Zheng

Abstract: The mechanism of high-temperature superconductivity in copper oxides (cuprates) remains elusive, with the pseudogap phase considered a potential factor. Recent attention has focused on a long-range symmetry-broken charge-density wave (CDW) order in the underdoped regime, induced by strong magnetic fields. Here, using $^{63,65}$Cu nuclear magnetic resonance, we report the discovery of a long-range CDW order in the optimally doped Bi$_2$Sr$_{2-x}$La$_x$CuO$_6$ superconductor, induced by in-plane strain exceeding $|\varepsilon| = 0.15\%$, applied to deliberately break the crystal symmetry of the CuO$_2$ plane. We find that compressive and tensile strains reduce superconductivity but enhance CDW, leaving superconductivity to coexist with CDW. The findings show that a long-range CDW order is an underlying hidden order in the pseudogap state that is not limited to the underdoped regime and becomes apparent under strain. Our result sheds light on the intertwining of various orders in the cuprates.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 10 pages, 6 figures

Journal ref: Nature Communications 15, 5082 (2024)

arXiv:2405.08114 [pdf, other]

RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine Transformations

Authors: Chengde Lin, Xijun Lu, Guangxi Chen

Abstract: Synthesizing high-quality photorealistic images conditioned on textual descriptions is very challenging. Generative Adversarial Networks (GANs), the classical model for this task, frequently suffer from low consistency between images and text descriptions and from insufficient richness in the synthesized images. Recently, conditional affine transformations (CAT), such as conditional batch normalization and instance normalization, have been applied to different layers of GANs to control content synthesis in images. CAT is a multi-layer perceptron that independently predicts data based on batch statistics between neighboring layers, with global textual information unavailable to other layers. To address this issue, we first model CAT with a recurrent neural network, yielding recurrent affine transformations (RAT), to ensure that different layers can access global information. We then introduce shuffle attention between RATs to mitigate information forgetting in recurrent neural networks. Moreover, both our generator and discriminator use the powerful pre-trained model CLIP, which has been extensively employed for establishing associations between text and images by learning multimodal representations in latent space. The discriminator uses CLIP's ability to comprehend complex scenes to accurately assess the quality of the generated images. Extensive experiments on the CUB, Oxford, and CelebA-tiny datasets demonstrate the superiority of the proposed model over current state-of-the-art models. The code is available at https://github.com/OxygenLu/RATLIP.

Submitted 13 May, 2024; originally announced May 2024.
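The contrast between per-layer CAT and a recurrent affine transformation can be sketched with a toy: one scalar hidden state is updated by a recurrent cell once per generator layer, and each layer reads its scale and shift from that shared state, so later layers see information accumulated by earlier ones. Everything here is an assumption for illustration: the scalar state, the plain tanh RNN cell, the weights `w_state` and `w_text`, and the gamma/beta read-out are hypothetical and do not reproduce RATLIP's architecture.

```python
import math
from typing import List, Tuple

def affine(features: List[float], gamma: float, beta: float) -> List[float]:
    """Conditional affine transformation of one layer: gamma * h + beta."""
    return [gamma * h + beta for h in features]

def rat_forward(features: List[float], text: float, n_layers: int,
                w_state: float = 0.8, w_text: float = 0.5
                ) -> Tuple[List[float], List[float]]:
    """Toy recurrent affine transformation across n_layers layers.

    The hidden state is carried from layer to layer, so the modulation of
    layer k depends on the text *and* on every earlier layer's state.
    """
    state = 0.0
    gammas: List[float] = []
    for _ in range(n_layers):
        state = math.tanh(w_state * state + w_text * text)  # hypothetical RNN cell
        gamma, beta = 1.0 + state, 0.1 * state            # hypothetical read-out
        gammas.append(gamma)
        features = affine(features, gamma, beta)
    return features, gammas

out, gammas = rat_forward([1.0, -2.0, 0.5], text=1.0, n_layers=4)
```

Even with a fixed text embedding, the per-layer `gammas` differ, because each layer's parameters depend on the recurrent history rather than on the text alone.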

arXiv:2405.07741 [pdf, other]

Search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, et al. (635 additional authors not shown)

Abstract: Using 9.0 $\rm fb^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies from 4.178 to 4.278 GeV with the BESIII detector at the BEPCII collider, we perform the first search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$. No $χ_{c1}(3872)\toγψ_2(3823)$ signal is observed. The upper limit on the ratio of branching fractions $\mathcal{B}(χ_{c1}(3872)\toγψ_2(3823), ψ_2(3823)\toγχ_{c1})/\mathcal{B}(χ_{c1}(3872)\toπ^+π^- J/ψ)$ is determined to be 0.075 at the 90\% confidence level. Our result contradicts theoretical predictions under the assumption that the $χ_{c1}(3872)$ is the pure charmonium state $χ_{c1}(2P)$.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 8 pages, 2 figures

arXiv:2405.07573 [pdf, other]

MaskFuser: Masked Fusion of Joint Multi-Modal Tokenization for End-to-End Autonomous Driving

Authors: Yiqun Duan, Xianda Guo, Zheng Zhu, Zhen Wang, Yu-Kai Wang, Chin-Teng Lin

Abstract: Current multi-modality driving frameworks typically fuse representations by applying attention between single-modality branches. However, existing networks still limit driving performance because the Image and LiDAR branches are independent and lack a unified observation representation. This paper therefore proposes MaskFuser, which tokenizes various modalities into a unified semantic feature space and provides a joint representation for subsequent behavior cloning in driving contexts. Given the unified token representation, MaskFuser is the first work to introduce cross-modality masked auto-encoder training. The masked training enhances the fusion representation through reconstruction of masked tokens. Architecturally, a hybrid-fusion network is proposed to combine the advantages of both early and late fusion: in the early fusion stage, modalities are fused by performing monotonic-to-BEV translation attention between branches; late fusion is performed by tokenizing various modalities into a unified token space with shared encoding on it. MaskFuser reaches a driving score of 49.05 and a route completion of 92.85% on the CARLA LongSet6 benchmark, improving on the best previous baselines by 1.74 and 3.21%, respectively. The introduced masked fusion increases driving stability under damaged sensory inputs: given sensory masking ratios of 25%, 50%, and 75%, MaskFuser outperforms the best previous baselines on driving score by 6.55 (27.8%), 1.53 (13.8%), and 1.57 (30.9%), respectively.

Submitted 13 May, 2024; originally announced May 2024.
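The masked training idea can be illustrated with a minimal MAE-style sketch: concatenate tokens from two modalities into one sequence, hide a fixed fraction of them at random, and score reconstruction only on the hidden positions. The integer token values, the `None` mask placeholder, and the squared-error objective are assumptions for illustration, not MaskFuser's actual tokenizer or loss.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK: Optional[int] = None  # placeholder value for a masked token

def mask_tokens(tokens: Sequence[int], ratio: float,
                rng: random.Random) -> Tuple[List[Optional[int]], List[int]]:
    """Hide round(ratio * len(tokens)) randomly chosen tokens.

    Returns the masked sequence and the sorted list of masked positions.
    """
    n_mask = round(ratio * len(tokens))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked: List[Optional[int]] = list(tokens)
    for p in positions:
        masked[p] = MASK
    return masked, positions

def masked_reconstruction_error(pred: Sequence[float], target: Sequence[float],
                                positions: Sequence[int]) -> float:
    """Mean squared error evaluated only on the masked positions."""
    if not positions:
        return 0.0
    return sum((pred[p] - target[p]) ** 2 for p in positions) / len(positions)

# Hypothetical joint sequence: image tokens followed by LiDAR tokens.
image_tokens = [3, 1, 4, 1]
lidar_tokens = [5, 9, 2, 6]
joint = image_tokens + lidar_tokens
masked, positions = mask_tokens(joint, ratio=0.5, rng=random.Random(0))
```

Because masking is applied to the joint sequence, a hidden image token may have to be recovered from visible LiDAR tokens, which is the cross-modal signal this style of training relies on.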

arXiv:2405.06556 [pdf, other]

Search for time-dependent $CP$ violation in $D^0 \rightarrow π^+ π^- π^0$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1062 additional authors not shown)

Abstract: A measurement of time-dependent $CP$ violation in $D^0 \rightarrow π^+ π^- π^0$ decays using a $pp$ collision data sample collected by the LHCb experiment in 2012 and from 2015 to 2018, corresponding to an integrated luminosity of 7.7$\,\mathrm{fb}^{-1}$, is presented. The initial flavour of each $D^0$ candidate is determined from the charge of the pion produced in the $D^*(2010)^+ \rightarrow D^0 π^+$ decay. The decay $D^0 \rightarrow K^- π^+ π^0$ is used as a control channel to validate the measurement procedure. The gradient of the time-dependent $CP$ asymmetry, $ΔY$, in $D^0 \rightarrow π^+ π^- π^0$ decays is measured to be \begin{equation*} ΔY = (-1.3 \pm 6.3 \pm 2.4) \times 10^{-4}, \end{equation*} where the first uncertainty is statistical and the second is systematic. The result is compatible with $CP$ conservation.

Submitted 10 May, 2024; originally announced May 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://lhcbproject.web.cern.ch/Publications/p/LHCb-PAPER-2024-003.html (LHCb public pages)

Report number: LHCb-PAPER-2024-003, CERN-EP-2024-111
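The "gradient of the time-dependent $CP$ asymmetry" can be illustrated with a minimal sketch. It assumes the common linear approximation $A_{CP}(t) \approx a + ΔY\, t/\tau$ (sign conventions for $ΔY$ differ between papers), computes a flavour-tagged asymmetry per decay-time bin, and extracts the slope by weighted least squares. The per-bin yields and function names are hypothetical; the real analysis also corrects for detection effects and validates on the control channel.

```python
import math
from typing import Sequence, Tuple

def asymmetry(n_d0: float, n_d0bar: float) -> Tuple[float, float]:
    """Asymmetry (N - Nbar) / (N + Nbar) and its binomial uncertainty."""
    total = n_d0 + n_d0bar
    a = (n_d0 - n_d0bar) / total
    return a, math.sqrt((1.0 - a * a) / total)

def fit_slope(x: Sequence[float], y: Sequence[float],
              sigma: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares fit of y = intercept + slope * x."""
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * sxx - sx * sx
    intercept = (sxx * sy - sx * sxy) / det
    slope = (sw * sxy - sx * sy) / det
    return intercept, slope

def delta_y(t_over_tau: Sequence[float], n_d0: Sequence[float],
            n_d0bar: Sequence[float]) -> float:
    """Slope of the asymmetry versus t/tau from hypothetical per-bin yields."""
    points = [asymmetry(n, nb) for n, nb in zip(n_d0, n_d0bar)]
    _, slope = fit_slope(t_over_tau, [a for a, _ in points], [s for _, s in points])
    return slope
```

A constant offset in the asymmetry, such as a time-independent detection asymmetry, only shifts the fitted intercept and leaves the slope unchanged, which is one reason a gradient observable is attractive.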

arXiv:2405.06393 [pdf, other]

Measurement of the ${e}^{+}{e}^{-}\to p \bar{p}π^{0}$ cross section at $\sqrt{s}=2.1000-3.0800$ GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: The process $e^{+}e^{-}\to p\bar{p}π^{0}$ is studied at 20 center-of-mass energies ranging from 2.1000 to 3.0800 GeV using 636.8 pb$^{-1}$ of data collected with the BESIII detector operating at the BEPCII collider. The Born cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$ are measured with high precision. Since the lowest center-of-mass energy, 2.1000 GeV, is less than 90 MeV above the $p\bar{p}π^0$ energy threshold, we can probe the threshold behavior for this reaction. However, no anomalous threshold enhancement is found in the cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$.

Submitted 10 May, 2024; originally announced May 2024.
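For orientation, Born cross sections in $e^+e^-$ analyses of this kind are commonly extracted with a relation of the following form. The symbols are standard choices assumed here, and the exact set of correction factors used in this paper may differ: $N^{\rm obs}$ is the signal yield, $\mathcal{L}_{\rm int}$ the integrated luminosity at a given energy point, $(1+\delta^{\gamma})$ the initial-state-radiation correction, $1/|1-\Pi|^2$ the vacuum-polarization factor, $\epsilon$ the detection efficiency, and $\mathcal{B}(\pi^0\to\gamma\gamma)$ the branching fraction of the reconstructed $\pi^0$ decay.

```latex
\sigma^{\rm B}(\sqrt{s}) =
  \frac{N^{\rm obs}}
       {\mathcal{L}_{\rm int}\,(1+\delta^{\gamma})\,\frac{1}{|1-\Pi|^{2}}\,
        \epsilon\,\mathcal{B}(\pi^{0}\to\gamma\gamma)}
```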

arXiv:2405.05945 [pdf, other]

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

Authors: Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

Abstract: Sora unveils the potential of scaling Diffusion Transformers for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet sufficient implementation details have not been disclosed. In this technical report, we introduce the Lumina-T2X family, a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified framework designed to transform noise into images, videos, multi-view 3D objects, and audio clips conditioned on text instructions. By tokenizing the latent spatial-temporal space and incorporating learnable placeholders such as [nextline] and [nextframe] tokens, Lumina-T2X seamlessly unifies the representations of different modalities across various spatial-temporal resolutions. This unified approach enables training within a single framework for different modalities and allows flexible generation of multimodal data at any resolution, aspect ratio, and length during inference. Advanced techniques such as RoPE, RMSNorm, and flow matching enhance the stability, flexibility, and scalability of Flag-DiT, enabling Lumina-T2X models to scale up to 7 billion parameters and extend the context window to 128K tokens. This is particularly beneficial for creating ultra-high-definition images with our Lumina-T2I model and long 720p videos with our Lumina-T2V model. Remarkably, Lumina-T2I, powered by a 5-billion-parameter Flag-DiT, requires only 35% of the training computational cost of a 600-million-parameter naive DiT.
Our further comprehensive analysis underscores Lumina-T2X's preliminary capability in resolution extrapolation, high-resolution editing, generating consistent 3D views, and synthesizing videos with seamless transitions. We expect that open-sourcing Lumina-T2X will further foster creativity, transparency, and diversity in the generative AI community.

Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Technical Report; Code at: https://github.com/Alpha-VLLM/Lumina-T2X
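Flow matching, one of the techniques listed above, can be sketched in its simplest linear-interpolation form: a point on the straight path between a noise sample and a data sample is paired with the constant velocity along that path as the regression target, and sampling integrates the learned velocity field from noise to data. This assumes the convention that $t=0$ is noise and $t=1$ is data (some papers use the reverse); the function names are hypothetical, and Lumina-T2X's architecture (zero-initialized attention, RoPE, RMSNorm) is not represented.

```python
from typing import Callable, List, Tuple

Vector = List[float]

def flow_matching_pair(x_noise: Vector, x_data: Vector, t: float) -> Tuple[Vector, Vector]:
    """Training pair on the straight path from noise (t=0) to data (t=1).

    x_t = (1 - t) * x_noise + t * x_data; target velocity = x_data - x_noise.
    """
    x_t = [(1.0 - t) * n + t * d for n, d in zip(x_noise, x_data)]
    v_target = [d - n for n, d in zip(x_noise, x_data)]
    return x_t, v_target

def flow_matching_loss(v_pred: Vector, v_target: Vector) -> float:
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)

def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x_noise: Vector, steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x, dt = list(x_noise), 1.0 / steps
    for i in range(steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With an oracle velocity equal to `x_data - x_noise`, Euler integration lands exactly on the data point, which is the property that makes straight paths cheap to sample; a trained network only approximates this field.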

arXiv:2405.05765 [pdf, other]

On the mixing between flavor singlets in lattice gauge theories coupled to matter fields in multiple representations

Authors: Ed Bennett, Niccolò Forzano, Deog Ki Hong, Ho Hsiao, Jong-Wan Lee, C. -J. David Lin, Biagio Lucini, Maurizio Piai, Davide Vadacchino, Fabian Zierler

Abstract: We provide the first extensive numerical study of the non-trivial problem of mixing between flavor-singlet composite states emerging in strongly coupled lattice field theories whose matter content consists of fermions transforming in different representations of the gauge group. The theory of interest is the minimal candidate for a composite Higgs model that also accommodates a mechanism for top partial compositeness: the $Sp(4)$ gauge theory coupled to two (Dirac) fermions transforming in the fundamental representation and three in the two-index antisymmetric representation of the gauge group. We apply a combination of APE and Wuppertal smearing, together with the generalized eigenvalue problem approach, to two-point functions involving flavor-singlet mesons, for ensembles whose temporal extent is longer than their spatial extent. We demonstrate that, in the region of lattice parameter space accessible to this study, both masses and mixing angles can be measured effectively, despite the presence of (numerically noisy) contributions from disconnected diagrams.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 18 pages, 3 figures, 4 tables

Report number: CTPU-PTC-24-12, PNUTP-24/A03
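The generalized eigenvalue problem (GEVP) mentioned above can be shown in its smallest exact case: two interpolating operators and two states. If $C_{ij}(t) = \sum_n Z_{in} Z_{jn} e^{-E_n t}$, the solutions of $\det[C(t) - \lambda\, C(t_0)] = 0$ are $\lambda_n = e^{-E_n (t - t_0)}$, so the energies follow directly. The overlap matrix `z` and energies below are synthetic, and this sketch ignores smearing, noise, disconnected diagrams, and the extraction of mixing angles.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def correlator(z: Matrix, energies: Sequence[float], t: float) -> Matrix:
    """Synthetic 2x2 correlator C_ij(t) = sum_n Z_in Z_jn exp(-E_n t)."""
    return [[sum(z[i][n] * z[j][n] * math.exp(-energies[n] * t) for n in range(2))
             for j in range(2)] for i in range(2)]

def gevp_2x2(c_t: Matrix, c_t0: Matrix) -> List[float]:
    """Roots of det(C(t) - lam * C(t0)) = 0, largest first."""
    (a11, a12), (a21, a22) = c_t
    (b11, b12), (b21, b22) = c_t0
    qa = b11 * b22 - b12 * b21                             # det C(t0)
    qb = -(a11 * b22 + a22 * b11 - a12 * b21 - a21 * b12)
    qc = a11 * a22 - a12 * a21                             # det C(t)
    disc = math.sqrt(qb * qb - 4.0 * qa * qc)  # ValueError if noise makes this negative
    return sorted([(-qb + disc) / (2.0 * qa), (-qb - disc) / (2.0 * qa)], reverse=True)

def gevp_energies(c_t: Matrix, c_t0: Matrix, t: float, t0: float) -> List[float]:
    """Energies from lam_n(t, t0) = exp(-E_n (t - t0)), ground state first."""
    return [-math.log(lam) / (t - t0) for lam in gevp_2x2(c_t, c_t0)]

z = [[1.0, 0.3], [0.2, 1.0]]   # synthetic operator-state overlaps
energies = [0.5, 1.2]          # synthetic energies in lattice units
recovered = gevp_energies(correlator(z, energies, 3.0), correlator(z, energies, 1.0), 3.0, 1.0)
```

With more states than operators the GEVP is no longer exact, and the residual contamination shrinks only as $t$ and $t_0$ grow, which is where the noise of disconnected contributions becomes the limiting factor.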

arXiv:2405.04964 [pdf, other]

Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

Authors: Yi Xiao, Qiangqiang Yuan, Kui Jiang, Yuzeng Chen, Qiang Zhang, Chia-Wen Lin

Abstract: Recent remote sensing image (RSI) super-resolution (SR) methods based on deep neural networks, e.g., Convolutional Neural Networks and Transformers, have achieved remarkable performance. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs on large-scale RSI. To alleviate these issues, we make the first attempt to integrate the Vision State Space Model (Mamba) into RSI-SR; Mamba is well suited to processing large-scale RSI because it captures long-range dependencies with linear complexity. To achieve better SR reconstruction, we build on Mamba to devise a Frequency-assisted Mamba framework, dubbed FMSR, that explores spatial and frequency correlations. In particular, FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM) to exploit their respective merits for effective spatial-frequency fusion. Recognizing that global and local dependencies are complementary and both beneficial for SR, we further recalibrate these multi-level features for accurate feature fusion via learnable scaling adaptors. Extensive experiments on the AID, DOTA, and DIOR benchmarks demonstrate that FMSR outperforms the state-of-the-art Transformer-based method HAT-L by 0.11 dB in PSNR on average, while requiring only 28.05% of its memory and 19.08% of its computational complexity.

Submitted 8 May, 2024; originally announced May 2024.

Comments: Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution
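The "linear complexity" claim for state space models comes from evaluating a recurrence with one pass over the sequence, as in this minimal sketch of a discrete linear SSM, $h_t = a\,h_{t-1} + b\,x_t$, $y_t = c\,h_t$, applied to an image flattened row by row. The scalar parameters are an assumption for illustration; Mamba makes its parameters input-dependent (selective), vision variants scan in several directions, and FMSR's FSM, VSSM, and HGM modules are not reproduced here.

```python
from typing import List, Sequence

def ssm_scan(x: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Discrete linear state space model evaluated by a sequential scan.

    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.  Cost is O(len(x)).
    """
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return ys

def scan_image_rows(image: Sequence[Sequence[float]], a: float, b: float,
                    c: float) -> List[float]:
    """Flatten an H x W image row by row and scan it as one long sequence."""
    flat = [p for row in image for p in row]
    return ssm_scan(flat, a, b, c)
```

Every output depends on all earlier pixels through the hidden state (the impulse response is $c\,a^k b$), giving a global receptive field without the pairwise interactions that make self-attention quadratic.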

arXiv:2405.04097 [pdf, other]

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

Authors: Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

Abstract: The emergence of contemporary deepfakes has attracted significant attention in machine learning research, as synthetic media generated by artificial intelligence (AI) increase the incidence of misinterpretation and are difficult to distinguish from genuine content. Machine learning techniques for automatically detecting deepfakes have been studied extensively. However, human perception has been less explored. Malicious deepfakes could ultimately cause public and social problems. Can we humans correctly perceive the authenticity of the videos we watch? The answer is far from certain; therefore, this paper evaluates the human ability to discern deepfake videos through a subjective study. We present our findings by comparing human observers with five state-of-the-art audiovisual deepfake detection models. To this end, we used gamification concepts to provide 110 participants (55 native English speakers and 55 non-native English speakers) with a web-based platform where they could access a series of 40 videos (20 real and 20 fake) and judge their authenticity. Each participant performed the experiment twice with the same 40 videos in different random orders. The videos were manually selected from the FakeAVCeleb dataset. We found that all AI models performed better than humans when evaluated on the same 40 videos. The study also reveals that while deception is not impossible, humans tend to overestimate their detection capabilities.
Our experimental results may help benchmark human versus machine performance, advance forensics analysis, and enable adaptive countermeasures.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.02973 [pdf, other]

FairRelay: Fair and Cost-Efficient Peer-to-Peer Content Delivery through Payment Channel Networks

Authors: Jingyu Liu, Yingjie Xue, Zifan Peng, Chao Lin, Xinyi Huang

Abstract: Peer-to-Peer (P2P) content delivery, known for scalability and resilience, offers a decentralized alternative to traditional centralized Content Delivery Networks (CDNs). A significant challenge in P2P content delivery remains: fairly compensating relayers for their bandwidth contributions. Existing solutions employ blockchains for payment settlement; however, they are impractical due to high on-chain costs and over-simplified network assumptions. In this paper, we introduce FairRelay, a fair and cost-efficient protocol that ensures all participants receive a fair payoff in complex content delivery network settings. We introduce a novel primitive, the Enforceable Accumulative Hashed TimeLock Contract (Enforceable A-HTLC), designed to guarantee payment atomicity, ensuring that all participants receive their payments upon successful content delivery. The fairness of FairRelay is proved using the Universal Composability (UC) framework. Our evaluation demonstrates that, in optimistic scenarios, FairRelay incurs zero on-chain costs. In pessimistic scenarios, the on-chain dispute costs for relayers and customers are constant, irrespective of network complexity. Specifically, empirical results indicate that the on-chain dispute costs for relayers and customers are 24,902 gas (equivalent to 0.01 USD on Optimism L2) and 290,797 gas (0.07 USD), respectively. On a 10-hop relay path, FairRelay introduces less than 1.5% additional overhead compared with pure data transmission, showcasing its efficiency.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 27 pages, 21 figures
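The building block behind the A-HTLC is the ordinary hashed timelock contract: funds are released to the payee on presentation of a preimage of a hash, or returned to the payer after a deadline. The sketch below models a plain multi-hop HTLC chain as used in payment channel networks, with timelocks shrinking toward the receiver so each upstream hop can still claim after the secret is revealed downstream. It is a toy in-memory model under stated assumptions (integer time units, hypothetical amounts, SHA-256 as the hash); it does not implement FairRelay's accumulative hashing, enforcement, or on-chain dispute logic.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class HTLC:
    """Toy hashed timelock contract between a payer and a payee."""
    hashlock: bytes   # sha256 digest of the secret preimage
    timelock: int     # time at which the payer may reclaim the funds
    amount: int
    settled: bool = False
    paid_to: str = ""

    def claim(self, preimage: bytes, now: int) -> bool:
        """Payee claims by revealing the preimage before the timelock expires."""
        if self.settled or now >= self.timelock:
            return False
        if hashlib.sha256(preimage).digest() != self.hashlock:
            return False
        self.settled, self.paid_to = True, "payee"
        return True

    def refund(self, now: int) -> bool:
        """Payer reclaims the funds once the timelock has expired unclaimed."""
        if self.settled or now < self.timelock:
            return False
        self.settled, self.paid_to = True, "payer"
        return True

def route(secret: bytes, amounts: Sequence[int], base_timelock: int,
          delta: int) -> List[HTLC]:
    """Chain of HTLCs sharing one hashlock; timelocks decrease hop by hop."""
    h = hashlib.sha256(secret).digest()
    n = len(amounts)
    return [HTLC(h, base_timelock + delta * (n - 1 - i), amt)
            for i, amt in enumerate(amounts)]
```

Atomicity comes from the shared hashlock: the last hop can only be paid by revealing the secret, and once revealed, every upstream hop can use the same secret before its own, later, timelock expires.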

arXiv:2405.02630 [pdf, other]

cuTN-QSVM: cuTensorNet-accelerated Quantum Support Vector Machine with cuQuantum SDK

Authors: Kuan-Cheng Chen, Tai-Yue Li, Yun-Yuan Wang, Simon See, Chun-Chieh Wang, Robert Wille, Nan-Yow Chen, An-Cheng Yang, Chun-Yu Lin

Abstract: This paper investigates the application of Quantum Support Vector Machines (QSVMs), with an emphasis on the computational advancements enabled by NVIDIA's cuQuantum SDK, especially the cuTensorNet library. We present a simulation workflow that, as our experiments show, substantially reduces computational overhead from exponential to quadratic cost. While state vector simulations become infeasible for qubit counts over 50, our evaluation demonstrates that cuTensorNet allows simulations to be completed within seconds on the NVIDIA A100 GPU, even for qubit counts approaching 784. By employing multi-GPU processing with the Message Passing Interface (MPI), we document a marked decrease in computation times, demonstrating a strong linear speedup of our approach for increasing data sizes. This enables QSVMs to operate efficiently on High-Performance Computing (HPC) systems, opening a new window for researchers to explore complex quantum algorithms that have not yet been investigated. In accuracy assessments, our QSVM achieves up to 95\% accuracy on challenging classifications within the MNIST dataset for training sets larger than 100 instances, surpassing the capabilities of classical SVMs. These advancements position cuTensorNet within the cuQuantum SDK as a pivotal tool for scaling quantum machine learning simulations and potentially signpost the integration of such computational strategies into the Quantum-HPC ecosystem.

Submitted 8 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

Comments: 10 pages, 14 figures
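Why structure-aware simulation can avoid the exponential cost of state vectors is easiest to see with a QSVM fidelity kernel $k(x,y) = |\langle\psi(x)|\psi(y)\rangle|^2$. The sketch below assumes an angle-encoding feature map with one $R_Y(x_i)$ rotation per qubit and no entangling gates; the paper's feature map may differ and need not factorize. Under this assumption the kernel computed from full $2^n$-amplitude state vectors equals a product of per-qubit terms computable in $O(n)$. The function names are hypothetical; cuTensorNet itself is not used.

```python
import math
from typing import List, Sequence

def encode(x: Sequence[float]) -> List[complex]:
    """State vector of RY(x_0)|0> ⊗ RY(x_1)|0> ⊗ ... (2**len(x) amplitudes)."""
    state = [1.0 + 0j]
    for xi in x:
        qubit = [math.cos(xi / 2), math.sin(xi / 2)]    # RY(xi)|0>
        state = [s * q for s in state for q in qubit]  # Kronecker product
    return state

def kernel_statevector(x: Sequence[float], y: Sequence[float]) -> float:
    """Fidelity kernel |<psi(x)|psi(y)>|^2 from full state vectors (cost 2**n)."""
    overlap = sum(a.conjugate() * b for a, b in zip(encode(x), encode(y)))
    return abs(overlap) ** 2

def kernel_product(x: Sequence[float], y: Sequence[float]) -> float:
    """Same kernel via the product structure: prod_i cos^2((x_i - y_i)/2), cost n."""
    return math.prod(math.cos((a - b) / 2) ** 2 for a, b in zip(x, y))

def gram_matrix(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Kernel (Gram) matrix for an SVM that accepts precomputed kernels."""
    return [[kernel_product(x, y) for y in data] for x in data]
```

The resulting Gram matrix can be handed to a classical solver, for example scikit-learn's `SVC(kernel="precomputed")`; for entangling feature maps, tensor-network contraction plays the role that the simple product formula plays here.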

arXiv:2405.02307 [pdf]

Helium Detection in Technical Materials

Authors: Andrew K. Gillespie, Cuikun Lin, Django Jones, R. V. Duncan

Abstract: Materials used to study nuclear fusion can retain atmospheric helium unless pretreated before an experiment. Understanding helium outgassing is important for accurate diagnostics in nuclear fusion experiments. The presence of helium is often cited as the primary evidence that a nuclear reaction has occurred, so it is imperative that known sources of helium are mitigated before proceeding with novel nuclear experiments. It is also necessary to ensure hermeticity when transferring gas aliquots from an experiment to a mass spectrometer. In this article, we present studies of helium leak-rate detection in systems used in novel nuclear experiments. We also present studies of helium retention in materials subjected to various heating profiles and atmospheric concentrations. Without pretreatment, stainless steel 316 retains between 15 and 240 pmol of $^{4}$He, or an areal outgassing amount of 0.07 to 1.20 pmol/cm$^{2}$. It may also reabsorb $^{4}$He from the atmosphere over time. These studies also demonstrate that it is necessary to pretreat most materials before performing experiments in which the presence of $^{4}$He is used as an indicator of novel nuclear reactions.

Submitted 25 March, 2024; originally announced May 2024.

Comments: 10 pages, 4 figures, 5 tables

arXiv:2405.01388 [pdf, other]

Meson spectroscopy from spectral densities in lattice gauge theories

Authors: Ed Bennett, Luigi Del Debbio, Niccolò Forzano, Ryan C. Hill, Deog Ki Hong, Ho Hsiao, Jong-Wan Lee, C. -J. David Lin, Biagio Lucini, Alessandro Lupo, Maurizio Piai, Davide Vadacchino, Fabian Zierler

Abstract: Spectral densities encode non-perturbative information that enters the calculation of a wide range of physical observables in strongly coupled field theories. Phenomenological applications encompass aspects of standard-model hadronic physics observable at current colliders, as well as correlation functions characterizing new physics proposals that are testable in future experiments. Using numerical data produced with lattice gauge theories, we perform a systematic study to demonstrate the effectiveness of recent technological progress in the reconstruction of spectral densities. To this end, we write and test new software packages that use energy-smeared spectral densities to analyze the mass spectrum of mesons. We assess the effectiveness of different smearing kernels and optimize the smearing parameters for the characteristics of the available lattice ensembles. For concreteness, we analyze the Sp(4) lattice gauge theory with matter transforming in an admixture of fundamental and 2-index antisymmetric representations of the gauge group. We generate new ensembles for this theory, with lattices that are longer in the time direction than in the spatial ones. We run our tests on these ensembles, obtaining new results about the spectrum of light mesons and their excitations. We make available our algorithm and software for the extraction of spectral densities, which can be applied to theories with other gauge groups, including the theory of strong interactions (QCD) governing hadronic physics in the standard model.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 44 pages, 20 figures
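What an "energy-smeared spectral density" is can be shown with the forward relation only. For a discrete spectrum $\rho(E) = \sum_n A_n\,\delta(E - E_n)$, smearing with a normalized Gaussian kernel of width $\sigma$ gives $\rho_\sigma(E) = \sum_n A_n K_\sigma(E - E_n)$, while the lattice measures the Euclidean correlator $C(t) = \sum_n A_n e^{-E_n t}$. The levels and weights below are synthetic, and the Gaussian is only one of the kernels such studies compare; the hard inverse problem of reconstructing $\rho_\sigma$ from noisy $C(t)$, which the paper's software addresses, is not attempted here.

```python
import math
from typing import Sequence

def gaussian_kernel(omega: float, sigma: float) -> float:
    """Normalized Gaussian smearing kernel K_sigma(omega)."""
    return math.exp(-omega ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def smeared_density(energy: float, levels: Sequence[float],
                    weights: Sequence[float], sigma: float) -> float:
    """rho_sigma(E) = sum_n A_n K_sigma(E - E_n) for a discrete spectrum."""
    return sum(a * gaussian_kernel(energy - e, sigma) for e, a in zip(levels, weights))

def correlator_from_spectrum(t: float, levels: Sequence[float],
                             weights: Sequence[float]) -> float:
    """Euclidean correlator C(t) = sum_n A_n exp(-E_n t) for the same spectrum."""
    return sum(a * math.exp(-e * t) for e, a in zip(levels, weights))

levels, weights = [0.5, 1.2], [1.0, 0.4]  # synthetic ground state and excitation
```

Smaller $\sigma$ resolves individual levels more sharply but makes the inversion from $C(t)$ less stable, which is the trade-off behind optimizing smearing parameters to each ensemble.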

arXiv:2405.00888 [pdf, other]

DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling

Authors: Shikhar Tuli, Chi-Heng Lin, Yen-Chang Hsu, Niraj K. Jha, Yilin Shen, Hongxia Jin

Abstract: Traditional language models operate autoregressively, i.e., they predict one token at a time. The rapid growth of model sizes has resulted in long inference times. In this work, we propose DynaMo, a suite of multi-token prediction language models that reduce net inference times. Our models $\textit{dynamically}$ predict multiple tokens based on their confidence in the predicted joint probability distribution. We propose a lightweight technique to train these models, leveraging the weights of traditional autoregressive counterparts. Moreover, we propose novel ways to enhance the estimated joint probability to improve text generation quality, namely co-occurrence weighted masking and adaptive thresholding. We also propose systematic qualitative and quantitative methods to rigorously test the quality of generated text for non-autoregressive generation. One of the models in our suite, DynaMo-7.3B-T3, generates text of the same quality as the baseline (Pythia-6.9B) while achieving a 2.57$\times$ speed-up with only 5.87% and 2.67% parameter and training-time overheads, respectively.

Submitted 1 May, 2024; originally announced May 2024.

Comments: Accepted at NAACL 2024

arXiv:2405.00113

The Extremely Metal-Poor SN 2023ufx: A Local Analog to High-Redshift Type II Supernovae

Authors: Michael A. Tucker, Jason Hinkle, Charlotte R. Angus, Katie Auchettl, Willem B. Hoogendam, Benjamin Shappee, Christopher S. Kochanek, Chris Ashall, Thomas de Boer, Kenneth C. Chambers, Dhvanil D. Desai, Aaron Do, Michael D. Fulton, Hua Gao, Joanna Herman, Mark Huber, Chris Lidman, Chien-Cheng Lin, Thomas B. Lowe, Eugene A. Magnier, Bailey Martin, Paloma Minguez, Matt Nicholl, Miika Pursiainen, S. J. Smartt , et al. (4 additional authors not shown)

Abstract: We present extensive observations of the Type II supernova (SN II) 2023ufx, which is likely the most metal-poor SN II observed to date. It exploded in the outskirts of a low-metallicity ($Z_{\rm host} \sim 0.1~Z_\odot$) dwarf ($M_g = -13.23\pm0.15$~mag; $r_e\sim 1$~kpc) galaxy. The explosion is luminous, peaking at $M_g\approx -18.5~$mag, and shows rapid evolution. The $r$-band (pseudo-bolometric) light curve has a shock-cooling phase lasting 20 (17) days followed by a 19 (23)-day plateau. The entire optically thick phase lasts only $\approx 55~$days following explosion, indicating that the red supergiant progenitor had a thinned H envelope prior to explosion. The early spectra obtained during the shock-cooling phase show no evidence for narrow emission features and limit the pre-explosion mass-loss rate to $\dot{M} \lesssim 10^{-3}~\rm M_\odot$/yr. The photospheric-phase spectra are devoid of prominent metal absorption features, indicating a progenitor metallicity of $\lesssim 0.1~Z_\odot$. The semi-nebular ($\sim 60-130~$d) spectra reveal weak Fe II, but other metal species typically observed at these phases (Ti II, Sc II, Ba II) are conspicuously absent. The late-phase optical and near-infrared spectra also reveal broad ($\approx 10^4~\rm{km}~\rm s^{-1}$) double-peaked H$α$, P$β$, and P$γ$ emission profiles suggestive of a fast outflow launched during the explosion. Outflows are typically attributed to rapidly rotating progenitors, which also prefer metal-poor environments. This is only the second SN II with $\lesssim 0.1~Z_\odot$, and both exhibit peculiar evolution, suggesting that a sizable fraction of metal-poor SNe II have distinct properties compared to nearby metal-enriched SNe II. These observations lay the groundwork for modeling the metal-poor SNe II expected in the early Universe.

Submitted 30 April, 2024; originally announced May 2024.

Comments: 17 pages, 15 figures and 3 tables in main text, an additional 5 pages, 4 figures, and 2 tables in the appendix. Submitted to ApJ, comments welcome. All data will be made publicly available upon publication

arXiv:2405.00098

Amplitude analysis and branching fraction measurement of $B^{+}\to D^{*-}D^{+}_{s}π^{+}$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1057 additional authors not shown)

Abstract: The decays of the $B^{+}$ meson to the final state $D^{*-}D^{+}_{s}π^{+}$ are studied in proton-proton collision data collected with the LHCb detector at centre-of-mass energies of 7, 8, and 13 TeV, corresponding to a total integrated luminosity of 9 fb$^{-1}$. The ratio of the branching fractions of the $B^{+}\to D^{*-}D^{+}_{s}π^{+}$ and $B^{0}\to D^{*-}D^{+}_{s}$ decays is measured to be $0.173\pm 0.006\pm 0.010$, where the first uncertainty is statistical and the second is systematic. Using partially reconstructed $D^{*+}_{s}\to D^{+}_{s}γ$ and $D^{+}_{s}π^{0}$ decays, the ratio of the branching fractions of the $B^{+}\to D^{*-}D^{*+}_{s}π^{+}$ and $B^{+}\to D^{*-}D^{+}_{s}π^{+}$ decays is determined to be $1.31\pm 0.07\pm 0.14$. An amplitude analysis of the $B^{+}\to D^{*-}D^{+}_{s}π^{+}$ decay is performed for the first time, revealing dominant contributions from known excited charm resonances decaying to the $D^{*-}π^{+}$ final state. No significant evidence of exotic contributions in the $D^{+}_{s}π^{+}$ or $D^{*-}D^{+}_{s}$ channels is found. The fit fraction of the scalar state $T_{c\bar{s} 0}^{\ast}(2900)^{++}$, observed in the $B^{+}\to D^{-}D^{+}_{s}π^{+}$ decay, is determined to be less than 2.3% at 90% confidence level.

Submitted 30 April, 2024; originally announced May 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-001.html (LHCb public pages)

Report number: LHCb-PAPER-2024-001, CERN-EP-2024-110

arXiv:2404.19510

First observation of $Λ_{b}^{0} \rightarrow Σ_c^{(*)++} D^{(*)-} K^{-}$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1067 additional authors not shown)

Abstract: The four decays $Λ_{b}^{0} \rightarrow Σ_c^{(*)++} D^{(*)-} K^{-}$ are observed for the first time using proton-proton collision data collected with the LHCb detector at a centre-of-mass energy of $13\,\rm{TeV}$, corresponding to an integrated luminosity of $6\,\rm{fb}^{-1}$. Using the $Λ_b^0 \rightarrow Λ_c^{+} \overline{D}^0 K^{-}$ decay as the reference channel, the following branching fraction ratios are measured: $$\frac{\cal{B} (Λ_{b}^{0} \rightarrow Σ_{c}^{++} \rm{D}^{-} {K}^{-})}{\cal{B}(Λ_{b}^{0} \rightarrow Λ_c^{+} \rm \overline{D}^0 {K}^{-})} = {0.282}\pm{0.016}\pm{0.016}\pm{0.005}, \frac{\cal{B}(Λ_{b}^{0} \rightarrow Σ_{c}^{*++} \rm {D}^{-} {K}^{-})}{\cal{B}(Λ_{b}^{0} \rightarrow Σ_c^{++} \rm {D}^{-} {K}^{-})} = {0.460}\pm{0.052}\pm{0.028}, \frac{\cal{B}(Λ_{b}^{0} \rightarrow Σ_{c}^{++} \rm {D}^{*-} {K}^{-})}{\cal{B}(Λ_{b}^{0} \rightarrow Σ_c^{++} \rm {D}^{-} {K}^{-})} = {2.261}\pm{0.202}\pm{0.129}\pm{0.046}, \frac{\cal{B}(Λ_{b}^{0} \rightarrow Σ_{c}^{*++} \rm D^{*-} K^{-})}{\cal{B}(Λ_{b}^{0} \rightarrow Σ_c^{++} \rm D^{-} K^{-})} = {0.896}\pm{0.137}\pm{0.066}\pm{0.018},$$ where the first uncertainties are statistical, the second are systematic, and the third are due to uncertainties in the branching fractions of intermediate particle decays. These first observations mark the beginning of pentaquark searches in these modes, with more data set to become available following the LHCb upgrade.

Submitted 11 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-044.html (LHCb public pages)

Report number: LHCb-PAPER-2023-044, CERN-EP-2024-098

arXiv:2404.18327

MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition

Authors: Peihao Xiang, Chaohao Lin, Kaida Wu, Ou Bai

Abstract: This paper presents a novel approach to processing multimodal data for dynamic emotion recognition, named the Multimodal Masked Autoencoder for Dynamic Emotion Recognition (MultiMAE-DER). MultiMAE-DER leverages the closely correlated representation information within spatiotemporal sequences across visual and audio modalities. Building on a pre-trained masked autoencoder model, MultiMAE-DER is obtained through simple, straightforward fine-tuning. The performance of MultiMAE-DER is enhanced by optimizing six fusion strategies for multimodal input sequences. These strategies address dynamic feature correlations within cross-domain data across spatial, temporal, and spatiotemporal sequences. Compared with state-of-the-art multimodal supervised learning models for dynamic emotion recognition, MultiMAE-DER improves the weighted average recall (WAR) by 4.41% on the RAVDESS dataset and by 2.06% on CREMA-D. Furthermore, compared with the state-of-the-art multimodal self-supervised learning model, MultiMAE-DER achieves a 1.86% higher WAR on the IEMOCAP dataset.

Submitted 16 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

Comments: Camera-ready Version, Accepted by ICPRS 2024

arXiv:2404.18081

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

Abstract: Music composition represents the creative side of humanity and is itself a complex task that requires the ability to understand and generate information with long-range dependencies and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail at this task, generating ill-written music even when equipped with modern techniques like In-Context Learning and Chain-of-Thought. To further explore and enhance LLMs' potential in music composition by leveraging their reasoning ability and their large knowledge base in music history and theory, we propose ComposerX, an agent-based symbolic music generation framework. We find that applying a multi-agent approach significantly improves the music composition quality of GPT-4. The results demonstrate that ComposerX is capable of producing coherent polyphonic music compositions with captivating melodies while adhering to user instructions.

Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.17871

A Survey of Deep Learning Library Testing Methods

Authors: Xiaoyu Zhang, Weipeng Jiang, Chao Shen, Qi Li, Qian Wang, Chenhao Lin, Xiaohong Guan

Abstract: In recent years, software systems powered by deep learning (DL) techniques have significantly facilitated people's lives in many respects. As the backbone of these DL systems, various DL libraries handle the underlying optimization and computation. However, like traditional software, DL libraries are not immune to bugs, which can pose serious threats to users' personal property and safety. Studying the characteristics of DL libraries, their associated bugs, and the corresponding testing methods is crucial for enhancing the security of DL systems and advancing the widespread application of DL technology. This paper provides an overview of testing research related to various DL libraries, discusses the strengths and weaknesses of existing methods, and provides guidance and reference for the application of DL libraries. This paper first introduces the workflow of underlying DL libraries and the characteristics of the three kinds of DL libraries involved, namely DL frameworks, DL compilers, and DL hardware libraries. It then provides definitions for bugs in, and testing of, underlying DL libraries. Additionally, this paper summarizes the existing testing methods and tools tailored to each of these kinds of DL libraries and analyzes their effectiveness and limitations. It also discusses the existing challenges of DL library testing and outlines potential directions for future research.

Submitted 27 April, 2024; originally announced April 2024.

Comments: 34 pages, 8 figures, 4 tables

arXiv:2404.17481

ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking Explanations

Authors: Tyler Loakman, Chenghua Lin

Abstract: This paper presents a partial reproduction of Generating Fact Checking Explanations by Atanasova et al. (2020) as part of the ReproHum element of the ReproNLP shared task to reproduce the findings of NLP research regarding human evaluation. This shared task aims to investigate the extent to which NLP as a field is becoming more or less reproducible over time. Following the instructions provided by the task organisers and the original authors, we collect relative rankings of 3 fact-checking explanations (comprising a gold standard and the outputs of 2 models) for 40 inputs on the criterion of Coverage. The results of our reproduction and reanalysis of the original work's raw results lend support to the original findings, with similar patterns seen between the original work and our reproduction. Whilst we observe slight variation from the original results, our findings support the main conclusions drawn by the original authors pertaining to the efficacy of their proposed models.

Submitted 14 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: Accepted to HumEval at LREC-Coling 2024. Table 1 updated

arXiv:2404.14979

SGFormer: Spherical Geometry Transformer for 360 Depth Estimation

Authors: Junsong Zhang, Zisong Chen, Chunyu Lin, Lang Nie, Zhijie Shen, Junda Huang, Yao Zhao

Abstract: Panoramic distortion poses a significant challenge in 360 depth estimation, particularly pronounced at the north and south poles. Existing methods either adopt a bi-projection fusion strategy to remove distortions or model long-range dependencies to capture global structures, which can result in either unclear structure or insufficient local perception. In this paper, we propose a spherical geometry transformer, named SGFormer, to address the above issues, with an innovative step to integrate spherical geometric priors into vision transformers. To this end, we retarget the transformer decoder to a spherical prior decoder (termed SPDecoder), which endeavors to uphold the integrity of spherical structures during decoding. Concretely, we leverage bipolar re-projection, circular rotation, and curve local embedding to preserve the spherical characteristics of equidistortion, continuity, and surface distance, respectively. Furthermore, we present a query-based global conditional position embedding to compensate for spatial structure at varying resolutions. It not only boosts the global perception of spatial position but also sharpens the depth structure across different patches. Finally, we conduct extensive experiments on popular benchmarks, demonstrating the superiority of our method over state-of-the-art solutions.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14801

DesignProbe: A Graphic Design Benchmark for Multimodal Large Language Models

Authors: Jieru Lin, Danqing Huang, Tiejun Zhao, Dechen Zhan, Chin-Yew Lin

Abstract: A well-executed graphic design typically achieves harmony at two levels, from the fine-grained design elements (color, font and layout) to the overall design. This complexity makes the comprehension of graphic design challenging, as it requires the ability both to recognize the design elements and to understand the design. With the rapid development of Multimodal Large Language Models (MLLMs), we establish DesignProbe, a benchmark to investigate the capability of MLLMs in design. Our benchmark includes eight tasks in total, across both the fine-grained element level and the overall design level. At the design element level, we consider both attribute recognition and semantic understanding tasks. At the overall design level, we include style and metaphor. We test 9 MLLMs and apply GPT-4 as the evaluator. In addition, further experiments indicate that refining prompts can enhance the performance of MLLMs. We first rewrite the prompts with different LLMs and find that performance increases for models whose prompts are self-refined by their own LLM. We then add extra task knowledge in two different ways (text descriptions and image examples), finding that adding images boosts performance much more than adding text.

Submitted 23 April, 2024; originally announced April 2024.

Comments: work in progress

arXiv:2404.14135

Text in the Dark: Extremely Low-Light Text Image Enhancement

Authors: Che-Tsung Lin, Chun Chet Ng, Zhi Qin Tan, Wan Jun Nah, Xinyu Wang, Jie Long Kew, Pohao Hsu, Shang Hong Lai, Chee Seng Chan, Christopher Zach

Abstract: Extremely low-light text images are common in natural scenes, making scene text detection and recognition challenging. One solution is to enhance these images using low-light image enhancement methods before text extraction. However, previous methods often do not specifically address the significance of low-level features, which are crucial for optimal performance on downstream scene text tasks. Further research is also hindered by the lack of extremely low-light text datasets. To address these limitations, we propose a novel encoder-decoder framework with an edge-aware attention module to focus on scene text regions during enhancement. Our proposed method uses novel text detection and edge reconstruction losses to emphasize low-level scene text features, leading to successful text extraction. Additionally, we present a Supervised Deep Curve Estimation (Supervised-DCE) model to synthesize extremely low-light images based on publicly available scene text datasets such as ICDAR15 (IC15). We also labeled the text in the extremely low-light See In the Dark (SID) and ordinary LOw-Light (LOL) datasets to allow for objective assessment of extremely low-light image enhancement through scene text tasks. Extensive experiments show that our model outperforms state-of-the-art methods in terms of both image quality and scene text metrics on the widely used LOL, SID, and synthetic IC15 datasets. Code and dataset will be released publicly at https://github.com/chunchet-ng/Text-in-the-Dark.

Submitted 22 April, 2024; originally announced April 2024.

Comments: The first two authors contributed equally to this work

arXiv:2404.13840

doi 10.1103/PhysRevD.110.012006

Study of $e^+e^-\toωX(3872)$ and $γX(3872)$ from 4.66 to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes $e^+e^-\toωX(3872)$ and $e^+e^-\toγX(3872)$. With the $e^+e^-\toωX(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\toγJ/ψ)}{\mathcal{B}(X(3872)\toπ^+π^- J/ψ)}$ is measured to be $0.38\pm0.20_\text{stat.}\pm0.01_\text{syst.}$ ($R< 0.83$ at 90\% confidence level). In addition, we measure the ratio of the average cross section of $e^+e^-\toωX(3872)$ to that of $e^+e^-\toωχ_{c1}(ωχ_{c2})$ to be $σ_{ωX(3872)}/σ_{ωχ_{c1}}~(σ_{ωX(3872)}/σ_{ωχ_{c2}})=5.2\pm1.0_\text{stat.}\pm1.9_\text{syst.}~ (5.5\pm1.1_\text{stat.}\pm2.4_\text{syst.})$. Finally, we search for the process $e^+e^-\toγX(3872)$, and no obvious signal is observed. The upper limit on the ratio of the average cross section of $e^+e^-\toγX(3872)$ to that of $e^+e^-\toωX(3872)$ is set at $σ_{γX(3872)}/σ_{ωX(3872)}<0.23$ at 90\% confidence level.

Submitted 13 July, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: 19 pages, 10 figures

Journal ref: Phys. Rev. D 110, 012006 (2024)

arXiv:2404.13654

Multi-AUV Cooperative Underwater Multi-Target Tracking Based on Dynamic-Switching-enabled Multi-Agent Reinforcement Learning

Authors: Shengbo Wang, Chuan Lin, Guangjie Han, Shengchao Zhu, Zhixian Li, Zhenyu Wang

Abstract: With the rapid development of underwater communication, sensing, automation, and robot technologies, autonomous underwater vehicle (AUV) swarms are becoming increasingly popular and have been widely adopted in ocean exploration and in underwater tracking and surveillance. However, the complex underwater environment poses significant challenges for accurate AUV swarm-based tracking of underwater moving targets. In this paper, we propose a multi-AUV cooperative underwater multi-target tracking algorithm that takes real underwater factors into account. We first model the underwater sonar-based detection and the ocean-current interference in the target tracking process. Then, we regard the AUV swarm as an underwater ad hoc network and propose a novel Multi-Agent Reinforcement Learning (MARL) architecture for the AUV swarm based on Software-Defined Networking (SDN). It enhances the flexibility and scalability of the AUV swarm through centralized management and distributed operations. Based on the proposed MARL architecture, we propose the "dynamic-attention switching" and "dynamic-resampling switching" mechanisms to enhance the efficiency and accuracy of AUV swarm cooperation during task execution. Finally, based on a proposed AUV classification method, we propose an efficient cooperative tracking algorithm called ASMA. Evaluation results demonstrate that our proposed tracking algorithm performs precise underwater multi-target tracking, compared with many recent approaches, in terms of convergence speed and tracking accuracy.

Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13445

DMesh: A Differentiable Mesh Representation

Authors: Sanghyun Son, Matheus Gadelha, Yang Zhou, Zexiang Xu, Ming C. Lin, Yi Zhou

Abstract: We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first obtain a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and select triangular faces on the tetrahedra to define the final mesh. We formulate the probability of faces existing on the actual surface in a differentiable manner based on the WDT. This enables DMesh to represent meshes of various topologies in a differentiable way, and allows us to reconstruct the mesh from various observations, such as point clouds and multi-view images, using gradient-based optimization. The source code and full paper are available at: https://sonsang.github.io/dmesh-project.

Submitted 1 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

Comments: 35 pages, 22 figures. Updated with more analysis and experimental results

arXiv:2404.12803

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Authors: Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang

Abstract: Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data. To this end, we introduce a new approach for creating a massive, high-quality instruction-tuning dataset, Square-10M, which is generated using closed-source MLLMs. The data construction process, termed Square, consists of four steps: Self-Questioning, Answering, Reasoning, and Evaluation. Our experiments with Square-10M led to three key findings: 1) Our model, TextSquare, considerably surpasses the previous open-source state-of-the-art text-centric MLLMs and sets a new standard on OCRBench (62.2%). It even outperforms top-tier models like GPT4V and Gemini in 6 of 10 text-centric benchmarks. 2) Additionally, we demonstrate the critical role of VQA reasoning data in offering comprehensive contextual insights for specific questions. This not only improves accuracy but also significantly mitigates hallucinations. Specifically, TextSquare scores an average of 75.1% across four general VQA and hallucination evaluation datasets, outperforming previous state-of-the-art models. 3) Notably, the phenomenon observed in scaling text-centric VQA datasets reveals a vivid pattern: the exponential increase of instruction tuning data volume is directly proportional to the improvement in model performance, thereby validating the necessity of the dataset scale and the high quality of Square-10M.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.11305

AR for Sexual Violence: Maintaining Ethical Balance While Enhancing Empathy

Authors: Chunwei Lin

Abstract: This study showcases an augmented reality (AR) experience designed to promote gender justice and increase awareness of sexual violence in Taiwan. By leveraging AR, this project overcomes the limitations of offline exhibitions on social issues by motivating the public to participate and enhancing their willingness to delve into the topic. The discussion explores how direct exposure to sexual violence can induce negative emotions and secondary trauma among users. It also suggests strategies for using AR to alleviate such issues, particularly by avoiding simulations of actual incidents.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 5 pages, 3 figures, Presented at CHI 2024 (arXiv:2404.05889)

Report number: ARSJ/2024/07

arXiv:2404.11245

Constraining the Modified Friction in Gravitational Wave Propagation with Precessing Black Hole Binaries

Authors: Chunbo Lin, Tao Zhu, Rui Niu, Wen Zhao

Abstract: A broad class of modified gravity theories can produce a modified friction effect in the propagation of gravitational waves (GWs). This effect changes the amplitude-damping rate of GWs as they propagate over cosmological distances and thus modifies the standard GW luminosity distance of general relativity. Therefore, one can constrain this modified friction by measuring both the luminosity distance and the redshift of GW sources. In this paper, we investigate the prospects of constraining the modified friction effect using precessing binary black holes observed with ground-based GW detectors. For this purpose, we consider 20 precessing events detected by a GW detector network consisting of two LIGO detectors and two third-generation GW detectors (the Einstein Telescope and the Cosmic Explorer). The redshift information of these events is obtained by identifying their possible host galaxies in the GLADE+ galaxy catalog. We show that precession in the binary system can significantly improve the precision of the luminosity distance and thus lead to a tighter constraint on the modified friction. Assuming narrow priors on the cosmological parameters that are consistent with the uncertainties of the Planck 2018 results, our analysis shows that the modified friction effect, characterized by two parameters $(Ξ_0, n)$, can be constrained to $Ξ_0 = 1.002^{+0.004}_{-0.004}$ and $n=3.257^{+2.595}_{-2.192}$; the constraint on $Ξ_0$ is about two orders of magnitude tighter than the current result from an analysis of GWTC-3.
Our result sets the stage for future research with third-generation GW detectors, offering new insights into modifications of gravitational parameters. It also contributes to understanding the properties and applications of precessing binary black hole systems.
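The abstract does not spell out how $(Ξ_0, n)$ enter the luminosity distance. A parametrization commonly used in the literature, assumed here (the paper may differ in detail), is $d_L^{GW}(z)/d_L^{EM}(z) = Ξ_0 + (1-Ξ_0)/(1+z)^n$, which reduces to general relativity for $Ξ_0 = 1$. The sketch below evaluates that ratio; the function names are hypothetical.

```python
def xi_ratio(z: float, xi0: float, n: float) -> float:
    """Ratio d_L^GW / d_L^EM under the assumed (Xi_0, n) parametrization.

    Xi(z) = Xi_0 + (1 - Xi_0) / (1 + z)**n, so Xi(0) = 1 and Xi -> Xi_0 at high z.
    """
    if z < 0:
        raise ValueError("redshift must be non-negative")
    return xi0 + (1.0 - xi0) / (1.0 + z) ** n


def gw_luminosity_distance(d_l_em: float, z: float, xi0: float, n: float) -> float:
    """Scale an electromagnetic luminosity distance to the GW one (same units)."""
    return d_l_em * xi_ratio(z, xi0, n)


if __name__ == "__main__":
    # Central values quoted in the abstract, evaluated at a few redshifts.
    for z in (0.0, 0.5, 1.0, 3.0):
        print(f"z={z}: d_L^GW/d_L^EM = {xi_ratio(z, 1.002, 3.257):.5f}")
```

Because the deviation grows with redshift and saturates at $Ξ_0$, better distance precision on distant events translates directly into a tighter bound on $Ξ_0$.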

Submitted 17 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11118 [pdf, other]

MHLR: Moving Haar Learning Rate Scheduler for Large-scale Face Recognition Training with One GPU

Authors: Xueyuan Gong, Yain-whar Si, Zheng Zhang, Xiaochen Yuan, Ke Wang, Xinyuan Zhang, Cong Lin, Xiaoxiang Liu

Abstract: Face recognition (FR) has seen significant advancements due to the use of large-scale datasets. Training deep FR models on large-scale datasets with multiple GPUs is now common practice. In fact, computing power has become a foundational and indispensable resource in deep learning, and it is nearly impossible to train a deep FR model without adequate hardware resources. Recognizing this challenge, some FR approaches have explored ways to reduce the time complexity of the fully-connected layer in FR models. Unlike these approaches, this paper introduces a simple yet highly effective approach, the Moving Haar Learning Rate (MHLR) scheduler, which schedules the learning rate promptly and accurately during training. MHLR supports large-scale FR training with only one GPU, reducing training time to 1/4 of the original while losing no more than 1% accuracy. More specifically, MHLR needs only 30 hours to train ResNet100 on the WebFace12M dataset, which contains more than 12M face images of 0.6M identities. Extensive experiments validate the efficiency and effectiveness of MHLR.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10660 [pdf, other]

Discovery of the optical and radio counterpart to the fast X-ray transient EP240315a

Authors: J. H. Gillanders, L. Rhodes, S. Srivastav, F. Carotenuto, J. Bright, M. E. Huber, H. F. Stevance, S. J. Smartt, K. C. Chambers, T. -W. Chen, R. Fender, A. Andersson, A. J. Cooper, P. G. Jonker, F. J. Cowie, T. deBoer, N. Erasmus, M. D. Fulton, H. Gao, J. Herman, C. -C. Lin, T. Lowe, E. A. Magnier, H. -Y. Miao, P. Minguez , et al. (14 additional authors not shown)

Abstract: Fast X-ray Transients (FXTs) are extragalactic bursts of soft X-rays first identified >10 years ago. Since then, nearly 40 events have been discovered, although almost all of these have been recovered from archival Chandra and XMM-Newton data. To date, optical sky surveys and follow-up searches have not revealed any multi-wavelength counterparts. The Einstein Probe, launched in January 2024, has started surveying the sky in the soft X-ray regime (0.5-4 keV) and will rapidly increase the sample of FXTs discovered in real time. Here, we report the first discovery of both an optical and a radio counterpart to a distant FXT, the fourth source publicly released by the Einstein Probe. We discovered a fast-fading optical transient within the 3 arcmin localisation radius of EP240315a with the all-sky optical survey ATLAS, and our follow-up Gemini spectrum provides a redshift of z = 4.859 ± 0.002. Furthermore, we uncovered a radio counterpart in the S-band (3.0 GHz) with the MeerKAT radio interferometer. The optical (rest-frame UV) and radio luminosities indicate that the FXT most likely originates from either a long gamma-ray burst or a relativistic tidal disruption event. This may be a fortuitous early-mission detection by the Einstein Probe, or it may signpost a mode of discovery for high-redshift, high-energy transients through soft X-ray surveys combined with the localisation of multi-wavelength counterparts.
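Why the optical detection probes the rest-frame UV follows from cosmological redshift: wavelengths are stretched by a factor $(1+z)$, so at z = 4.859 every observed optical wavelength maps to one almost six times shorter at emission. A minimal sketch; the sample band wavelengths are illustrative choices, not ATLAS filter specifications.

```python
Z_EP240315A = 4.859       # spectroscopic redshift quoted in the abstract
LYMAN_ALPHA_A = 1215.67   # rest-frame Lyman-alpha wavelength in angstroms


def rest_frame_wavelength(observed_a: float, z: float) -> float:
    """Rest-frame wavelength (angstroms) of light observed at `observed_a`."""
    return observed_a / (1.0 + z)


def observed_wavelength(rest_a: float, z: float) -> float:
    """Observed wavelength (angstroms) of light emitted at `rest_a`."""
    return rest_a * (1.0 + z)


if __name__ == "__main__":
    # Illustrative optical wavelengths (hypothetical, not instrument specs).
    for lam_obs in (5000.0, 7000.0, 9000.0):
        lam_rest = rest_frame_wavelength(lam_obs, Z_EP240315A)
        print(f"{lam_obs:.0f} A observed -> {lam_rest:.0f} A rest-frame")
    print(f"Ly-alpha is observed near {observed_wavelength(LYMAN_ALPHA_A, Z_EP240315A):.0f} A")
```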

Submitted 19 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: Updated to match version accepted for publication in ApJL (17 pages, 4 figures, 2 tables)

arXiv:2404.10279 [pdf, other]

EucliDreamer: Fast and High-Quality Texturing for 3D Models with Depth-Conditioned Stable Diffusion

Authors: Cindy Le, Congrui Hetang, Chendi Lin, Ang Cao, Yihui He

Abstract: We present EucliDreamer, a simple and effective method to generate textures for 3D models given text prompts and meshes. The texture is parametrized as an implicit function on the 3D surface, which is optimized with the Score Distillation Sampling (SDS) process and differentiable rendering. To generate high-quality textures, we leverage a depth-conditioned Stable Diffusion model guided by the depth image rendered from the mesh. We test our approach on 3D models in Objaverse and conduct a user study, which shows its superior quality compared to existing texturing methods such as Text2Tex. In addition, our method converges 2 times faster than DreamFusion. Through text prompting, textures in diverse art styles can be produced. We hope EucliDreamer provides a viable solution for automating a labor-intensive stage of 3D content creation.
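For reference, the SDS gradient as introduced in DreamFusion is written out below. Adding the rendered depth image $d$ as an extra condition of the denoiser $\hat{\epsilon}_\phi$ is our assumption about how the depth-conditioned Stable Diffusion model enters; $\theta$ are the parameters of the implicit texture function, $x$ the differentiably rendered image (a latent, for Stable Diffusion), $y$ the text prompt, $t$ the diffusion timestep, and $w(t)$ a timestep weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, d,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x_t = \sqrt{\bar{\alpha}_t}\,x + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\quad \epsilon \sim \mathcal{N}(0, I)
```

SDS deliberately omits the Jacobian of the denoising network, so gradients only flow back through the renderer via $\partial x/\partial \theta$, which keeps each texture-optimization step cheap.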

Submitted 16 April, 2024; originally announced April 2024.

Comments: Short version of arXiv:2311.15573

arXiv:2404.09995 [pdf, other]

Taming Latent Diffusion Model for Neural Radiance Field Inpainting

Authors: Chieh Hubert Lin, Changil Kim, Jia-Bin Huang, Qinbo Li, Chih-Yao Ma, Johannes Kopf, Ming-Hsuan Yang, Hung-Yu Tseng

Abstract: Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Although some recent work has shown preliminary success in editing a reconstructed NeRF with a diffusion prior, such methods still struggle to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of the content synthesized by the diffusion model, which hinders the radiance field from converging to a crisp and deterministic geometry. Moreover, applying latent diffusion models to real data often yields a textural shift inconsistent with the image condition due to auto-encoding errors. These two problems are further exacerbated by the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model's stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. In our analyses, we also found that the commonly used pixel and perceptual losses are harmful in the NeRF inpainting task. In rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes. Project page: https://hubert0527.github.io/MALD-NeRF

Submitted 15 April, 2024; originally announced April 2024.

Comments: Project page: https://hubert0527.github.io/MALD-NeRF

arXiv:2404.09831 [pdf, other]

Digging into contrastive learning for robust depth estimation with diffusion models

Authors: Jiyuan Wang, Chunyu Lin, Lang Nie, Kang Liao, Shuwei Shao, Yao Zhao

Abstract: Recently, diffusion-based depth estimation methods have drawn widespread attention due to their elegant denoising patterns and promising performance. However, they are typically unreliable under adverse conditions prevalent in real-world scenarios, such as rain and snow. In this paper, we propose a novel robust depth estimation method called D4RD, featuring a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments. Concretely, we integrate the strength of knowledge distillation into contrastive learning, building a `trinity' contrastive scheme. This scheme uses the sampled noise of the forward diffusion process as a natural reference, guiding the predicted noise in diverse scenes toward a more stable and precise optimum. Moreover, we extend the noise-level trinity to more generic feature and image levels, establishing a multi-level contrast that distributes the burden of robust perception across the overall network. Before addressing complex scenarios, we enhance the stability of the baseline diffusion model with three straightforward yet effective improvements, which facilitate convergence and remove depth outliers. Extensive experiments demonstrate that D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and under real-world weather conditions. The code for D4RD will be made available for further exploration and adoption.
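The "natural reference" is the noise $\epsilon$ drawn in the standard forward diffusion step $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. The NumPy sketch below shows that step and a simple stand-in loss that pulls the noise predictions for a clean and an adverse-weather input toward the same sampled $\epsilon$. The linear $\beta$ schedule, the function names and the plain mean-squared form of the loss are assumptions; this is not D4RD's actual trinity loss.

```python
import numpy as np


def alpha_bar_schedule(num_steps: int = 1000, beta_start: float = 1e-4,
                       beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products of (1 - beta_t) for a linear, DDPM-style beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, eps: np.ndarray) -> np.ndarray:
    """Forward diffusion: noise a clean depth map x0 to timestep t using noise eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def noise_reference_loss(pred_clean: np.ndarray, pred_corrupt: np.ndarray,
                         eps: np.ndarray) -> float:
    """Hypothetical consistency loss: both noise predictions regress onto the shared eps."""
    return float(np.mean((pred_clean - eps) ** 2) + np.mean((pred_corrupt - eps) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    depth = rng.uniform(1.0, 10.0, size=(4, 4))  # toy ground-truth depth map
    eps = rng.standard_normal(depth.shape)        # shared reference noise
    alpha_bar = alpha_bar_schedule()
    x_t = q_sample(depth, t=500, alpha_bar=alpha_bar, eps=eps)
    print(x_t.shape, noise_reference_loss(eps + 0.1, eps - 0.1, eps))
```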

Submitted 19 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: 8 pages, 6 figures

arXiv:2404.09690 [pdf, other]

Harnessing GPT-4V(ision) for Insurance: A Preliminary Exploration

Authors: Chenwei Lin, Hanjia Lyu, Jiebo Luo, Xian Xu

Abstract: The emergence of Large Multimodal Models (LMMs) marks a significant milestone in the development of artificial intelligence. Insurance, as a vast and complex discipline, involves a wide variety of data forms in its operational processes, including text, images, and videos, thereby giving rise to diverse multimodal tasks. Despite this, there has been limited systematic exploration of multimodal tasks specific to insurance and no thorough investigation into how LMMs can address these challenges. In this paper, we explore GPT-4V's capabilities in the insurance domain. We categorize multimodal tasks, focusing primarily on visual aspects, by type of insurance (e.g., auto, household/commercial property, health, and agricultural insurance) and by insurance stage (e.g., risk assessment, risk monitoring, and claims processing). Our experiments reveal that GPT-4V exhibits remarkable abilities in insurance-related tasks, demonstrating not only a robust understanding of multimodal content in the insurance domain but also comprehensive knowledge of insurance scenarios. However, there are notable shortcomings: GPT-4V struggles with detailed risk rating and loss assessment, suffers from hallucination in image understanding, and shows variable support for different languages. Through this work, we aim to bridge the insurance domain with cutting-edge LMM technology, facilitate interdisciplinary exchange and development, and provide a foundation for the continued advancement of future research.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09269 [pdf, other]

PANet: A Physics-guided Parametric Augmentation Net for Image Dehazing by Hazing

Authors: Chih-Ling Chang, Fu-Jen Tsai, Zi-Ling Huang, Lin Gu, Chia-Wen Lin

Abstract: Image dehazing faces challenges when dealing with hazy images in real-world scenarios. A large domain gap between synthetic and real-world haze images degrades dehazing performance in practical settings. However, collecting real-world image datasets for training dehazing models is challenging, since both hazy and clean pairs must be captured under the same conditions. In this paper, we propose a Physics-guided Parametric Augmentation Network (PANet) that generates photo-realistic hazy and clean training pairs to effectively enhance real-world dehazing performance. PANet comprises a Haze-to-Parameter Mapper (HPM) that projects hazy images into a parameter space and a Parameter-to-Haze Mapper (PHM) that maps the resampled haze parameters back to hazy images. In the parameter space, we can resample individual haze parameter maps pixel-wise to generate diverse hazy images with physically explainable haze conditions unseen in the training set. Our experimental results demonstrate that PANet can augment existing hazy image benchmarks with diverse realistic hazy images, effectively boosting the performance of state-of-the-art image dehazing models.
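The abstract does not name PANet's haze parameters. A standard physics model of haze, assumed here for illustration, is the atmospheric scattering model $I(x) = J(x)\,t(x) + A\,(1 - t(x))$ with transmission $t(x) = e^{-\beta d(x)}$, where $J$ is the clean image, $d$ the scene depth, $\beta$ the scattering coefficient and $A$ the airlight. The sketch synthesizes a hazy image from per-pixel parameter maps; `resample_beta_map` is a hypothetical, much simpler stand-in for PANet's learned resampling.

```python
import numpy as np


def synthesize_haze(clean, depth, beta, airlight):
    """Atmospheric scattering model I = J * t + A * (1 - t), with t = exp(-beta * d).

    clean:    H x W x 3 clean image J with values in [0, 1]
    depth:    H x W scene depth d (arbitrary units)
    beta:     scalar or H x W scattering-coefficient map
    airlight: scalar or length-3 global atmospheric light A
    """
    t = np.exp(-np.asarray(beta) * depth)[..., None]  # transmission, broadcast over channels
    return clean * t + np.asarray(airlight) * (1.0 - t)


def resample_beta_map(beta_map, rng, scale_range=(0.5, 2.0)):
    """Hypothetical augmentation: rescale a beta map by one random global factor."""
    return beta_map * rng.uniform(*scale_range)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.uniform(size=(8, 8, 3))
    depth = np.linspace(1.0, 5.0, 64).reshape(8, 8)  # toy depth ramp
    beta = resample_beta_map(np.full((8, 8), 0.3), rng)
    hazy = synthesize_haze(clean, depth, beta, np.array([0.8, 0.8, 0.85]))
    print(hazy.shape, float(hazy.min()), float(hazy.max()))
```

Because each output pixel depends on explicit $\beta$ and $A$ values, every synthesized haze condition stays physically interpretable, which is the property the abstract emphasizes.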

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.09219 [pdf, ps, other]

Observation of $D \to a_{0}(980)π$ in the decays $D^{0} \rightarrow π^{+}π^{-}η$ and $D^{+} \rightarrow π^{+}π^{0}η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: We report the first amplitude analysis of the decays $D^{0} \to π^{+} π^{-} η$ and $D^{+} \rightarrow π^{+}π^{0}η$ using a data sample taken with the BESIII detector at a center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 7.9 ${\rm fb}^{-1}$. The contribution from the process $D^{0(+)} \to a_{0}(980)^{+} π^{-(0)}$ is significantly larger than the $D^{0(+)} \to a_{0}(980)^{-(0)} π^{+}$ contribution. The ratios $\mathcal{B}(D^{0} \rightarrow a_{0}(980)^{+}π^{-})/\mathcal{B}(D^{0} \rightarrow a_{0}(980)^{-}π^{+})$ and $\mathcal{B}(D^{+} \rightarrow a_{0}(980)^{+}π^{0})/\mathcal{B}(D^{+} \rightarrow a_{0}(980)^{0}π^{+})$ are measured to be $7.5^{+2.5}_{-0.8\,\mathrm{stat.}}\pm1.7_{\mathrm{syst.}}$ and $2.6\pm0.6_{\mathrm{stat.}}\pm0.3_{\mathrm{syst.}}$, respectively. The measured $D^{0}$ ratio disagrees with theoretical predictions by orders of magnitude, implying a substantial contribution from final-state interactions.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.08780 [pdf, other]

The Population of Massive Stars in AGN Disks

Authors: Yi-Xian Chen, Douglas N. C. Lin

Abstract: Gravitational instability in the outskirts of Active Galactic Nuclei (AGN) disks leads to disk fragmentation and the formation of super-massive (several 10^2 Msun) stars with potentially long lifetimes. Alternatively, stars can be captured ex-situ and grow by gas accretion in the AGN disk. However, the number density distribution of these stars throughout the disk is limited by thermal feedback, since their luminosities provide the dominant heating source. We derive equilibrium stellar surface density profiles in two limiting cases. If stellar lifetimes are prolonged by the recycling of hydrogen-rich disk gas, only the fraction of gas converted into heat is removed from the disk accretion flow. Alternatively, if stellar composition recycling is inefficient and stars can evolve off the main sequence, the disk accretion rate is quenched toward smaller radii, resembling a classical star-burst disk, although the effective removal rate depends not only on the stellar lifetime but also on the mass of stellar remnants. For AGNs with central Supermassive Black Hole (SMBH) masses of ~10^6 to 10^8 Msun accreting at ~0.1 Eddington efficiency, we estimate a total of 10^3 to 10^5 coexisting massive stars and a stellar merger rate of 10^-3 to 1 per year. We motivate detailed hydrodynamic and N-body studies of interactions within a swarm of massive stars, to provide better prescriptions of dynamical processes in AGN disks and more accurate estimates of the stellar population.
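To give a sense of scale, the sketch computes the Eddington luminosity $L_{\rm Edd} = 4\pi G M m_p c/\sigma_T$ for the quoted SMBH masses. Reading "accreting at ~0.1 Eddington" as $L = 0.1\,L_{\rm Edd}$, and converting luminosity to a mass accretion rate with an assumed radiative efficiency $\eta = 0.1$ via $\dot M = L/(\eta c^2)$, are both our assumptions. Constants are in CGS units.

```python
import math

# CGS constants
G = 6.674e-8          # gravitational constant [cm^3 g^-1 s^-2]
C = 2.998e10          # speed of light [cm s^-1]
M_P = 1.6726e-24      # proton mass [g]
SIGMA_T = 6.6524e-25  # Thomson cross-section [cm^2]
M_SUN = 1.989e33      # solar mass [g]
YEAR = 3.156e7        # one year [s]


def eddington_luminosity(mass_msun: float) -> float:
    """Eddington luminosity (erg/s) of a black hole of `mass_msun` solar masses."""
    return 4.0 * math.pi * G * mass_msun * M_SUN * M_P * C / SIGMA_T


def accretion_rate_msun_per_yr(luminosity: float, efficiency: float = 0.1) -> float:
    """Mass accretion rate (Msun/yr) that radiates `luminosity` at the given efficiency."""
    return luminosity / (efficiency * C ** 2) * YEAR / M_SUN


if __name__ == "__main__":
    for m in (1e6, 1e7, 1e8):
        lum = 0.1 * eddington_luminosity(m)  # assumed Eddington ratio of 0.1
        print(f"M = {m:.0e} Msun: L = {lum:.2e} erg/s, "
              f"Mdot = {accretion_rate_msun_per_yr(lum):.3g} Msun/yr")
```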

Submitted 30 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: Published in ApJ

arXiv:2404.08409 [pdf, other]

doi 10.1093/mnrasl/slae032

Evidence of the gamma-ray counterpart from nova FM Cir with Fermi-LAT

Authors: H. H. Wang, H. D. Yan, L. C. -C. Lin, J. Takata, P. -H. T. Tam

Abstract: We report the analysis of X-ray and gamma-ray data of the nova FM Cir taken by Swift and Fermi-LAT. The gamma-ray emission from FM Cir can be identified at a significance level of 3σ within 40 days after the nova eruption (2018 January 19) when the light curve is binned daily. The significance can exceed 4σ if the light curve is binned over longer intervals (20 days). The gamma-ray counterpart could be identified with a Test Statistic (TS) above 4 until 180 days after the eruption. The duration of the gamma-ray detection was longer than those reported in previous studies of other novae detected in the GeV range. Significant X-ray emission was observed after the gamma-ray flux fell below the sensitivity of Fermi-LAT. The hardness ratio of the X-ray emission decreased rapidly with time, and the spectra were dominated by blackbody radiation from the hot white dwarf. Except for the longer duration of the gamma-ray emission, the multi-wavelength properties of FM Cir closely resemble those of other novae detected in the GeV range.
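For orientation, a likelihood Test Statistic is $TS = 2(\ln L_1 - \ln L_0)$, comparing models with and without the source. For one additional free parameter, Wilks' theorem gives an approximate significance of $\sqrt{TS}$ σ, so the TS > 4 threshold quoted above corresponds to roughly 2σ. The sketch assumes that single-degree-of-freedom case; detection tests with more free source parameters yield a lower significance for the same TS.

```python
import math


def ts_from_likelihoods(loglike_with_source: float, loglike_null: float) -> float:
    """Test Statistic TS = 2 * (ln L1 - ln L0)."""
    return 2.0 * (loglike_with_source - loglike_null)


def significance_one_dof(ts: float) -> float:
    """Approximate Gaussian significance (in sigma) for one extra free parameter."""
    return math.sqrt(max(ts, 0.0))


def p_value_one_dof(ts: float) -> float:
    """Chi-square (1 dof) survival probability, equal to erfc(sqrt(TS / 2))."""
    return math.erfc(math.sqrt(max(ts, 0.0) / 2.0))


if __name__ == "__main__":
    for ts in (4.0, 9.0, 16.0):
        print(f"TS = {ts:4.1f}: ~{significance_one_dof(ts):.1f} sigma, p = {p_value_one_dof(ts):.2e}")
```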

Submitted 14 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: 6 pages, 7 figures, accepted for publication in Monthly Notices of the Royal Astronomical Society Letters

Journal ref: 2024 April 12

arXiv:2404.07965 [pdf, other]

Rho-1: Not All Tokens Are What You Need

Authors: Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen

Abstract: Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our initial analysis examines the token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. Unlike traditional LMs that learn to predict every next token in a corpus, Rho-1 employs Selective Language Modeling (SLM), which selectively trains on useful tokens that are aligned with the desired distribution. This approach involves scoring pretraining tokens using a reference model, and then training the language model with a focused loss on tokens with higher scores. In continual pretraining on the 15B-token OpenWebMath corpus, Rho-1 yields an absolute improvement in few-shot accuracy of up to 30% on 9 math tasks. After fine-tuning, Rho-1-1B and 7B achieve state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens. Furthermore, when pretraining on 80B general tokens, Rho-1 achieves a 6.8% average improvement across 15 diverse tasks, increasing both the efficiency and the performance of language model pre-training.
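The scoring-and-selection step can be sketched with per-token losses. The sketch assumes each token's score is its excess loss (the current model's loss minus the reference model's loss) and that the top fraction of tokens by score is kept; the function name and the 0.6 default ratio are illustrative, not values from the abstract. In real training the mask would be applied to a differentiable loss tensor rather than to NumPy arrays.

```python
from typing import Tuple

import numpy as np


def selective_lm_loss(train_token_loss: np.ndarray, ref_token_loss: np.ndarray,
                      keep_ratio: float = 0.6) -> Tuple[float, np.ndarray]:
    """Average next-token loss over only the highest-scoring tokens.

    train_token_loss: per-token cross-entropy of the model being trained
    ref_token_loss:   per-token cross-entropy of a reference model on the same tokens
    keep_ratio:       fraction of tokens to train on (hypothetical default)
    Returns the focused loss and the boolean selection mask.
    """
    if train_token_loss.shape != ref_token_loss.shape:
        raise ValueError("loss arrays must have the same shape")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = train_token_loss - ref_token_loss        # assumed excess-loss score
    k = max(1, int(round(keep_ratio * scores.size)))
    top = np.argsort(scores, axis=None)[-k:]          # flat indices of the k largest scores
    mask = np.zeros(scores.size, dtype=bool)
    mask[top] = True
    mask = mask.reshape(scores.shape)
    return float(train_token_loss[mask].mean()), mask


if __name__ == "__main__":
    loss, mask = selective_lm_loss(np.array([2.0, 0.5, 3.0, 1.0]),
                                   np.array([1.0, 0.4, 0.5, 1.2]), keep_ratio=0.5)
    print(loss, mask.tolist())
```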

Submitted 23 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: First two authors equal contribution

arXiv:2404.07847 [pdf, other]

The Effectiveness of a Simplified Model Structure for Crowd Counting

Authors: Lei Chen, Xinghang Gao, Fei Chao, Xiang Chang, Chih Min Lin, Xingen Gao, Shaopeng Lin, Hongyi Zhang, Juqiang Lin

Abstract: In crowd counting research, many recent deep-learning-based methods have demonstrated robust capabilities for accurately estimating crowd sizes. However, the gains in their performance often come from increasingly complex model structures. This paper discusses how to construct high-performance crowd counting models using only simple structures. We propose the Fuss-Free Network (FFNet), which is characterized by a simple and efficient structure consisting of only a backbone network and a multi-scale feature fusion structure. The multi-scale feature fusion structure consists of three branches, each equipped only with a focus transition module, and combines the features from these branches through concatenation. Our proposed crowd counting model is trained and evaluated on four widely used public datasets, and it achieves accuracy comparable to that of existing complex models. Furthermore, we conduct a comprehensive evaluation by replacing the existing backbones of various models, such as FFNet and CCTrans, with different networks, including MobileNet-v3, ConvNeXt-Tiny, and Swin-Transformer-Small. The experimental results further indicate that excellent crowd counting performance can be achieved with the simplified structure we propose.

Submitted 18 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07517 [pdf, other]

Electromyography Based Cross-Subject Limb Angle Estimation via Hierarchical Spiking Attentional Feature Decomposition Network

Authors: Xin Zhou, Chuang Lin, Can Wang, Xiaojiang Peng

Abstract: As human-machine interaction systems develop toward lightweight and pervasive designs, the role of simultaneous and proportional control (SPC) in human-machine interaction becomes increasingly prominent. However, existing continuous joint angle prediction algorithms based on surface electromyography (sEMG) typically incur high inference costs or apply only to specific subjects rather than to cross-subject scenarios. We therefore propose a hierarchical Spiking Attentional FEature decomposition Network (SAFE-Net) to reduce inference costs and improve recognition accuracy in cross-subject scenarios. This network first encodes the sEMG signals into neural spiking forms through a Spiking Sparse Attention Encoder (SSAE). The compressed features are then decomposed into kinematic features and biological features by a Spiking Attentional Feature Decomposition (SAFD) module. Finally, the kinematic and biological features are decoded into joint angle values and subject identity, respectively. We validated the effectiveness of SAFE-Net on two datasets (SIAT-DB1 and SIAT-DB2) and compared it with two state-of-the-art methods, Informer and Spikformer. Experimental results demonstrate that, on the one hand, SSAE reduces inference power consumption by 39.1% and 37.5% relative to Informer and Spikformer, respectively. On the other hand, SAFE-Net outperforms Informer and Spikformer in recognition accuracy on both datasets.
This study shows that the proposed SAFE-Net can provide accurate predictions in cross-subject scenarios, offering a promising vision for precise continuous control of lower-limb rehabilitation exoskeleton robots.
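The abstract does not specify the neuron model inside SSAE. A common way to turn a continuous signal such as an sEMG envelope into spikes, assumed here for illustration, is a leaky integrate-and-fire (LIF) neuron: its membrane potential decays by a fixed factor each step, integrates the input, and emits a spike and resets when it crosses a threshold. The decay and threshold values below are hypothetical.

```python
import numpy as np


def lif_encode(signal, decay: float = 0.9, threshold: float = 1.0) -> np.ndarray:
    """Encode a (T, C) signal into binary spikes with one LIF neuron per channel.

    v[t] = decay * v[t-1] + x[t]; a spike is emitted when v[t] >= threshold,
    after which the potential is hard-reset to 0.
    """
    x = np.asarray(signal, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as a single channel
    v = np.zeros(x.shape[1])
    spikes = np.zeros(x.shape, dtype=np.uint8)
    for t in range(x.shape[0]):
        v = decay * v + x[t]
        fired = v >= threshold
        spikes[t] = fired
        v[fired] = 0.0  # hard reset after a spike
    return spikes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emg_envelope = np.abs(rng.standard_normal((200, 8))) * 0.4  # toy 8-channel envelope
    spikes = lif_encode(emg_envelope)
    print(spikes.shape, f"spike rate = {spikes.mean():.3f}")
```

Downstream spiking layers only need to process the nonzero entries of such binary trains, which is the usual argument for why spiking encoders can lower inference cost.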

Submitted 5 July, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07131 [pdf, other]

Search for prompt production of pentaquarks in charm hadron final states

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, B. Adeva, M. Adinolfi, P. Adlarson, H. Afsharnia, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, A. Alfonso Albero, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey , et al. (1090 additional authors not shown)

Abstract: A search for hidden-charm pentaquark states decaying to a range of $Σ_{c}\bar{D}$ and $Λ_{c}\bar{D}$ final states, as well as doubly-charmed pentaquark states decaying to $Σ_{c}D$ and $Λ_{c}^{+}D$, is made using samples of proton-proton collision data corresponding to an integrated luminosity of $5.7\,\mathrm{fb}^{-1}$ recorded by the LHCb detector at $\sqrt{s} = 13\,\mathrm{TeV}$. Since no significant signals are found, upper limits are set on the pentaquark yields relative to that of the $Λ_{c}^{+}$ baryon in the $Λ_{c}^{+}\to pK^{-}π^{+}$ decay mode. The known pentaquark states are also investigated, and their signal yields are found to be consistent with zero in all cases.

Submitted 2 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-018.html (LHCb public pages)

Report number: LHCb-PAPER-2023-018, CERN-EP-2024-071

arXiv:2404.06718 [pdf, other]

Measurement of the Born cross section for $e^{+}e^{-}\to ηh_c$ at center-of-mass energies between 4.1 and 4.6 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: We measure the Born cross section for the reaction $e^{+}e^{-} \rightarrow ηh_c$ from $\sqrt{s} = 4.129$ to $4.600$~GeV using data sets collected by the BESIII detector running at the BEPCII collider. A resonant structure in the cross section line shape near 4.200~GeV is observed with a statistical significance of 7$σ$. The parameters of this resonance are measured to be \MeasMass\ and \MeasWidth, where the first uncertainties are statistical and the second systematic.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06393 [pdf, other]

MuPT: A Generative Symbolic Music Pretrained Transformer

Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan , et al. (4 additional authors not shown)

Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the use of MIDI in music modeling is well established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the misalignment of measures from different tracks during generation, we propose Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, and our open-source contributions offer extensive resources for community-led research.
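The abstract does not define the SMT-ABC format. One plausible reading, assumed here purely for illustration, is to keep tracks aligned by interleaving them measure by measure, so that bar i of every voice sits together in the token stream instead of each track being written out in full. The sketch works on simplified ABC voice bodies split on `|` bar lines and ignores repeats, headers and other ABC syntax; it uses ABC inline voice fields such as `[V:1]`.

```python
from typing import Dict, List


def split_bars(voice_body: str) -> List[str]:
    """Split a simplified ABC voice body on '|' bar lines, dropping empty pieces."""
    return [bar.strip() for bar in voice_body.split("|") if bar.strip()]


def interleave_tracks(voices: Dict[str, str]) -> str:
    """Hypothetical synchronized layout: emit bar i of every voice before bar i+1.

    Each bar group is written as '[V:name] bar' entries joined by spaces, and groups
    are separated by ' | '. Raises ValueError if tracks have different bar counts.
    """
    bars = {name: split_bars(body) for name, body in voices.items()}
    counts = {len(b) for b in bars.values()}
    if len(counts) != 1:
        raise ValueError(f"tracks have mismatched bar counts: {counts}")
    n_bars = counts.pop()
    groups = []
    for i in range(n_bars):
        groups.append(" ".join(f"[V:{name}] {bars[name][i]}" for name in voices))
    return " | ".join(groups)


if __name__ == "__main__":
    print(interleave_tracks({"1": "CDEF|GABc|", "2": "C,4|G,4|"}))
```

Keeping every track's bar i adjacent means a model generating left to right never has to recall a measure written thousands of tokens earlier to stay in sync.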

Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06080 [pdf]

Using Few-Shot Learning to Classify Primary Lung Cancer and Other Malignancy with Lung Metastasis in Cytological Imaging via Endobronchial Ultrasound Procedures

Authors: Ching-Kai Lin, Di-Chun Wei, Yun-Chien Cheng

Abstract: This study aims to establish a computer-aided diagnosis system for endobronchial ultrasound (EBUS) surgery that assists physicians in the preliminary diagnosis of metastatic cancer. With such a system, examinations of other possible metastatic sites can be arranged immediately after EBUS surgery instead of waiting for reports, which shortens the waiting time by more than half, enables patients to detect other cancers earlier, and allows treatment plans to be made and carried out sooner. Unlike previous studies on cell image classification, which had abundant training data, this study must classify effectively despite the limited amount of case data for lung metastatic cancer. Among classification methods for small datasets, few-shot learning (FSL) has become mainstream in recent years. Because it can be trained on small datasets and generalizes well, FSL shows potential for classifying lung metastatic cell images. This study adopts the FSL approach, building on existing models to design an architecture for classifying lung metastasis cell images. Batch Spectral Regularization (BSR) is incorporated into the loss, and the Finetune method of PMF is modified. In testing, adding BSR and the modified Finetune method increases the accuracy by 8.89% to 65.60%, outperforming other FSL methods.
This study confirms that FSL is superior to supervised and transfer learning in classifying metastatic cancer and demonstrates that using BSR as a loss function and modifying Finetune can enhance the model's capabilities.
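The abstract names BSR without giving its form. Assuming the formulation usually cited for batch spectral regularization (penalizing the squared singular values of a batch's feature matrix), the penalty added to the classification loss can be sketched in plain Python. The function names and the weight `lam` are hypothetical, and this is not the authors' training code.

```python
import math


def batch_spectral_penalty(features: list[list[float]]) -> float:
    """Assumed BSR form: sum of squared singular values of the b x d batch feature matrix F.

    Since sum_i sigma_i**2 == trace(F^T F) == ||F||_F**2, this particular
    penalty can be computed without an explicit SVD.
    """
    return sum(v * v for row in features for v in row)


def loss_with_bsr(ce_loss: float, features: list[list[float]], lam: float = 1e-3) -> float:
    """Classification loss plus the weighted spectral penalty (lam is a hypothetical weight)."""
    return ce_loss + lam * batch_spectral_penalty(features)


# A diagonal 2 x 2 feature matrix has singular values 3 and 4, so the penalty is 9 + 16.
F = [[3.0, 0.0], [0.0, 4.0]]
print(batch_spectral_penalty(F))  # 25.0
print(math.isclose(loss_with_bsr(1.0, F, lam=0.01), 1.25))  # True
```

In a real training loop the features would be tensors and the penalty would be differentiated together with the classification loss; plain lists are used here only to keep the sketch self-contained.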

Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05364 [pdf, other]

Autoregressive Search of Gravitational Waves: Denoising

Authors: Sangin Kim, C. Y. Hui, Jianqi Yan, Alex P. Leung, Kwangmin Oh, A. K. H. Kong, L. C. -C. Lin, Kwan-Lok Li

Abstract: Because gravitational-wave (GW) signals have small strain amplitudes, unveiling them in the presence of detector and environmental noise is challenging. To visualize the signals and extract their waveforms for comparison with theoretical predictions, the data are commonly filtered with a frequency-domain whitening process. In this work, we propose an alternative template-free framework based on autoregressive modeling for denoising GW data and extracting the waveform. We have tested our framework by extracting injected signals from simulated data as well as a series of known compact binary coalescence (CBC) events from LIGO data. Compared with the conventional whitening procedure, our method generally yields improved cross-correlation and reduced root-mean-square errors with respect to the signal model.
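The abstract does not describe the model beyond "autoregressive". As a generic illustration of how an AR model can whiten coloured noise in the time domain (a standard textbook technique, not necessarily the authors' pipeline), the sketch below fits AR(p) coefficients from the Yule-Walker equations via the Levinson-Durbin recursion and returns the one-step prediction residual. The function names and the simulated AR(2) noise are assumptions for illustration.

```python
import random


def autocorr(x: list[float], max_lag: int) -> list[float]:
    """Biased sample autocorrelation r[0..max_lag] of the mean-removed series."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[t] * xc[t + k] for t in range(n - k)) / n for k in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Solve the Yule-Walker equations for AR coefficients a[1..p].

    Convention: x[n] ~= sum_k a[k] * x[n - k] + e[n]. Also returns the
    prediction-error variance of the order-p model.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for the order-m model.
        kappa = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        new_a = a[:]
        new_a[m] = kappa
        for k in range(1, m):
            new_a[k] = a[k] - kappa * a[m - k]
        a = new_a
        err *= 1.0 - kappa * kappa
    return a[1:], err


def ar_residual(x: list[float], coeffs: list[float]) -> list[float]:
    """One-step prediction residual; approximately white if the AR model fits the noise."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p)) for n in range(p, len(x))]


if __name__ == "__main__":
    # Simulated coloured noise: AR(2) with a1 = 0.6, a2 = -0.3 and unit-variance driving noise.
    random.seed(0)
    x = [0.0, 0.0]
    for _ in range(20000):
        x.append(0.6 * x[-1] - 0.3 * x[-2] + random.gauss(0.0, 1.0))
    coeffs, err = levinson_durbin(autocorr(x, 2), 2)
    print(coeffs, err)  # coefficients should come out close to [0.6, -0.3]
```

In this sketch the residual plays the role of the whitened series; how the paper then separates the signal from the fitted noise model is not stated in the abstract and is not reproduced here.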

Submitted 8 April, 2024; originally announced April 2024.

Comments: Phys. Rev. D in press, 16 pages, 11 figures, 1 table

Showing 151–200 of 3,097 results for author: Lin, C