Search | arXiv e-print repository

Physics case for quarkonium studies at the Electron Ion Collider

Authors: Daniël Boer, Chris A. Flett, Carlo Flore, Daniel Kikoła, Jean-Philippe Lansberg, Maxim Nefedov, Charlotte Van Hulse, Shohini Bhattacharya, Jelle Bor, Mathias Butenschoen, Federico Ceccopieri, Longjie Chen, Vincent Cheung, Umberto D'Alesio, Miguel Echevarria, Yoshitaka Hatta, Charles E. Hyde, Raj Kishore, Leszek Kosarzewski, Cédric Lorcé, Wenliang Li, Xuan Li, Luca Maxia, Andreas Metz, Asmita Mukherjee, et al. (19 additional authors not shown)

Abstract: The physics case for quarkonium-production studies accessible at the US Electron Ion Collider is described.

Submitted 5 September, 2024; originally announced September 2024.

Comments: Latex, 84 pages. Review prepared for Progress in Particle and Nuclear Physics

arXiv:2409.03403 [pdf, other]

RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning

Authors: Lawrence Yunliang Chen, Chenfeng Xu, Karthik Dharmarajan, Zubair Irshad, Richard Cheng, Kurt Keutzer, Masayoshi Tomizuka, Quan Vuong, Ken Goldberg

Abstract: Scaling up robot learning requires large and diverse datasets, and how to efficiently reuse collected data and transfer policies to new embodiments remains an open question. Emerging research such as the Open-X Embodiment (OXE) project has shown promise in leveraging skills by combining datasets including different robots. However, imbalances in the distribution of robot types and camera angles in many datasets make policies prone to overfit. To mitigate this issue, we propose RoVi-Aug, which leverages state-of-the-art image-to-image generative models to augment robot data by synthesizing demonstrations with different robots and camera views. Through extensive physical experiments, we show that, by training on robot- and viewpoint-augmented data, RoVi-Aug can zero-shot deploy on an unseen robot with significantly different camera angles. Compared to test-time adaptation algorithms such as Mirage, RoVi-Aug requires no extra processing at test time, does not assume known camera angles, and allows policy fine-tuning. Moreover, by co-training on both the original and augmented robot datasets, RoVi-Aug can learn multi-robot and multi-task policies, enabling more efficient transfer between robots and skills and improving success rates by up to 30%.

Submitted 5 September, 2024; originally announced September 2024.

Comments: CoRL 2024 (Oral)

arXiv:2409.03198 [pdf, other]

RoomDiffusion: A Specialized Diffusion Model in the Interior Design Industry

Authors: Zhaowei Wang, Ying Hao, Hao Wei, Qing Xiao, Lulu Chen, Yulong Li, Yue Yang, Tianyi Li

Abstract: Recent advancements in text-to-image diffusion models have significantly transformed visual content generation, yet their application in specialized fields such as interior design remains underexplored. In this paper, we present RoomDiffusion, a pioneering diffusion model meticulously tailored for the interior design industry. To begin with, we build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. Subsequently, techniques such as multiaspect training, multi-stage fine-tune and model fusion are applied to enhance both the visual appeal and precision of the generated results. Lastly, leveraging the latent consistency distillation method, we distill and expedite the model for optimal efficiency. Unlike existing models optimized for general scenarios, RoomDiffusion addresses specific challenges in interior design, such as lack of fashion, high furniture duplication rate, and inaccurate style. Through our holistic human evaluation protocol with more than 20 professional human evaluators, RoomDiffusion demonstrates industry-leading performance in terms of aesthetics, accuracy, and efficiency, surpassing all existing open source models such as Stable Diffusion and SDXL.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2409.03179 [pdf, other]

Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem

Authors: Qiwen Zhu, Yanjie Wang, Shilv Cai, Liqun Chen, Jiahuan Zhou, Luxin Yan, Sheng Zhong, Xu Zou

Abstract: Training Single-Image Super-Resolution (SISR) models using pixel-based regression losses can achieve high distortion metrics scores (e.g., PSNR and SSIM), but often results in blurry images due to insufficient recovery of high-frequency details. Conversely, using GAN or perceptual losses can produce sharp images with high perceptual metric scores (e.g., LPIPS), but may introduce artifacts and incorrect textures. Balancing these two types of losses can help achieve a trade-off between distortion and perception, but the challenge lies in tuning the loss function weights. To address this issue, we propose a novel method that incorporates Multi-Objective Optimization (MOO) into the training process of SISR models to balance perceptual quality and distortion. We conceptualize the relationship between loss weights and image quality assessment (IQA) metrics as black-box objective functions to be optimized within our Multi-Objective Bayesian Optimization Super-Resolution (MOBOSR) framework. This approach automates the hyperparameter tuning process, reduces overall computational cost, and enables the use of numerous loss functions simultaneously. Extensive experiments demonstrate that MOBOSR outperforms state-of-the-art methods in terms of both perceptual quality and distortion, significantly advancing the perception-distortion Pareto frontier. Our work points towards a new direction for future research on balancing perceptual quality and fidelity in nearly all image restoration tasks. The source code and pretrained models are available at: https://github.com/ZhuKeven/MOBOSR.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2409.02795 [pdf, other]

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to understand. The relationships between different methods have been under-explored, limiting the development of preference alignment. In light of this, we break down the existing popular alignment strategies into different components and provide a unified framework to study the current alignment strategies, thereby establishing connections among them. In this survey, we decompose all the strategies in preference learning into four components: model, data, feedback, and algorithm. This unified view offers an in-depth understanding of existing alignment algorithms and also opens up possibilities to synergize the strengths of different strategies. Furthermore, we present detailed working examples of prevalent existing algorithms to facilitate a comprehensive understanding for the readers. Finally, based on our unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.

Submitted 4 September, 2024; originally announced September 2024.

Comments: Initial Commit, 21 pages

arXiv:2409.02579 [pdf, other]

doi 10.1145/3678884.3681866

Assembling the Puzzle: Exploring Collaboration and Data Sensemaking in Nursing Practices for Remote Patient Monitoring

Authors: Mihnea Calota, Janet Yi-Ching Huang, Lin-Lin Chen, Mathias Funk

Abstract: Remote patient monitoring (RPM) involves the remote collection and transmission of patient health data, serving as a notable application of data-driven healthcare. This technology facilitates clinical monitoring and decision-making, offering benefits like reduced healthcare costs and improved patient outcomes. However, RPM also introduces challenges common to data-driven healthcare, such as additional data work that can disrupt clinicians' workflow. This study explores the daily practices, collaboration mechanisms, and sensemaking processes of nurses in RPM through field observations and interviews with six stakeholders. Preliminary results indicate that RPM's scale-up pushes clinicians toward asynchronous collaboration. Data sensemaking is crucial for this type of collaboration, but existing technologies often create friction rather than support. This work provides empirical insights into clinical workflow in nursing practice, especially RPM. We suggest recognizing data sensemaking as a distinct nursing practice within data work and recommend further investigation into its role in the workflow of nurses in RPM.

Submitted 5 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

arXiv:2409.02578 [pdf, other]

Searching for the massless dark photon in $c\to uγ'$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (648 additional authors not shown)

Abstract: In the effective field theory, the massless dark photon $γ'$ can only couple with the Standard Model particle through operators of dimension higher than four, thereby offering a high sensitivity to the new physics energy scale. Using $7.9~\rm{fb^{-1}}$ of $e^+e^-$ collision data collected at $\sqrt{s}=3.773$ GeV with the BESIII detector at the BEPCII collider, we measure the effective flavor-changing neutral current coupling of $cuγ'$ in $D^0\toωγ'$ and $D^0\toγγ'$ processes to search for the massless dark photon. No significant signals are observed, and the upper limits at the 90% confidence level on the massless dark photon branching fraction are set to be $1.1\times10^{-5}$ and $2.0\times10^{-6}$ for $D^0\toωγ'$ and $D^0\toγγ'$, respectively. These results provide the most stringent constraint on the new physics energy scale associated with $cuγ'$ coupling in the world, with the new physics energy scale related parameter $|\mathbb{C}|^2+|\mathbb{C}_5|^2<8.2\times10^{-17}~\rm{GeV}^{-2}$ at the 90% confidence level, playing a unique role in the dark sector search with the charm sector.

Submitted 4 September, 2024; originally announced September 2024.

Comments: 9 pages, 4 figures

arXiv:2409.01548 [pdf, other]

VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka

Authors: Li-Wei Chen, Hung-Shin Lee, Chen-Chi Chang

Abstract: This paper introduces VoxHakka, a text-to-speech (TTS) system designed for Taiwanese Hakka, a critically under-resourced language spoken in Taiwan. Leveraging the YourTTS framework, VoxHakka achieves high naturalness and accuracy and low real-time factor in speech synthesis while supporting six distinct Hakka dialects. This is achieved by training the model with dialect-specific data, allowing for the generation of speaker-aware Hakka speech. To address the scarcity of publicly available Hakka speech corpora, we employed a cost-effective approach utilizing a web scraping pipeline coupled with automatic speech recognition (ASR)-based data cleaning techniques. This process ensured the acquisition of a high-quality, multi-speaker, multi-dialect dataset suitable for TTS training. Subjective listening tests conducted using comparative mean opinion scores (CMOS) demonstrate that VoxHakka significantly outperforms existing publicly available Hakka TTS systems in terms of pronunciation accuracy, tone correctness, and overall naturalness. This work represents a significant advancement in Hakka language technology and provides a valuable resource for language preservation and revitalization efforts.

Submitted 2 September, 2024; originally announced September 2024.

Comments: Submitted to O-COCOSDA 2024

arXiv:2409.01545 [pdf, other]

Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation

Authors: Chien-Chun Wang, Li-Wei Chen, Hung-Shin Lee, Berlin Chen, Hsin-Min Wang

Abstract: Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain, leading to a mismatch between training and test conditions. This study puts forward a novel data simulation method to address this issue, leveraging noise-extractive techniques and generative adversarial networks (GANs) with only limited target noisy speech data. Notably, our method employs a noise encoder to extract noise embeddings from target-domain data. These embeddings aptly guide the generator to synthesize utterances acoustically fitted to the target domain while authentically preserving the phonetic content of the input clean speech. Furthermore, we introduce the notion of dynamic stochastic perturbation, which can inject controlled perturbations into the noise embeddings during inference, thereby enabling the model to generalize well to unseen noise conditions. Experiments on the VoiceBank-DEMAND benchmark dataset demonstrate that our domain-adaptive SE method outperforms an existing strong baseline based on data simulation.

Submitted 2 September, 2024; originally announced September 2024.

Comments: Accepted to IEEE SLT 2024

arXiv:2409.01419 [pdf, other]

Study of $D^{+} \to K_{S}^{0}K^{*}(892)^{+}$ in $D^{+} \to K_{S}^{0} K_{S}^{0} π^{+}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

Abstract: Using a data sample of $e^+e^-$ collisions corresponding to an integrated luminosity of 7.93 $\rm fb^{-1}$ collected with the BESIII detector at the center-of-mass energy 3.773 GeV, we perform the first amplitude analysis of the decay $D^{+} \to K_{S}^{0} K_{S}^{0} π^{+}$. The absolute branching fraction of $D^{+} \to K_{S}^{0}K_{S}^{0} π^{+}$ is measured to be $(2.97 \pm 0.09_{\rm stat.} \pm 0.05_{\rm syst.})\times10^{-3}$. The dominant intermediate process is $D^{+} \to K_{S}^{0}K^{*}(892)^{+}$, whose branching fraction is determined to be $(8.72 \pm 0.28_{\rm stat.} \pm 0.15_{\rm syst.}) \times 10^{-3}$, including all the $K^*(892)^+$ decays.

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2409.01199 [pdf, other]

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Authors: Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinghua Cheng, Li Yuan

Abstract: Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). With the same reconstruction quality, the more sufficient the VAE's compression for videos is, the more efficient the LVDMs are. However, most LVDMs utilize 2D image VAE, whose compression for videos is only in the spatial dimension and often ignored in the temporal dimension. How to conduct temporal compression for videos in a VAE to obtain more concise latent representations while promising accurate reconstruction is seldom explored. To fill this gap, we propose an omni-dimension compression VAE, named OD-VAE, which can temporally and spatially compress videos. Although OD-VAE's more sufficient compression brings a great challenge to video reconstruction, it can still achieve high reconstructed accuracy by our fine design. To obtain a better trade-off between video reconstruction quality and compression speed, four variants of OD-VAE are introduced and analyzed. In addition, a novel tail initialization is designed to train OD-VAE more efficiently, and a novel inference strategy is proposed to enable OD-VAE to handle videos of arbitrary length with limited GPU memory. Comprehensive experiments on video reconstruction and LVDM-based video generation demonstrate the effectiveness and efficiency of our proposed methods.

Submitted 2 September, 2024; originally announced September 2024.

Comments: https://github.com/PKU-YuanGroup/Open-Sora-Plan

arXiv:2409.00922 [pdf, other]

doi 10.1145/3658644.3690231

ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model

Authors: Dawei Wang, Geng Zhou, Li Chen, Dan Li, Yukai Miao

Abstract: Vulnerabilities related to option combinations pose a significant challenge in software security testing due to their vast search space. Previous research primarily addressed this challenge through mutation or filtering techniques, which inefficiently treated all option combinations as having equal potential for vulnerabilities, thus wasting considerable time on non-vulnerable targets and resulting in low testing efficiency. In this paper, we utilize carefully designed prompt engineering to drive the large language model (LLM) to predict high-risk option combinations (i.e., more likely to contain vulnerabilities) and perform fuzz testing automatically without human intervention. We developed a tool called ProphetFuzz and evaluated it on a dataset comprising 52 programs collected from three related studies. The entire experiment consumed 10.44 CPU years. ProphetFuzz successfully predicted 1748 high-risk option combinations at an average cost of only \$8.69 per program. Results show that after 72 hours of fuzzing, ProphetFuzz discovered 364 unique vulnerabilities associated with 12.30\% of the predicted high-risk option combinations, which was 32.85\% higher than that found by state-of-the-art in the same timeframe. Additionally, using ProphetFuzz, we conducted persistent fuzzing on the latest versions of these programs, uncovering 140 vulnerabilities, with 93 confirmed by developers and 21 awarded CVE numbers.

Submitted 1 September, 2024; originally announced September 2024.

Comments: Preprint

arXiv:2409.00427 [pdf, other]

Measurement of Born cross sections of $e^+e^-\toΞ^0\barΞ^0$ and search for charmonium(-like) states at $\sqrt{s}$ = 3.51-4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (648 additional authors not shown)

Abstract: Using $e^+e^-$ collision data collected by the BESIII detector at BEPCII corresponding to an integrated luminosity of 30 $\rm fb^{-1}$, we measure Born cross sections and effective form factors for the process $e^+e^-\toΞ^0\barΞ^0$ at forty-five center-of-mass energies between 3.51 and 4.95 GeV. The dressed cross section is fitted, assuming a power-law function plus a charmonium(-like) state, i.e., $ψ(3770)$, $ψ(4040)$, $ψ(4160)$, $ψ(4230)$, $ψ(4360)$, $ψ(4415)$ or $ψ(4660)$. No significant charmonium(-like) state decaying into $Ξ^0\barΞ^0$ is observed. Upper limits at the 90% confidence level on the product of the branching fraction and the electronic partial width are provided for each decay. In addition, ratios of the Born cross sections and the effective form factors for $e^+e^-\toΞ^0\barΞ^0$ and $e^+e^-\toΞ^-\barΞ^+$ are also presented to test isospin symmetry and the vector meson dominance model.

Submitted 31 August, 2024; originally announced September 2024.

Comments: 22 pages, 2 tables, 4 figures

arXiv:2409.00425 [pdf, other]

Phase behaviors and dynamics of active particle systems in double-well potential

Authors: Lu Chen, Baopi Liu, Ning Liu

Abstract: In this paper, we investigate the phase behaviors and dynamics of self-propelled particles with active reorientation in double-well potential. We observe the self-propelled particles exhibit flocking and clustering in an asymmetric potential trap. By MD simulations, we obtain a phase diagram of flocking with active reorientation and potential asymmetry as parameters. We compare the responses of inactive and active particles to the potential. It shows that active reorientation of particles amplifies the degree of aggregation on one side in the asymmetric potential well. Furthermore, we calculate the mean squared displacement and identify distinct diffusion regimes. These results highlight active particles with active reorientation exhibit greater sensitivity in double-well potentials.

Submitted 31 August, 2024; originally announced September 2024.

Comments: 7 pages, 8 figures

arXiv:2409.00247 [pdf, other]

Earth's Alfvén Wings: Unveiling Dynamic Variations of Field-line Topologies with Electron Distributions

Authors: Harsha Gurram, Jason R. Shuster, Li-Jen Chen, Hiroshi Hasegawa, Richard E. Denton, Brandon L. Burkholder, Jason Beedle, Daniel J. Gershman, James Burch

Abstract: The magnetic cloud (MC) of the Coronal Mass Ejection on April 24, 2023, contains sub-Alfvénic solar wind, transforming Earth's magnetosphere from conventional bow-shock magnetotail configuration to Alfvén wings. Utilizing measurements from the Magnetospheric Multiscale (MMS) mission, we present for the first time electron distribution signatures as the spacecraft traverses through various magnetic topologies during this transformation. Specifically, we characterize electrons inside the sub-Alfvénic MC, on the dawn-dusk wing field lines and on the closed field lines. The signatures include strahl electrons in MC regions and energetic keV electrons streaming along the dawn and dusk wing field lines. We demonstrate the distribution signatures of dual wing reconnection, defined as reconnection between dawn-dusk Alfvén wing field lines and the IMF. These signatures include four electron populations comprised of partially-depleted MC electrons and bi-directional energetic electrons with variations in energy and pitch-angle. The distributions reveal evidence of bursty magnetic reconnection under northward IMF.

Submitted 30 August, 2024; originally announced September 2024.

Comments: 11 pages, 4 figures

arXiv:2408.17224 [pdf, other]

Hadronic cross section measurements with the DAMPE space mission using 20GeV-10TeV cosmic-ray protons and $^4$He

Authors: F. Alemanno, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, I. Cagnoli, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, P. Coppin, M. Y. Cui, T. S. Cui, Y. X. Cui, H. T. Dai, A. De Benedittis, I. De Mitri, F. de Palma, A. Di Giovanni, Q. Ding, T. K. Dong, et al. (126 additional authors not shown)

Abstract: Precise direct cosmic-ray (CR) measurements provide an important probe to study the energetic particle sources in our Galaxy, and the interstellar environment through which these particles propagate. Uncertainties on hadronic models, ion-nucleon cross sections in particular, are currently the limiting factor towards obtaining more accurate CR ion flux measurements with calorimetric space-based experiments. We present an energy-dependent measurement of the inelastic cross section of protons and helium-4 nuclei (alpha particles) on a Bi$_4$Ge$_3$O$_{12}$ target, using 88 months of data collected by the DAMPE space mission. The kinetic energy range per nucleon of the measurement points ranges from 18 GeV to 9 TeV for protons, and from 5 GeV/n to 3 TeV/n for helium-4 nuclei. Our results lead to a significant improvement of the CR flux normalisation. In the case of helium-4, these results correspond to the first cross section measurements on a heavy target material at energies above 10 GeV/n.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 17 pages, submitted to PRD

arXiv:2408.17071 [pdf, other]

Search for $h_c \to π^+π^-J/ψ$ via $ψ(3686)\to π^0h_c$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (653 additional authors not shown)

Abstract: Using $(2712.4 \pm 14.3) \times 10^6~ψ$(3686) events collected with the BESIII detector operating at the BEPCII collider, we search for the hadronic transition $h_c \to π^+π^-J/ψ$ via $ψ(3686)\to π^0 h_c$. No significant signal is observed. We set the most stringent upper limits to date on the branching fractions $\mathcal{B}(ψ(3686)\to π^0 h_c)\times\mathcal{B}(h_c\toπ^+π^-J/ψ)$ and $\mathcal{B}(h_c \to π^+π^-J/ψ)$ at the 90$\%$ confidence level, which are determined to be $6.7\times 10^{-7}$ and $9.4 \times10^{-4}$, respectively.

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2408.16997 [pdf, other]

doi 10.1103/PhysRevLett.133.090402

Experimental Verification of Demon-Involved Fluctuation Theorems

Authors: L. -L. Yan, J. -T. Bu, Q. Zeng, K. Zhang, K. -F. Cui, F. Zhou, S. -L. Su, L. Chen, J. Wang, Gang Chen, M. Feng

Abstract: The limit of energy saving in the control of small systems has recently attracted much interest due to the concept refinement of the Maxwell demon. Inspired by a newly proposed set of fluctuation theorems, we report the first experimental verification of these equalities and inequalities in an ultracold 40Ca ion system, confirming the intrinsic nonequilibrium in the system due to involvement of the demon. Based on elaborately designed demon-involved control protocols, such as the Szilard engine protocol, we provide experimentally quantitative evidence of the dissipative information, and observe tighter bounds of both the extracted work and the demon's efficacy than the limits predicted by the Sagawa-Ueda theorem. Our results substantiate a close connection between the physical nature of information and nonequilibrium processes at the microscale, which helps further the understanding of the thermodynamic characteristics of information and the optimal design of nanoscale and smaller systems.

Submitted 30 August, 2024; originally announced August 2024.

Journal ref: Physical Review Letters 133, 090402 (2024)

arXiv:2408.16756 [pdf, other]

How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models

Authors: Jiyue Jiang, Liheng Chen, Pengan Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu

Abstract: The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages. However, underrepresented languages like Cantonese, spoken by over 85 million people, face significant development gaps, which is particularly concerning given the economic significance of the Guangdong-Hong Kong-Macau Greater Bay Area and the substantial Cantonese-speaking populations in places like Singapore and North America. Despite its wide use, Cantonese has scant representation in NLP research, especially compared to other languages from similarly developed regions. To bridge these gaps, we outline current Cantonese NLP methods and introduce new benchmarks designed to evaluate LLM performance in factual generation, mathematical logic, complex reasoning, and general knowledge in Cantonese, which aim to advance open-source Cantonese LLM technology. We also propose future research directions and recommended models to enhance Cantonese LLM development.

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.16654 [pdf, other]

Measurement of the Decay $Ξ^{0}\toΛγ$ with Entangled $Ξ^{0}\barΞ^{0}$ Pairs

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: In this Letter, a systematic study of the weak radiative hyperon decay $Ξ^{0}\toΛγ$ at an electron-positron collider using entangled $Ξ^{0}\barΞ^{0}$ pair events is presented. The absolute branching fraction for this decay has been measured for the first time, and is $\left(1.347 \pm 0.066_{\mathrm{stat.}}\pm0.054_{\mathrm{syst.}}\right)\times 10^{-3}$. The decay asymmetry parameter, which characterizes the effect of parity violation in the decay, is determined to be $-0.741 \pm 0.062_{\mathrm{stat.}}\pm 0.019_{\mathrm{syst.}}$. The obtained results are consistent with the world average values within the uncertainties, offering valuable insights into the underlying mechanism governing the weak radiative hyperon decays. The charge conjugation parity ($CP$) symmetries of branching fraction and decay asymmetry parameter in the decay are also studied. No statistically significant violation of charge conjugation parity symmetry is observed.

Submitted 29 August, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

Comments: 10 pages, 3 figures

arXiv:2408.16498 [pdf, other]

A Survey on Evaluating Large Language Models in Code Generation Tasks

Authors: Liguo Chen, Qi Guo, Hongrui Jia, Zhengran Zeng, Xin Wang, Yijiang Xu, Jian Wu, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang

Abstract: This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development, LLMs have demonstrated significant potential in the field of code generation. The paper begins by reviewing the historical development of LLMs and their applications in code generation. Next, it details various methods and metrics for assessing the code generation capabilities of LLMs, including code correctness, efficiency, readability, and evaluation methods based on expert review and user experience. The paper also evaluates the widely used benchmark datasets, identifying their limitations and proposing directions for future improvements. Specifically, the paper analyzes the performance of code generation models across different tasks by combining multiple evaluation metrics, such as code compilation/interpretation success rates, unit test pass rates, and performance and efficiency metrics, to comprehensively assess the practical application of LLMs in code generation. Finally, the paper discusses the challenges faced in evaluating LLMs in code generation, particularly how to ensure the comprehensiveness and accuracy of evaluation methods and how to adapt to the evolving practices of software development. These analyses and discussions provide valuable insights for further optimizing and improving the application of LLMs in code generation tasks.

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.16420 [pdf, other]

Time-Optimized Trajectory Planning for Non-Prehensile Object Transportation in 3D

Authors: Lingyun Chen, Haoyu Yu, Abdeldjallil Naceri, Abdalla Swikir, Sami Haddadin

Abstract: Non-prehensile object transportation offers a way to enhance robotic performance in object manipulation tasks, especially with unstable objects. Effective trajectory planning requires simultaneous consideration of robot motion constraints and object stability. Here, we introduce a physical model for object stability and propose a novel trajectory planning approach for non-prehensile transportation along arbitrary straight lines in 3D space. Validation with a 7-DoF Franka Panda robot confirms improved transportation speed via tray rotation integration while ensuring object stability and robot motion constraints.

Submitted 29 August, 2024; originally announced August 2024.

Comments: Accepted to the European Robotic Forum (ERF) 2024

arXiv:2408.16366 [pdf, other]

A small footprint travelling-wave parametric amplifier with a high Signal-to-Noise Ratio improvement in a wide band

Authors: Hampus Renberg Nilsson, Liangyu Chen, Giovanna Tancredi, Robert Rehammar, Daryoush Shiri, Filip Nilsson, Amr Osman, Vitaly Shumeiko, Per Delsing

Abstract: We characterise a small footprint travelling-wave parametric amplifier (TWPA). The TWPA is built with magnetically flux-tunable superconducting nonlinear asymmetric inductive elements (SNAILs) and parallel-plate capacitors. It implements three-wave mixing (3WM) with resonant phase matching (RPM), a small cutoff frequency for high gain per unitcell and impedance matching networks for large bandwidth impedance matching. The device has 200 unitcells and a physical footprint of only 1.1 mm^2, yet demonstrates an average parametric gain of 19 dB over a 3 GHz bandwidth, an average effective signal-to-noise ratio improvement of 10 dB and a clear speedup of qubit readout time.

Submitted 29 August, 2024; originally announced August 2024.

Comments: 9 pages + 2 appendix pages, 3 figures + 2 appendix figures

arXiv:2408.16279 [pdf, ps, other]

Model-independent determination of the strong-phase difference between $D^0$ and $\bar{D}^0 \to π^+π^-π^+π^-$ decays

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (647 additional authors not shown)

Abstract: Measurements of the strong-phase difference between $D^0$ and $\bar{D}^0\toπ^+π^-π^+π^-$ are performed in bins of phase space. The study exploits a sample of quantum-correlated $D\bar{D}$ mesons collected by the BESIII experiment in $e^+e^-$ collisions at a center-of-mass energy of 3.773~GeV, corresponding to an integrated luminosity of 2.93~fb$^{-1}$. Here, $D$ denotes a neutral charm meson in a superposition of flavor eigenstates. The reported results are valuable for measurements of the $C\!P$-violating phase $γ$ (also denoted $φ_3$) in $B^\pm \to DK^\pm$, $D \to π^+π^-π^+π^-$ decays, and the binning schemes are designed to provide good statistical sensitivity to this parameter. The expected uncertainty on $γ$ arising from the precision of the strong-phase measurements, when applied to very large samples of $B$-meson decays, is around $1.5^\circ$ or $2^\circ$, depending on the binning scheme. The binned strong-phase parameters are combined to give a value of $F_+^{4π} = 0.746 \pm 0.010 \pm 0.004$ for the $C\!P$-even fraction of $D^0 \to π^+π^-π^+π^-$ decays, which is around 30\% more precise than the previous best measurement of this quantity.

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.16266 [pdf, other]

Improving Diffusion-based Data Augmentation with Inversion Spherical Interpolation

Authors: Yanghao Wang, Long Chen

Abstract: Data Augmentation (DA), i.e., synthesizing faithful and diverse samples to expand the original training set, is a prevalent and effective strategy to improve various visual recognition tasks. With the powerful image generation ability, diffusion-based DA has shown strong performance gains on different benchmarks. In this paper, we analyze today's diffusion-based DA methods, and argue that they cannot take account of both faithfulness and diversity, which are two critical keys for generating high-quality samples and boosting final classification performance. To this end, we propose a novel Diffusion-based Inversion Interpolation DA method: Diff-II. Specifically, Diff-II consists of three main steps: 1) Category concepts learning: Learning concept embeddings for each category. 2) Inversion interpolation: Calculating the inversion for each image, and conducting spherical interpolation for two randomly sampled inversions from the same category. 3) Two-stage denoising: Using different prompts to generate synthesized images in a coarse-to-fine manner. Extensive experiments on multiple image classification tasks (e.g., few-shot, long-tailed, and out-of-distribution classification) have demonstrated its effectiveness over state-of-the-art diffusion-based DA methods.
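Note: the spherical interpolation mentioned in step 2 of the abstract above can be illustrated with a generic slerp routine. The sketch below is not taken from the Diff-II code; the function name `slerp`, the use of plain Python lists for the inversions, and the linear fallback for nearly parallel vectors are assumptions made for illustration only.

```python
import math


def slerp(z0, z1, t):
    """Spherical linear interpolation between two non-zero vectors.

    z0, z1: equal-length lists of floats (hypothetical stand-ins for two
            inversions sampled from the same category).
    t:      interpolation weight in [0, 1]; t=0 gives z0, t=1 gives z1.
    Assumes the vectors are not exactly antiparallel.
    """
    dot = sum(a * b for a, b in zip(z0, z1))
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    omega = math.acos(cos_omega)
    if omega < 1e-8:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(z0, z1)]
    sin_omega = math.sin(omega)
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


if __name__ == "__main__":
    # Midpoint between two orthogonal unit vectors stays on the unit circle.
    print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```

Unlike a straight linear blend, this keeps the interpolated vector at the same norm when both inputs share a norm, which is the usual motivation for using slerp on Gaussian-like latents.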

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.15980 [pdf, other]

In-Context Imitation Learning via Next-Token Prediction

Authors: Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg

Abstract: We explore how to enhance next-token prediction models to perform in-context imitation learning on a real robot, where the robot executes new tasks by interpreting contextual information provided during the input phase, without updating its underlying policy parameters. We propose In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor trajectories without relying on any linguistic data or reward function. This formulation enables flexible and training-free execution of new tasks at test time, achieved by prompting the model with sensorimotor trajectories of the new task consisting of image observation, action, and state tuples, collected through human teleoperation. Experiments with a Franka Emika robot demonstrate that the ICRT can adapt to new tasks specified by prompts, even in environment configurations that differ from both the prompt and the training data. In a multitask environment setup, ICRT significantly outperforms current state-of-the-art next-token prediction models in robotics on generalizing to unseen tasks. Code, checkpoints and data are available on https://icrt.dev/

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.15881 [pdf, other]

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Authors: Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang

Abstract: We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM distillation. First, we optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts (MoE) architecture into the language model, striking a balance between computational efficiency and model expressiveness. Second, we propose a progressive knowledge transfer strategy to ensure comprehensive knowledge migration. This strategy begins with mimic distillation, where we minimize the Kullback-Leibler (KL) divergence between output distributions to enable the student model to emulate the teacher network's understanding. Following this, we introduce preference distillation via Direct Preference Optimization (DPO), where the key lies in treating l-MLLM as the reference model. During this phase, the s-MLLM's ability to discriminate between superior and inferior examples is significantly enhanced beyond l-MLLM, leading to a better student that surpasses its teacher, particularly in hallucination benchmarks. Extensive experiments demonstrate that LLaVA-MoD outperforms existing models across various multimodal benchmarks while maintaining a minimal number of activated parameters and low computational costs. Remarkably, LLaVA-MoD, with only 2B activated parameters, surpasses Qwen-VL-Chat-7B by an average of 8.8% across benchmarks, using merely 0.3% of the training data and 23% trainable parameters.
These results underscore LLaVA-MoD's ability to effectively distill comprehensive knowledge from its teacher model, paving the way for the development of more efficient MLLMs. The code will be available on: https://github.com/shufangxun/LLaVA-MoD.
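Note: the mimic-distillation step described in the abstract above minimizes a KL divergence between teacher and student output distributions. The sketch below shows one generic way such a loss can be computed for a single token position. It is not the LLaVA-MoD implementation; the function names, the KL(teacher || student) direction, and the `temperature` parameter are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def mimic_loss(teacher_logits, student_logits, temperature=1.0):
    """Hypothetical distillation loss: KL(teacher || student) on softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, q_student)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    print(mimic_loss(teacher, teacher))          # identical outputs give zero loss
    print(mimic_loss(teacher, [0.0, 0.0, 0.0]))  # a uniform student is penalized
```

The loss is zero only when the student reproduces the teacher's distribution, so minimizing it pushes the student's outputs toward the teacher's.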

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.15782 [pdf, other]

Indirect nonlinear interaction between toroidal Alfvén eigenmode and ion temperature gradient mode mediated by zonal structures

Authors: Qian Fang, Guangyu Wei, Ningfei Chen, Liu Chen, Fulvio Zonca, Zhiyong Qiu

Abstract: The indirect nonlinear interactions between toroidal Alfvén eigenmode (TAE) and ion temperature gradient mode (ITG) are investigated using nonlinear gyrokinetic theory and ballooning mode formalism. More specifically, the local nonlinear ITG mode equation is derived adopting the fluid-ion approximation, with the contributions of zonal field structure and phase space zonal structure beat-driven by finite amplitude TAE accounted for on the same footing. The obtained nonlinear ITG mode equation is solved both analytically and numerically, and it is found that the zonal structure beat-driven by TAE has only weakly destabilizing effects on ITG, contrary to usual speculations and existing numerical results.

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.15770 [pdf]

Unprecedented Enhancement of Piezoelectricity in Wurtzite Nitride Semiconductors via Thermal Annealing

Authors: Shubham Mondal, Md Mehedi Hasan Tanim, Garrett Baucom, Shaurya S. Dabas, Jinghan Gao, Venkateswarlu Gaddam, Jiangnan Liu, Aiden Ross, Long-Qing Chen, Honggyu Kim, Roozbeh Tabrizian, Zetian Mi

Abstract: The incorporation of rare-earth elements in wurtzite nitride semiconductors, e.g., scandium alloyed aluminum nitride (ScAlN), promises dramatically enhanced piezoelectric responses, critical to a broad range of acoustic, electronic, photonic, and quantum devices and applications. Experimentally, however, the measured piezoelectric responses of nitride semiconductors are far below what theory has predicted. Here, we show that the use of a simple, scalable, post-growth thermal annealing process can dramatically boost the piezoelectric response of ScAlN thin films. We achieve a remarkable 3.5-fold increase in the piezoelectric modulus, d33 for 30% Sc content ScAlN, from 12.3 pC/N in the as-grown state to 45.5 pC/N, which is eight times larger than that of AlN. The enhancement in piezoelectricity has been unambiguously confirmed by three separate measurement techniques. Such a dramatic enhancement of d33 has been shown to impact the effective electromechanical coupling coefficient kt2 : increasing it from 13.8% to 76.2%, which matches the highest reported values in millimeter thick lithium niobate films but is achieved in a 100 nm ScAlN with a 10,000 fold reduction in thickness, thus promising extreme frequency scaling opportunities for bulk acoustic wave resonators for beyond 5G applications.
By utilizing a range of material characterization techniques, we have elucidated the underlying mechanisms for the dramatically enhanced piezoelectric responses, including improved structural quality at the macroscopic scale, more homogeneous and ordered distribution of domain structures at the mesoscopic scale, and the reduction of lattice parameter ratio (c/a) for the wurtzite crystal structure at the atomic scale. Overall, the findings present a simple yet highly effective pathway that can be extended to other material families to further enhance their piezo responses.

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.15657 [pdf, other]

TeFF: Tracking-enhanced Forgetting-free Few-shot 3D LiDAR Semantic Segmentation

Authors: Junbao Zhou, Jilin Mei, Pengze Wu, Liang Chen, Fangzhou Zhao, Xijun Zhao, Yu Hu

Abstract: In autonomous driving, 3D LiDAR plays a crucial role in understanding the vehicle's surroundings. However, newly emerged, unannotated objects present a few-shot learning problem for semantic segmentation. This paper addresses the limitations of current few-shot semantic segmentation by exploiting the temporal continuity of LiDAR data. Employing a tracking model to generate pseudo-ground-truths from a sequence of LiDAR frames, our method significantly augments the dataset, enhancing the model's ability to learn on novel classes. However, this approach introduces a data imbalance biased to novel data that presents a new challenge of catastrophic forgetting. To mitigate this, we incorporate LoRA, a technique that reduces the number of trainable parameters, thereby preserving the model's performance on base classes while improving its adaptability to novel classes. This work represents a significant step forward in few-shot 3D LiDAR semantic segmentation for autonomous driving. Our code is available at https://github.com/junbao-zhou/Track-no-forgetting.

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.15494 [pdf, other]

Tracking the Electron Density Changes in Excited States -- A Computational Study on Pyrazine

Authors: Sebastian V. Pios, Jiaji Zhang, Maxim F. Gelin, Hong-Guang Duan, Lipeng Chen

Abstract: The development of X-ray free-electron lasers (XFELs) has enabled ultrafast X-ray diffraction (XRD) experiments, which are capable of resolving electronic/vibrational transitions and structural changes in molecules, or capturing molecular movies. While time-resolved XRD has received increasing attention, the extraction of information content from signals is challenging and requires theoretical support. In this work, we combined X-ray scattering theory and the trajectory surface hopping approach to resolve dynamical changes in the electronic structure of photo-excited molecules by studying the time evolution of electron density changes between electronic excited states and the ground state. Using the pyrazine molecule as an example, we show that key features of reaction pathways can be identified, enabling the capture of structural changes associated with electronic transitions for a photo-excited molecule.

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.14902 [pdf, other]

In-Lab High Resolution Mid-infrared Up-conversion Stellar Interferometer Based on Synthetic Long Base-Line

Authors: Zhao-Qi-Zhi Han, Zheng Ge, Wen-Tao Luo, Yi-Fu Cai, Xiao-Hua Wang, Li Chen, Wu-Zhen Li, Zhi-Yuan Zhou, Bao-Sen Shi

Abstract: Detecting mid-infrared (MIR) radiation has significant astronomical applications, although limited by unsatisfactory MIR detectors. Here we report on the realization of a MIR up-conversion interferometer based on synthetic long base-line (SLBL) in the laboratory. The experimental system consisted of an interferometer and a subsequent up-conversion detection part for the mid-infrared signal, which streamlined the structure and enhanced the reliability of the system. By using a tungsten filament lamp as an imitated star, we not only achieved a single-target angular resolution of $1.10 \times 10^{-4}$ rad, but also obtained a field angular resolution of $3.0 \times 10^{-4}$ rad for double star targets. The angular resolution is inversely proportional to the length of the baseline. The maximum length of the simulated baseline in the laboratory is about 3 cm. In a Keck Interferometer (KI)-like program, the baseline can reach up to 85 m, leading to a corresponding angular resolution of $3.0 \times 10^{-9}$ rad (about 1.8 mas). The study will offer potential benefits in extending the usage of mid-infrared light in astronomical exploration.

Submitted 27 August, 2024; originally announced August 2024.

Comments: 11 pages, 4 figures. Accepted by Physical Review D

arXiv:2408.14438 [pdf, other]

Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study

Authors: Liuchang Xu, Shuo Zhao, Qingming Lin, Luyao Chen, Qianqian Luo, Sensen Wu, Xinyue Ye, Hailin Feng, Zhenhong Du

Abstract: The advent of large language models such as ChatGPT, Gemini, and others has underscored the importance of evaluating their diverse capabilities, ranging from natural language understanding to code generation. However, their performance on spatial tasks has not been comprehensively assessed. This study addresses this gap by introducing a novel multi-task spatial evaluation dataset, designed to systematically explore and compare the performance of several advanced models on spatial tasks. The dataset encompasses twelve distinct task types, including spatial understanding and path planning, each with verified, accurate answers. We evaluated multiple models, including OpenAI's gpt-3.5-turbo, gpt-4o, and ZhipuAI's glm-4, through a two-phase testing approach. Initially, we conducted zero-shot testing, followed by categorizing the dataset by difficulty and performing prompt tuning tests. Results indicate that gpt-4o achieved the highest overall accuracy in the first phase, with an average of 71.3%. Although moonshot-v1-8k slightly underperformed overall, it surpassed gpt-4o in place name recognition tasks. The study also highlights the impact of prompt strategies on model performance in specific tasks. For example, the Chain-of-Thought (COT) strategy increased gpt-4o's accuracy in path planning from 12.4% to 87.5%, while a one-shot strategy enhanced moonshot-v1-8k's accuracy in mapping tasks from 10.1% to 76.3%.

Submitted 2 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.14211 [pdf, other]

MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Authors: Xu He, Xiaoyu Li, Di Kang, Jiangnan Ye, Chaopeng Zhang, Liyang Chen, Xiangjun Gao, Han Zhang, Zhiyong Wu, Haolin Zhuang

Abstract: Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data or 3D inconsistencies due to a lack of comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image. At its core, we leverage a pre-trained 2D diffusion model as the generative prior for generalizability, with the parametric SMPL-X model as the 3D body prior to promote 3D awareness. To tackle the critical challenge of maintaining consistency while achieving dense multi-view generation for improved 3D human reconstruction, we first introduce hybrid multi-view attention to facilitate both efficient and thorough information interchange across different views. Additionally, we present a geometry-aware dual branch to perform concurrent generation in both RGB and normal domains, further enhancing consistency via geometry cues. Last but not least, to address ill-shaped issues arising from inaccurate SMPL-X estimation that conflicts with the reference image, we propose a novel iterative refinement strategy, which progressively optimizes SMPL-X accuracy while enhancing the quality and consistency of the generated multi-views. Extensive experimental results demonstrate that our method significantly outperforms existing approaches in both novel view synthesis and subsequent 3D human reconstruction tasks.

Submitted 26 August, 2024; originally announced August 2024.

Comments: Project Page: https://thuhcsi.github.io/MagicMan

arXiv:2408.14173 [pdf, other]

BackFlip: The Impact of Local and Global Data Augmentations on Artistic Image Aesthetic Assessment

Authors: Ombretta Strafforello, Gonzalo Muradas Odriozola, Fatemeh Behrad, Li-Wei Chen, Anne-Sofie Maerten, Derya Soydaner, Johan Wagemans

Abstract: Assessing the aesthetic quality of artistic images presents unique challenges due to the subjective nature of aesthetics and the complex visual characteristics inherent to artworks. Basic data augmentation techniques commonly applied to natural images in computer vision may not be suitable for art images in aesthetic evaluation tasks, as they can change the composition of the art images. In this paper, we explore the impact of local and global data augmentation techniques on artistic image aesthetic assessment (IAA). We introduce BackFlip, a local data augmentation technique designed specifically for artistic IAA. We evaluate the performance of BackFlip across three artistic image datasets and four neural network architectures, comparing it with the commonly used data augmentation techniques. Then, we analyze the effects of components within the BackFlip pipeline through an ablation study. Our findings demonstrate that local augmentations, such as BackFlip, tend to outperform global augmentations on artistic IAA in most cases, probably because they do not perturb the composition of the art images. These results emphasize the importance of considering both local and global augmentations in future computational aesthetics research.

Submitted 26 August, 2024; originally announced August 2024.

Comments: Published at the VISART VII workshop at ECCV 2024. Ombretta Strafforello, Gonzalo Muradas Odriozola, Fatemeh Behrad, Li-Wei Chen, Anne-Sofie Maerten and Derya Soydaner contributed equally to this work

arXiv:2408.13044 [pdf, other]

Identification and validation of the dynamic model of a tendon-driven anthropomorphic finger

Authors: Junnan Li, Lingyun Chen, Johannes Ringwald, Edmundo Pozo Fortunic, Amartya Ganguly, Sami Haddadin

Abstract: This study addresses the absence of an identification framework to quantify a comprehensive dynamic model of human and anthropomorphic tendon-driven fingers, which is necessary to investigate the physiological properties of human fingers and improve the control of robotic hands. First, a generalized dynamic model was formulated, which takes into account the inherent properties of such a mechanical system. This includes rigid-body dynamics, coupling matrix, joint viscoelasticity, and tendon friction. Then, we propose a methodology comprising a series of experiments, for step-wise identification and validation of this dynamic model. Moreover, an experimental setup was designed and constructed that features actuation modules and peripheral sensors to facilitate the identification process. To verify the proposed methodology, a 3D-printed robotic finger based on the index finger design of the Dexmart hand was developed, and the proposed experiments were executed to identify and validate its dynamic model. This study could be extended to explore the identification of cadaver hands, aiming for a consistent dataset from a single cadaver specimen to improve the development of musculoskeletal hand models.

Submitted 23 August, 2024; originally announced August 2024.

Comments: 8 pages, 9 figures

arXiv:2408.12981 [pdf, other]

QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval

Authors: Chenghua Gao, Min Li, Jianshuo Liu, Junxing Ren, Lin Chen, Haoyu Liu, Bo Meng, Jitao Fu, Wenwen Su

Abstract: Video Moment Retrieval (VMR) aims to retrieve relevant moments of an untrimmed video corresponding to the query. While cross-modal interaction approaches have shown progress in filtering out query-irrelevant information in videos, they assume the precise alignment between the query semantics and the corresponding video moments, potentially overlooking the misunderstanding of the natural language semantics. To address this challenge, we propose a novel model called QD-VMR, a query debiasing model with enhanced contextual understanding. Firstly, we leverage a Global Partial Aligner module via video clip and query features alignment and video-query contrastive learning to enhance the cross-modal understanding capabilities of the model. Subsequently, we employ a Query Debiasing Module to obtain debiased query features efficiently, and a Visual Enhancement module to refine the video features related to the query. Finally, we adopt the DETR structure to predict the possible target video moments. Through extensive evaluations of three benchmark datasets, QD-VMR achieves state-of-the-art performance, proving its potential to improve the accuracy of VMR. Further analytical experiments demonstrate the effectiveness of our proposed module. Our code will be released to facilitate future research.

Submitted 23 August, 2024; originally announced August 2024.

Comments: 9 pages, 4 figures, 4 tables

arXiv:2408.12879 [pdf, other]

Frequency-aware Feature Fusion for Dense Image Prediction

Authors: Linwei Chen, Ying Fu, Lin Gu, Chenggang Yan, Tatsuya Harada, Gao Huang

Abstract: Dense image prediction tasks demand features with strong category information and precise spatial boundary details at high resolution. To achieve this, modern hierarchical models often utilize feature fusion, directly adding upsampled coarse features from deep layers and high-resolution features from lower levels. In this paper, we observe rapid variations in fused feature values within objects, resulting in intra-category inconsistency due to disturbed high-frequency features. Additionally, blurred boundaries in fused features lack accurate high frequency, leading to boundary displacement. Building upon these observations, we propose Frequency-Aware Feature Fusion (FreqFusion), integrating an Adaptive Low-Pass Filter (ALPF) generator, an offset generator, and an Adaptive High-Pass Filter (AHPF) generator. The ALPF generator predicts spatially-variant low-pass filters to attenuate high-frequency components within objects, reducing intra-class inconsistency during upsampling. The offset generator refines large inconsistent features and thin boundaries by replacing inconsistent features with more consistent ones through resampling, while the AHPF generator enhances high-frequency detailed boundary information lost during downsampling. Comprehensive visualization and quantitative analysis demonstrate that FreqFusion effectively improves feature consistency and sharpens object boundaries. Extensive experiments across various dense prediction tasks confirm its effectiveness. The code is made publicly available at https://github.com/Linwei-Chen/FreqFusion.

Submitted 23 August, 2024; originally announced August 2024.

Comments: Accepted by TPAMI (2024)

arXiv:2408.12857 [pdf, other]

Memory-Efficient LLM Training with Online Subspace Descent

Authors: Kaizhao Liang, Bo Liu, Lizhang Chen, Qiang Liu

Abstract: Recently, a wide range of memory-efficient LLM training algorithms have gained substantial popularity. These methods leverage the low-rank structure of gradients to project optimizer states into a subspace using projection matrix found by singular value decomposition (SVD). However, convergence of these algorithms is highly dependent on the update rules of their projection matrix. In this work, we provide the first convergence guarantee for arbitrary update rules of projection matrix. This guarantee is generally applicable to optimizers that can be analyzed with Hamiltonian Descent, including most common ones, such as LION, Adam. Inspired by our theoretical understanding, we propose Online Subspace Descent, a new family of subspace descent optimizer without SVD. Instead of updating the projection matrix with eigenvectors, Online Subspace Descent updates the projection matrix with online PCA. Online Subspace Descent is flexible and introduces only minimum overhead to training. We show that for the task of pretraining LLaMA models ranging from 60M to 7B parameters on the C4 dataset, Online Subspace Descent achieves lower perplexity and better downstream tasks performance than state-of-the-art low-rank training methods across different settings and narrows the gap with full-rank baselines.

Submitted 23 August, 2024; originally announced August 2024.

Comments: Code is available at https://github.com/kyleliang919/Online-Subspace-Descent

arXiv:2408.12715 [pdf]

Programmable scanning diffuse speckle contrast imaging of cerebral blood flow

Authors: Faezeh Akbari, Xuhui Liu, Fatemeh Hamedi, Mehrana Mohtasebi, Lei Chen, Guoqiang Yu

Abstract: Significance: Cerebral blood flow (CBF) imaging is crucial for diagnosing cerebrovascular diseases. However, existing large neuroimaging techniques with high cost, low sampling rate, and poor mobility make them unsuitable for continuous and longitudinal CBF monitoring at the bedside. Aim: This study aimed to develop a low-cost, portable, programmable scanning diffuse speckle contrast imaging (PS-DSCI) technology for fast, high-density, and depth-sensitive imaging of CBF in rodents. Approach: The PS-DSCI employed a programmable digital micromirror device (DMD) for remote line-shape laser (785 nm) scanning on tissue surface and synchronized a 2D camera for capturing boundary diffuse laser speckle contrasts. New algorithms were developed to address deformations of line-shape scanning, thus minimizing CBF reconstruction artifacts. The PS-DSCI was examined in head-simulating phantoms and adult mice. Results: The PS-DSCI enables resolving Intralipid particle flow contrasts at different tissue depths. In vivo experiments in adult mice demonstrated the capability of PS-DSCI to image global/regional CBF variations induced by 8% CO2 inhalation and transient carotid artery ligations. Conclusions: Compared to conventional point scanning, the line scanning in PS-DSCI significantly increases spatiotemporal resolution. The high sampling rate of PS-DSCI is crucial for capturing rapid CBF changes while high spatial resolution is important for visualizing brain vasculature.

Submitted 22 August, 2024; originally announced August 2024.

Comments: 23 Pages, 8 Figures, 4 Tables

arXiv:2408.12527 [pdf, other]

UMAD: University of Macau Anomaly Detection Benchmark Dataset

Authors: Dong Li, Lineng Chen, Cheng-Zhong Xu, Hui Kong

Abstract: Anomaly detection is critical in surveillance systems and patrol robots by identifying anomalous regions in images for early warning. Depending on whether reference data are utilized, anomaly detection can be categorized into anomaly detection with reference and anomaly detection without reference. Currently, anomaly detection without reference, which is closely related to out-of-distribution (OoD) object detection, struggles with learning anomalous patterns due to the difficulty of collecting sufficiently large and diverse anomaly datasets with the inherent rarity and novelty of anomalies. Alternatively, anomaly detection with reference employs the scheme of change detection to identify anomalies by comparing semantic changes between a reference image and a query one. However, there are very few ADr works due to the scarcity of public datasets in this domain. In this paper, we aim to address this gap by introducing the UMAD Benchmark Dataset. To our best knowledge, this is the first benchmark dataset designed specifically for anomaly detection with reference in robotic patrolling scenarios, e.g., where an autonomous robot is employed to detect anomalous objects by comparing a reference and a query video sequences. The reference sequences can be taken by the robot along a specified route when there are no anomalous objects in the scene. The query sequences are captured online by the robot when it is patrolling in the same scene following the same route.
Our benchmark dataset is elaborated such that each query image can find a corresponding reference based on accurate robot localization along the same route in the prebuilt 3D map, with which the reference and query images can be geometrically aligned using adaptive warping. Besides the proposed benchmark dataset, we evaluate the baseline models of ADr on this dataset.

Submitted 22 August, 2024; originally announced August 2024.

Comments: Accepted by the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024, project code at https://github.com/IMRL/UMAD

arXiv:2408.12526 [pdf, other]

Exploiting Student Parallelism for Low-latency GPU Inference of BERT-like Models in Online Services

Authors: Weiyan Wang, Yilun Jin, Yiming Zhang, Victor Junqiu Wei, Han Tian, Li Chen, Kai Chen

Abstract: Due to high accuracy, BERT-like models have been widely adopted by discriminative text mining and web searching. However, large BERT-like models suffer from inefficient online inference, as they face the following two problems on GPUs. First, they rely on the large model depth to achieve high accuracy, which linearly increases the sequential computation on GPUs. Second, stochastic and dynamic online workloads cause extra costs. In this paper, we present Academus for low-latency online inference of BERT-like models. At the core of Academus is the novel student parallelism, which adopts boosting ensemble and stacking distillation to distill the original deep model into an equivalent group of parallel and shallow student models. This enables Academus to achieve the lower model depth (e.g., two layers) than baselines and consequently the lowest inference latency without affecting the accuracy. For occasional workload bursts, it can temporarily decrease the number of students with minimal accuracy loss to improve throughput. Additionally, it employs specialized system designs for student parallelism to better handle stochastic online workloads. We conduct comprehensive experiments to verify the effectiveness. The results show that Academus outperforms the baselines by 4.1X~1.6X in latency without compromising accuracy, and achieves up to 22.27X higher throughput for workload bursts.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.12104 [pdf, other]

Minute-Cadence Observations of the LAMOST Fields with the TMTS: IV -- Catalog of Cataclysmic Variables from the First 3-yr Survey

Authors: Qichun Liu, Jie Lin, Xiaofeng Wang, Zhibin Dai, Yongkang Sun, Gaobo Xi, Jun Mo, Jialian Liu, Shengyu Yan, Alexei V. Filippenko, Thomas G. Brink, Yi Yang, Kishore C. Patra, Yongzhi Cai, Zhihao Chen, Liyang Chen, Fangzhou Guo, Xiaojun Jiang, Gaici Li, Wenxiong Li, Weili Lin, Cheng Miao, Xiaoran Ma, Haowei Peng, Qiqi Xia , et al. (2 additional authors not shown)

Abstract: The Tsinghua University--Ma Huateng Telescopes for Survey (TMTS) started to monitor the LAMOST plates in 2020, leading to the discovery of numerous short-period eclipsing binaries, peculiar pulsators, flare stars, and other variable objects. Here, we present the uninterrupted light curves for a sample of 64 cataclysmic variables (CVs) observed/discovered using the TMTS during its first three-year observations, and we introduce new CVs and new light-variation periods (from known CVs) revealed through the TMTS observations. Thanks to the high-cadence observations of TMTS, diverse light variations, including superhumps, quasi-periodic oscillations, large-amplitude orbital modulations, and rotational modulations, are able to be detected in our CV samples, providing key observational clues for understanding the fast-developing physical processes in various CVs. All of these short-timescale light-curve features help further classify the subtypes of CV systems. We highlight the light-curve features observed in our CV sample and discuss further implications of minute-cadence light curves for CV identifications and classifications. Moreover, we examine the H$α$ emission lines in the spectra from our nonmagnetic CV samples (i.e., dwarf novae and nova-like subclasses) and find that the distribution of H$α$ emission strength shows significant differences between the sources with orbital periods above and below the period gap, which agrees with the trend seen from the SDSS nonmagnetic CV sample.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 27 pages, 12 figures in main text, accepted for publication in Universe

arXiv:2408.12074 [pdf, ps, other]

Vertex-primitive s-arc-transitive digraphs of symplectic groups

Authors: Lei Chen, Michael Giudici, Cheryl E. Praeger

Abstract: A digraph is $s$-arc-transitive if its automorphism group is transitive on directed paths with $s$ edges, that is, on $s$-arcs. Although infinite families of finite $s$-arc transitive digraphs of arbitrary valency were constructed by the third author in 1989, existence of a vertex-primitive $2$-arc-transitive digraph was not known until an infinite family was constructed by the second author with Li and Xia in 2017. This led to a conjecture by the second author and Xia in 2018 that, for a finite vertex-primitive $s$-arc-transitive digraph, $s$ is at most $2$, together with their proof that it is sufficient to prove the conjecture for digraphs with an almost simple group of automorphisms. This paper confirms the conjecture for finite symplectic groups.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11824

AppAgent v2: Advanced Agent for Flexible Mobile Interactions

Authors: Yanda Li, Chi Zhang, Wanqi Yang, Bin Fu, Pei Cheng, Xin Chen, Ling Chen, Yunchao Wei

Abstract: With the advancement of Multimodal Large Language Models (MLLM), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices. This framework, capable of navigating mobile devices, emulates human-like interactions. Our agent constructs a flexible action space that enhances adaptability across various applications including parser, text and vision descriptions. The agent operates through two main phases: exploration and deployment. During the exploration phase, functionalities of user interface elements are documented either through agent-driven or manual explorations into a customized structured knowledge base. In the deployment phase, RAG technology enables efficient retrieval and update from this knowledge base, thereby empowering the agent to perform tasks effectively and accurately. This includes performing complex, multi-step operations across various applications, thereby demonstrating the framework's adaptability and precision in handling customized task workflows. Our experimental results across various benchmarks demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios. Our code will be open source soon.

Submitted 23 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

Comments: Pre-print version, some content needs to be supplemented

arXiv:2408.11652 [pdf, other]

Quantum entanglement and non-Hermiticity in free-fermion systems

Authors: Li-Mei Chen, Yao Zhou, Shuai A. Chen, Peng Ye

Abstract: This topical review article reports rapid progress on the generalization and application of entanglement in non-Hermitian free-fermion quantum systems. We begin by examining the realization of non-Hermitian quantum systems through the Lindblad master equation, alongside a review of typical non-Hermitian free-fermion systems that exhibit unique features. A pedagogical discussion is provided on the relationship between entanglement quantities and the correlation matrix in Hermitian systems. Building on this foundation, we focus on how entanglement concepts are extended to non-Hermitian systems from their Hermitian free-fermion counterparts, with a review of the general properties that emerge. Finally, we highlight various concrete studies, demonstrating that entanglement entropy remains a powerful diagnostic tool for characterizing non-Hermitian physics. The entanglement spectrum also reflects the topological characteristics of non-Hermitian topological systems, while unique non-Hermitian entanglement behaviors are also discussed. The review is concluded with several future directions. Through this review, we hope to provide a useful guide for researchers who are interested in entanglement in non-Hermitian quantum systems.

Submitted 26 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

Comments: version 2; a short review. ~14 pages, 1 figure

arXiv:2408.11048 [pdf, other]

RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands

Authors: Yi Zhao, Le Chen, Jan Schneider, Quankai Gao, Juho Kannala, Bernhard Schölkopf, Joni Pajarinen, Dieter Büchler

Abstract: It has been a long-standing research goal to endow robot hands with human-level dexterity. Bi-manual robot piano playing constitutes a task that combines challenges from dynamic tasks, such as generating fast while precise motions, with slower but contact-rich manipulation problems. Although reinforcement learning based approaches have shown promising results in single-task performance, these methods struggle in a multi-song setting. Our work aims to close this gap and, thereby, enable imitation learning approaches for robot piano playing at scale. To this end, we introduce the Robot Piano 1 Million (RP1M) dataset, containing bi-manual robot piano playing motion data of more than one million trajectories. We formulate finger placements as an optimal transport problem, thus, enabling automatic annotation of vast amounts of unlabeled songs. Benchmarking existing imitation learning approaches shows that such approaches reach state-of-the-art robot piano playing performance by leveraging RP1M.

Submitted 20 August, 2024; originally announced August 2024.

Comments: Project Website: https://rp1m.github.io/

arXiv:2408.10471 [pdf, ps, other]

Eigenvalues and eigenvectors of complex Hadamard matrices

Authors: Mengfan Liang, Lin Chen

Abstract: Characterizing the $6\times 6$ complex Hadamard matrices (CHMs) is an open problem in linear algebra and quantum information. In this paper, we investigate the eigenvalues and eigenvectors of CHMs. We show that any $n\times n$ CHM with dephased form has two constant eigenvalues $\pm\sqrt{n}$ and has two constant eigenvectors. We obtain the maximum numbers of identical eigenvalues of $6\times 6$ CHMs with dephased form and we extend this result to arbitrary dimension. We also show that there is no $6\times 6$ CHM with four identical eigenvalues. We conjecture that the eigenvalues and eigenvectors of $6\times 6$ CHMs will lead to the complete classification of $6\times 6$ CHMs.

Submitted 19 August, 2024; originally announced August 2024.

Comments: 15 pages, 0 figures

arXiv:2408.10198 [pdf, other]

MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

Authors: Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su

Abstract: Open-world 3D reconstruction models have recently garnered significant attention. However, without sufficient 3D inductive bias, existing methods typically entail expensive training costs and struggle to extract high-quality 3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. Specifically, instead of using a triplane representation, we store features in 3D sparse voxels and combine transformers with 3D convolutions to leverage an explicit 3D structure and projective bias. In addition to sparse-view RGB input, we require the network to take input and generate corresponding normal maps. The input normal maps can be predicted by 2D diffusion models, significantly aiding in the guidance and refinement of the geometry's learning. Moreover, by combining Signed Distance Function (SDF) supervision with surface rendering, we directly learn to generate high-quality meshes without the need for complex multi-stage training processes. By incorporating these explicit 3D biases, MeshFormer can be trained efficiently and deliver high-quality textured meshes with fine-grained geometric details. It can also be integrated with 2D diffusion models to enable fast single-image-to-3D and text-to-3D tasks. Project page: https://meshformer3d.github.io

Submitted 19 August, 2024; originally announced August 2024.

Comments: 20 pages, 9 figures

arXiv:2408.10195 [pdf, other]

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

Authors: Chao Xu, Ang Li, Linghao Chen, Yulin Liu, Ruoxi Shi, Hao Su, Minghua Liu

Abstract: Open-world 3D generation has recently attracted considerable attention. While many single-image-to-3D methods have yielded visually appealing outcomes, they often lack sufficient controllability and tend to produce hallucinated regions that may not align with users' expectations. In this paper, we explore an important scenario in which the input consists of one or a few unposed 2D images of a single object, with little or no overlap. We propose a novel method, SpaRP, to reconstruct a 3D textured mesh and estimate the relative camera poses for these sparse-view images. SpaRP distills knowledge from 2D diffusion models and finetunes them to implicitly deduce the 3D spatial relationships between the sparse views. The diffusion model is trained to jointly predict surrogate representations for camera poses and multi-view images of the object under known poses, integrating all information from the input sparse views. These predictions are then leveraged to accomplish 3D reconstruction and pose estimation, and the reconstructed 3D model can be used to further refine the camera poses of input views. Through extensive experiments on three datasets, we demonstrate that our method not only significantly outperforms baseline methods in terms of 3D reconstruction quality and pose prediction accuracy but also exhibits strong efficiency. It requires only about 20 seconds to produce a textured mesh and camera poses for the input views. Project page: https://chaoxu.xyz/sparp.

Submitted 19 August, 2024; originally announced August 2024.

Comments: ECCV 2024

Showing 1–50 of 5,995 results for author: Chen, L