Search | arXiv e-print repository

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Authors: Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

Abstract: While text-to-image models like DALLE-3 and Stable Diffusion are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequently undergo inadequate evaluation of their capabilities and limitations, potentially leading to misalignment and unsafe fine-tuning outcomes. To address this issue, we introduce MJ-Bench, a novel benchmark which incorporates a comprehensive preference dataset to evaluate multimodal judges in providing feedback for image generation models across four key perspectives: alignment, safety, image quality, and bias. Specifically, we evaluate a large variety of multimodal judges including smaller-sized CLIP-based scoring models, open-source VLMs (e.g. LLaVA family), and closed-source VLMs (e.g. GPT-4o, Claude 3) on each decomposed subcategory of our preference dataset. Experiments reveal that closed-source VLMs generally provide better feedback, with GPT-4o outperforming other judges on average. Compared with open-source VLMs, smaller-sized scoring models can provide better feedback regarding text-image alignment and image quality, while VLMs provide more accurate feedback regarding safety and generation bias due to their stronger reasoning capabilities.
Further studies in feedback scale reveal that VLM judges can generally provide more accurate and stable feedback in natural language (Likert-scale) than numerical scales. Notably, human evaluations on end-to-end fine-tuned models using separate feedback from these multimodal judges provide similar conclusions, further confirming the effectiveness of MJ-Bench. All data, code, and models are available at https://huggingface.co/MJ-Bench.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 42 pages, 13 figures, 33 tables

arXiv:2407.04661

MIRI MRS Observations of Beta Pictoris II. The Spectroscopic Case for a Recent Giant Collision

Authors: Christine H. Chen, Cicero X. Lu, Kadin Worthen, David R. Law, B. A. Sargent, Amaya Moro-Martin, G. C. Sloan, Carey M. Lisse, Dan M. Watson, Julien H. Girard, Yiwei Chai, Dean C. Hines, Jens Kammerer, Alexis Li, Marshall Perrin, Laurent Pueyo, Isabel Rebollido, Karl R. Stapelfeldt, Christopher Stark, Michael W. Werner

Abstract: Modeling observations of the archetypal debris disk around $β$ Pic, obtained in 2023 January with the MIRI MRS on board JWST, reveals significant differences compared with that obtained with the IRS on board Spitzer. The bright 5 - 15 $μ$m continuum excess modeled using a $\sim$600 K black body has disappeared. The previously prominent 18 and 23 $μ$m crystalline forsterite emission features, arising from cold dust ($\sim$100 K) in the Rayleigh limit, have disappeared and been replaced by very weak features arising from the hotter 500 K dust population. Finally, the shape of the 10 $μ$m silicate feature has changed, consistent with a shift in the temperature of the warm dust population from $\sim$300 K to $\sim$500 K and an increase in the crystalline fraction of the warm, silicate dust. Stellar radiation pressure may have blown both the hot and the cold crystalline dust particles observed in the Spitzer spectra out of the planetary system during the intervening 20 years between the Spitzer and JWST observations. These results indicate that the $β$ Pic system has a dynamic circumstellar environment, and that periods of enhanced collisions can create large clouds of dust that sweep through the planetary system.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 15 pages, 8 figures, ApJ in press

arXiv:2407.04338

Entanglement distribution based on quantum walk in arbitrary quantum networks

Authors: Tianen Chen, Yun Shang, Chitong Chen, Heng Fan

Abstract: In large-scale quantum networks, distributing the multi-particle entangled state among selected nodes is crucial for realizing long-distance and complicated quantum communication. Quantum repeaters provide an efficient method to generate entanglement between distant nodes. However, it is difficult to extend quantum repeater protocols to high-dimensional quantum states in existing experiments. Here we develop a series of schemes for generating high-dimensional entangled states via quantum walks with multiple coins or a single coin by quantum repeaters, including $d$-dimensional Bell states, multi-particle high-dimensional GHZ states, etc. Furthermore, we give entanglement distribution schemes on arbitrary quantum networks according to the above theoretical framework. As applications, we construct quantum fractal networks and multiparty quantum secret sharing protocols based on $d$-dimensional GHZ states. Finally, we give experimental implementations of various 2-party or 3-party entanglement generation schemes based on repeaters. Our work can serve as a building block for constructing larger and more complex quantum networks.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.03573

An analytic, moment-based method to estimate orthopositronium lifetimes in positron annihilation lifetime spectroscopy measurements

Authors: Lucas Berens, Isaac Hsu, Chin-Tu Chen, Howard Halpern, Chien-Min Kao

Abstract: The presence of tumor hypoxia is known to correlate with poor patient prognosis. Measurement of tissue oxygen concentration can be challenging, but recent advancements using positron annihilation lifetime spectroscopy (PALS) in three-dimensional positron emission tomography (PET) scans have shown promise for hypoxia detection. In this work, a novel method for estimating the orthopositronium lifetime in PALS is presented. This method is analytical and uses moments of the time-difference histogram from photon arrival times. For sufficient statistical power, the method produces monotonic, stable estimates. For cases with a lower number of photon counts, the method was characterized and solutions are presented to correct for bias and estimation variability.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03104

KeyVideoLLM: Towards Large-scale Video Keyframe Selection

Authors: Hao Liang, Jiapeng Li, Tianyi Bai, Chong Chen, Conghui He, Bin Cui, Wentao Zhang

Abstract: Recently, with the rise of web videos, managing and understanding large-scale video datasets has become increasingly important. Video Large Language Models (VideoLLMs) have emerged in recent years due to their strong video understanding capabilities. However, training and inference processes for VideoLLMs demand vast amounts of data, presenting significant challenges to data management, particularly regarding efficiency, robustness, and effectiveness. In this work, we present KeyVideoLLM, a text-video frame similarity-based keyframe selection method designed to manage VideoLLM data efficiently, robustly, and effectively. Specifically, KeyVideoLLM achieves a remarkable data compression rate of up to 60.9 times, substantially lowering disk space requirements, which proves its high efficiency. Additionally, it maintains a 100% selection success rate across all video formats and scales, enhances processing speed by up to 200 times compared to existing keyframe selection methods, and does not require hyperparameter tuning. Beyond its outstanding efficiency and robustness, KeyVideoLLM further improves model performance in video question-answering tasks during both training and inference stages. Notably, it consistently achieved the state-of-the-art (SoTA) experimental results on diverse datasets.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03037

Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model

Authors: Zhe Liu, Cheng Li, Chunyang Chen, Junjie Wang, Boyu Wu, Yawen Wang, Jun Hu, Qing Wang

Abstract: With the advancement of software rendering techniques, GUI pages in mobile apps now encompass a wealth of visual information, where the visual semantics of each page contribute to the overall app logic, presenting new challenges to software testing. Despite the progress in automated Graphical User Interface (GUI) testing, the absence of testing oracles has constrained its efficacy to identify only crash bugs with evident abnormal signals. Nonetheless, there are still a considerable number of non-crash bugs, ranging from unexpected behaviors to misalignments, often evading detection by existing techniques. While these bugs can exhibit visual cues that serve as potential testing oracles, they often entail a sequence of screenshots, and detecting them necessitates an understanding of the operational logic among GUI page transitions, which is challenging for traditional techniques. Considering the remarkable performance of Multimodal Large Language Models (MLLM) in visual and language understanding, this paper proposes a vision-driven automated GUI testing approach, VisionDroid, to detect non-crash functional bugs with MLLM. It begins by extracting GUI text information and aligning it with screenshots to form a vision prompt, enabling MLLM to understand GUI context. The function-aware explorer then employs MLLM for deeper and function-oriented GUI page exploration, while the logic-aware bug detector segments the entire exploration history into logically cohesive parts and prompts the MLLM for bug detection.
We evaluate VisionDroid on three datasets and compare it with 10 baselines, demonstrating its excellent performance. The ablation study further proves the contribution of each module. Moreover, VisionDroid identifies 29 new bugs on Google Play, of which 19 have been confirmed and fixed.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02899

Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the BESIII detector at the BEPCII storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02614

AcuVR: Enhancing Acupuncture Training Workflow with Virtual Reality

Authors: Menghe Zhang, Chen Chen, Matin Yarmand, Anish Rajeshkumar, Nadir Weibel

Abstract: Acupuncture is a widely adopted medical practice that involves inserting thin needles into specific points on the body to alleviate pain and treat various health conditions. Current learning practices heavily rely on 2D atlases and practice on peers, which are notably less intuitive and pose risks, particularly in sensitive areas such as the eyes. To address these challenges, we introduce AcuVR, a Virtual Reality (VR) based system designed to add a layer of interactivity and realism. This innovation aims to reduce the risks associated with practicing acupuncture techniques while offering more effective learning strategies. Furthermore, AcuVR incorporates medical imaging and standardized anatomy models, enabling the simulation of customized acupuncture scenarios. This feature represents a significant advancement beyond the limitations of conventional resources such as atlases and textbooks, facilitating a more immersive and personalized learning experience. The evaluation study with eight acupuncture students and practitioners revealed high participant satisfaction and pointed to the effectiveness and potential of AcuVR as a valuable addition to acupuncture training.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 10 pages

ACM Class: J.3; J.4; H.5

arXiv:2407.02252

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

Authors: Jian Ma, Yonglin Deng, Chen Chen, Haonan Lu, Zhenyu Yang

Abstract: Posters play a crucial role in marketing and advertising, contributing significantly to industrial design by enhancing visual communication and brand visibility. With recent advances in controllable text-to-image diffusion models, more concise research is now focusing on rendering text within synthetic images. Despite improvements in text rendering accuracy, the field of end-to-end poster generation remains underexplored. This complex task involves striking a balance between text rendering accuracy and automated layout to produce high-resolution images with variable aspect ratios. To tackle this challenge, we propose an end-to-end text rendering framework employing a triple cross-attention mechanism rooted in align learning, designed to create precise poster text within detailed contextual backgrounds. Additionally, we introduce a high-resolution dataset that exceeds 1024 pixels in image resolution. Our approach leverages the SDXL architecture. Extensive experiments validate the ability of our method to generate poster images featuring intricate and contextually rich backgrounds. Codes will be available at https://github.com/OPPO-Mente-Lab/GlyphDraw2.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02243

Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization

Authors: Yuchen Hu, Chen Chen, Siyin Wang, Eng Siong Chng, Chao Zhang

Abstract: In this paper, we propose reverse inference optimization (RIO), a simple and effective method designed to enhance the robustness of autoregressive-model-based zero-shot text-to-speech (TTS) systems using reinforcement learning from human feedback (RLHF). To assess the quality of speech produced by the TTS system without human annotations, RIO introduces a novel concept termed reverse inference, based on the Bayesian principle, which suggests that high-quality generated speech should be usable as a prompt for subsequent generation using the same TTS model. By leveraging reverse inference as the standard to select exemplars used in RLHF from the speech samples generated by the TTS system itself, RIO steers the subsequent optimization towards enhancing the TTS robustness. The RIO framework, comprising sampling, automatic annotating, and learning, obviates the need for a reward model or pairwise preference data, and significantly improves the stability of zero-shot TTS performance by reducing the discrepancies between training and inference conditions. Our experimental results verify that RIO can effectively improve both subjective and objective metrics, including mean opinion scores, word error rates, and speaker similarity. Remarkably, RIO can also diminish the incidence of bad outputs to nearly zero percent, rivalling the robustness when using ground-truth speech as the prompt.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 12 pages, Work in progress

arXiv:2407.02068

LPViT: Low-Power Semi-structured Pruning for Vision Transformers

Authors: Kaixin Xu, Zhe Wang, Chunyun Chen, Xue Geng, Jie Lin, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin

Abstract: Vision transformers have emerged as a promising alternative to convolutional neural networks for various image analysis tasks, offering comparable or superior performance. However, one significant drawback of ViTs is their resource-intensive nature, leading to increased memory footprint, computation complexity, and power consumption. To democratize this high-performance technology and make it more environmentally friendly, it is essential to compress ViT models, reducing their resource requirements while maintaining high performance. In this paper, we introduce a new block-structured pruning to address the resource-intensive issue for ViTs, offering a balanced trade-off between accuracy and hardware acceleration. Unlike unstructured pruning or channel-wise structured pruning, block pruning leverages the block-wise structure of linear layers, resulting in more efficient matrix multiplications. To optimize this pruning scheme, our paper proposes a novel hardware-aware learning objective that simultaneously maximizes speedup and minimizes power consumption during inference, tailored to the block sparsity structure. This objective eliminates the need for empirical look-up tables and focuses solely on reducing parametrized layer connections. Moreover, our paper provides a lightweight algorithm to achieve post-training pruning for ViTs, utilizing second-order Taylor approximation and empirical optimization to solve the proposed hardware-aware objective.
Extensive experiments on ImageNet are conducted across various ViT architectures, including DeiT-B and DeiT-S, demonstrating competitive performance with other pruning methods and achieving a remarkable balance between accuracy preservation and power savings. In particular, we achieve up to 3.93x and 1.79x speedups on dedicated hardware and GPUs respectively for DeiT-B, and also observe an inference power reduction by 1.4x on real-world GPUs.

Submitted 12 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01912

Relay-Assisted Carrier Aggregation (RACA) Uplink System for Enhancing Data Rate of Extended Reality (XR)

Authors: Chi-Wei Chen, Wen-Chiao Tsai, Lung-Sheng Tsai, An-Yeu Wu

Abstract: In Extended Reality (XR) applications, high data rates and low latency are crucial for immersive experiences. Uplink transmission in XR is challenging due to the limited antennas and power of lightweight XR devices. To improve data transmission rates, we investigate a relay-assisted carrier aggregation (RACA) system. The XR device simultaneously transmits data to an access point (AP) and a relay in proximity over low-frequency and high-frequency bands, respectively. Then, the relay down-converts and amplifies the signals to the AP, effectively acting as an additional transmit antenna for the XR device. In this paper, we propose two algorithms to maximize the data rate of the XR device in their respective protocols. In the centralized protocol, the rate maximization problem is equivalently transformed into a weighted minimum mean square error (WMMSE) problem, which can be solved iteratively by alternating optimization. In the distributed protocol, the rate maximization problem is decomposed into two independent sub-problems where the rate of the direct link and the rate of the relay link are maximized by singular value decomposition (SVD)-based methods with water-filling (WF). Simulation results show that the rate of the RACA system is improved by $32\%$ compared to that of the conventional carrier aggregation scheme.

Submitted 16 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.
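
Illustrative note: the abstract above mentions SVD-based methods with water-filling (WF). As a rough sketch of the generic textbook water-filling power allocation only, and not the authors' algorithm, the following assumes a set of parallel subchannels (for example, the eigenchannels left after an SVD of a MIMO link). The function name `water_filling` and the parameters `gains` and `total_power` are hypothetical names chosen for this example.

```python
def water_filling(gains, total_power):
    """Classic water-filling over parallel subchannels (illustrative sketch).

    gains[i] is an assumed gain-to-noise ratio of subchannel i, e.g. a squared
    singular value divided by the noise power. The returned powers satisfy
    p_i = max(0, mu - 1/gains[i]) and sum to total_power.
    """
    if total_power < 0 or any(g <= 0 for g in gains):
        raise ValueError("gains must be positive and total_power non-negative")

    # Rank subchannels from strongest to weakest.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    powers = [0.0] * len(gains)

    # Try the k strongest subchannels, shrinking k until the weakest active
    # subchannel would still receive a non-negative share.
    for k in range(len(order), 0, -1):
        active = order[:k]
        # Water level mu chosen so the active powers sum to total_power.
        mu = (total_power + sum(1.0 / gains[i] for i in active)) / k
        if mu - 1.0 / gains[active[-1]] >= 0:
            for i in active:
                powers[i] = mu - 1.0 / gains[i]
            break
    return powers


if __name__ == "__main__":
    # Three subchannels; the weakest one is switched off.
    print(water_filling([4.0, 1.0, 0.1], 1.0))  # [0.875, 0.125, 0.0]
```

The design choice here is the usual one: weak subchannels are dropped one at a time until the common water level lies above every remaining noise floor, so the power budget concentrates on the strongest subchannels.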

arXiv:2407.01770

Exploring causal effects of hormone- and radio-treatments in an observational study of breast cancer using copula-based semi-competing risks models

Authors: Tonghui Yu, Mengjiao Peng, Yifan Cui, Elynn Chen, Chixiang Chen

Abstract: Breast cancer patients may experience relapse or death after surgery during the follow-up period, leading to dependent censoring of relapse. This phenomenon, known as semi-competing risk, imposes challenges in analyzing treatment effects on breast cancer and necessitates advanced statistical tools for unbiased analysis. Despite progress in estimation and inference within semi-competing risks regression, its application to causal inference is still in its early stages. This article aims to propose a frequentist and semi-parametric framework based on copula models that can facilitate valid causal inference, net quantity estimation and interpretation, and sensitivity analysis for unmeasured factors under right-censored semi-competing risks data. We also propose novel procedures to enhance parameter estimation and its applicability in real practice. After that, we apply the proposed framework to a breast cancer study and detect the time-varying causal effects of hormone- and radio-treatments on patients' relapse-free survival and overall survival. Moreover, extensive numerical evaluations demonstrate the method's feasibility, highlighting minimal estimation bias and reliable statistical inference.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Contact: chixiang.chen@som.umaryland.edu

arXiv:2407.01599

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

Authors: Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang

Abstract: The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various technological domains. While these models enhance capabilities in natural language processing and visual interactive tasks, their growing adoption raises critical concerns regarding security and ethical alignment. This survey provides an extensive review of the emerging field of jailbreaking--deliberately circumventing the ethical and operational boundaries of LLMs and VLMs--and the consequent development of defense mechanisms. Our study categorizes jailbreaks into seven distinct types and elaborates on defense strategies that address these vulnerabilities. Through this comprehensive examination, we identify research gaps and propose directions for future studies to enhance the security frameworks of LLMs and VLMs. Our findings underscore the necessity for a unified perspective that integrates both jailbreak strategies and defensive solutions to foster a robust, secure, and reliable environment for the next generation of language models. More details can be found on our website: https://chonghan-chen.com/llm-jailbreak-zoo-survey/.

Submitted 24 July, 2024; v1 submitted 25 June, 2024; originally announced July 2024.

Comments: 45 pages

arXiv:2407.01413

AtLAST Science Overview Report

Authors: Mark Booth, Pamela Klaassen, Claudia Cicone, Tony Mroczkowski, Martin A. Cordiner, Luca Di Mascolo, Doug Johnstone, Eelco van Kampen, Minju M. Lee, Daizhong Liu, John Orlowski-Scherer, Amélie Saintonge, Matthew W. L. Smith, Alexander Thelen, Sven Wedemeyer, Kazunori Akiyama, Stefano Andreon, Doris Arzoumanian, Tom J. L. C. Bakx, Caroline Bot, Geoffrey Bower, Roman Brajša, Chian-Chou Chen, Elisabete da Cunha, David Eden, et al. (59 additional authors not shown)

Abstract: Submillimeter and millimeter wavelengths provide a unique view of the Universe, from the gas and dust that fills and surrounds galaxies to the chromosphere of our own Sun. Current single-dish facilities have presented a tantalising view of the brightest (sub-)mm sources, and interferometers have provided the exquisite resolution necessary to analyse the details in small fields, but there are still many open questions that cannot be answered with current facilities. In this report we summarise the science that is guiding the design of the Atacama Large Aperture Submillimeter Telescope (AtLAST). We demonstrate how transformational advances in topics including star formation in high redshift galaxies, the diffuse circumgalactic medium, Galactic ecology, cometary compositions and solar flares motivate the need for a 50m, single-dish telescope with a 1-2 degree field of view and a new generation of highly multiplexed continuum and spectral cameras. AtLAST will have the resolution to drastically lower the confusion limit compared to current single-dish facilities, whilst also being able to rapidly map large areas of the sky and detect extended, diffuse structures. Its high sensitivity and large field of view will open up the field of submillimeter transient science by increasing the probability of serendipitous detections.
Finally, the science cases listed here motivate the need for a highly flexible operations model capable of short observations of individual targets, large surveys, monitoring programmes, target of opportunity observations and coordinated observations with other observatories. AtLAST aims to be a sustainable, upgradeable, multipurpose facility that will deliver orders of magnitude increases in sensitivity and mapping speeds over current and planned submillimeter observatories.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 47 pages, 12 figures. For further details on AtLAST see https://atlast.uio.no

arXiv:2407.00995 [pdf, other]

Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense

Authors: Yi Yu, Shengyue Yao, Tianchen Zhou, Yexuan Fu, Jingru Yu, Ding Wang, Xuhong Wang, Cen Chen, Yilun Lin

Abstract: In the digital era, data has become a pivotal asset, advancing technologies such as autonomous driving. Despite this, data trading faces challenges like the absence of robust pricing methods and the lack of trustworthy trading mechanisms. To address these challenges, we introduce a traffic-oriented data trading platform named Data on The Move (DTM), integrating traffic simulation, data trading, and Artificial Intelligence (AI) agents. The DTM platform supports evidence-based data value evaluation and AI-based trading mechanisms. Leveraging the common sense capabilities of Large Language Models (LLMs) to assess traffic state and data value, DTM can determine reasonable traffic data pricing through multi-round interaction and simulations. Moreover, DTM provides a pricing method validation by simulating traffic systems, multi-agent interactions, and the heterogeneity and irrational behaviors of individuals in the trading market. Within the DTM platform, entities such as connected vehicles and traffic light controllers can engage in information collecting, data pricing, trading, and decision-making. Simulation results demonstrate that our proposed AI agent-based pricing approach enhances data trading by offering rational prices, as evidenced by the observed improvement in traffic efficiency. This underscores the effectiveness and practical value of DTM, offering new perspectives for the evolution of data markets and smart cities. To the best of our knowledge, this is the first study employing LLMs in data pricing and a pioneering data trading practice in the field of intelligent vehicles and smart cities.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00561 [pdf, ps, other]

Advancing Information Integration through Empirical Likelihood: Selective Reviews and a New Idea

Authors: Chixiang Chen, Jia Liang, Elynn Chen, Ming Wang

Abstract: Information integration plays a pivotal role in biomedical studies by facilitating the combination and analysis of independent datasets from multiple studies, thereby uncovering valuable insights that might otherwise remain obscured due to the limited sample size in individual studies. However, sharing raw data from independent studies presents significant challenges, primarily due to the need to safeguard sensitive participant information and the cumbersome paperwork involved in data sharing. In this article, we first provide a selective review of recent methodological developments in information integration via empirical likelihood, wherein only summary information is required, rather than the raw data. Following this, we introduce a new insight and a potentially promising framework that could broaden the application of information integration across a wider spectrum. Furthermore, this new framework offers computational convenience compared to classic empirical likelihood-based methods. We provide numerical evaluations to assess its performance and discuss various extensions in the end.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00031 [pdf, other]

Supercharging Federated Learning with Flower and NVIDIA FLARE

Authors: Holger R. Roth, Daniel J. Beutel, Yan Cheng, Javier Fernandez Marques, Heng Pan, Chester Chen, Zhihong Zhang, Yuhong Wen, Sean Yang, Isaac Yang, Yuan-Ting Hsieh, Ziyue Xu, Daguang Xu, Nicholas D. Lane, Andrew Feng

Abstract: Several open-source systems, such as Flower and NVIDIA FLARE, have been developed in recent years while focusing on different aspects of federated learning (FL). Flower is dedicated to implementing a cohesive approach to FL, analytics, and evaluation. Over time, Flower has cultivated extensive strategies and algorithms tailored for FL application development, fostering a vibrant FL community in research and industry. Conversely, FLARE has prioritized the creation of an enterprise-ready, resilient runtime environment explicitly designed for FL applications in production environments. In this paper, we describe our initial integration of both frameworks and show how they can work together to supercharge the FL ecosystem as a whole. Through the seamless integration of Flower and FLARE, applications crafted within the Flower framework can effortlessly operate within the FLARE runtime environment without necessitating any modifications. This initial integration streamlines the process, eliminating complexities and ensuring smooth interoperability between the two platforms, thus enhancing the overall efficiency and accessibility of FL applications.

Submitted 22 July, 2024; v1 submitted 21 May, 2024; originally announced July 2024.

Comments: Added a figure comparing running a Flower application natively or within FLARE

arXiv:2406.19627 [pdf]

Practical Power System Inertia Monitoring Based on Pumped Storage Hydropower Operation Signature

Authors: Hongyu Li, Chang Chen, Mark Baldwin, Shutang You, Wenpeng Yu, Lin Zhu, Yilu Liu

Abstract: This paper proposes a practical method to monitor power system inertia using Pumped Storage Hydropower (PSH) switching-off events. This approach offers real-time system-level inertia estimation with minimal expenses, no disruption, and the inclusion of behind-the-meter inertia. First, accurate inertia estimation is achieved through improved RoCoF calculation that accounts for pre-event RoCoF, reducing common random frequency fluctuations in practice. Second, PSH field data is analyzed, highlighting the benefits of using switching-off events for grid inertia estimation. Third, an event detection trigger is designed to capture pump switching-off events based on local and system features. Fourth, the method is validated on the U.S. Eastern Interconnection model with over 60,000 buses, demonstrating very high accuracy (3%-5% error rate). Finally, it is applied to the U.S. Western Interconnection, with field validation showing a 9.9% average absolute error rate. Despite challenges in practical power system inertia estimation, this method enhances decision-making for power grid reliability and efficiency, addressing challenges posed by renewable energy integration.

Submitted 1 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 8 pages, 15 figures

arXiv:2406.19421 [pdf, other]

The Belle II Detector Upgrades Framework Conceptual Design Report

Authors: H. Aihara, A. Aloisio, D. P. Auguste, M. Aversano, M. Babeluk, S. Bahinipati, Sw. Banerjee, M. Barbero, J. Baudot, A. Beaubien, F. Becherer, T. Bergauer, F. U. Bernlochner, V. Bertacchi, G. Bertolone, C. Bespin, M. Bessner, S. Bettarini, A. J. Bevan, B. Bhuyan, M. Bona, J. F. Bonis, J. Borah, F. Bosi, R. Boudagga, et al. (186 additional authors not shown)

Abstract: We describe the planned near-term and potential longer-term upgrades of the Belle II detector at the SuperKEKB electron-positron collider operating at the KEK laboratory in Tsukuba, Japan. These upgrades will allow increasingly sensitive searches for possible new physics beyond the Standard Model in flavor, tau, electroweak and dark sector physics that are both complementary to and competitive with the LHC and other experiments.

Submitted 4 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: Editor: F. Forti 170 pages

Report number: KEK-REPORT-2024-1, BELLE2-REPORT-2024-042

arXiv:2406.19394 [pdf, other]

HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

Authors: Liujuan Cao, Jianghang Lin, Zebo Hong, Yunhang Shen, Shaohui Lin, Chao Chen, Rongrong Ji

Abstract: Most WSOD methods rely on traditional object proposals to generate candidate regions and are confronted with unstable training, which easily gets stuck in a poor local optimum. In this paper, we introduce a unified, high-capacity weakly supervised object detection (WSOD) network called HUWSOD, which utilizes a comprehensive self-training framework without needing external modules or additional supervision. HUWSOD innovatively incorporates a self-supervised proposal generator and an autoencoder proposal generator with a multi-rate resampling pyramid to replace traditional object proposals, enabling end-to-end WSOD training and inference. Additionally, we implement a holistic self-training scheme that refines detection scores and coordinates through step-wise entropy minimization and consistency-constraint regularization, ensuring consistent predictions across stochastic augmentations of the same image. Extensive experiments on PASCAL VOC and MS COCO demonstrate that HUWSOD competes with state-of-the-art WSOD methods, eliminating the need for offline proposals and additional data. The peak performance of HUWSOD approaches that of fully-supervised Faster R-CNN. Our findings also indicate that randomly initialized boxes, although significantly different from well-designed offline object proposals, are effective for WSOD training.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.19190 [pdf, ps, other]

Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

Abstract: Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226 GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 13 pages, 6 figures

arXiv:2406.18259 [pdf, other]

Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated

Authors: Jiazhou Ji, Ruizhe Li, Shujun Li, Jie Guo, Weidong Qiu, Zheng Huang, Chiyu Chen, Xiaoyu Jiang, Xinru Lu

Abstract: As LLMs rapidly advance, increasing concerns arise regarding risks about actual authorship of texts we see online and in the real world. The task of distinguishing LLM-authored texts is complicated by the nuanced and overlapping behaviors of both machines and humans. In this paper, we challenge the current practice of considering LLM-generated text detection a binary classification task of differentiating human from AI. Instead, we introduce a novel ternary text classification scheme, adding an "undecided" category for texts that could be attributed to either source, and we show that this new category is crucial to understand how to make the detection result more explainable to lay users. This research shifts the paradigm from merely classifying to explaining machine-generated texts, emphasizing the need for detectors to provide clear and understandable explanations to users. Our study involves creating four new datasets comprised of texts from various LLMs and human authors. Based on the new datasets, we performed binary classification tests to ascertain the most effective SOTA detection methods and identified SOTA LLMs capable of producing harder-to-detect texts. We constructed a new dataset of texts generated by two top-performing LLMs and human authors, and asked three human annotators to produce ternary labels with explanation notes. This dataset was used to investigate how three top-performing SOTA detectors behave in the new ternary classification context. Our results highlight why the "undecided" category is much needed from the viewpoint of explainability. Additionally, we conducted an analysis of explainability of the three best-performing detectors and the explanation notes of the human annotators, revealing insights about the complexity of explainable detection of machine-generated texts. Finally, we propose guidelines for developing future detection systems with improved explanatory power.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 19 pages, 2 figures

arXiv:2406.18197 [pdf, other]

Human-free Prompted Based Anomaly Detection: prompt optimization with Meta-guiding prompt scheme

Authors: Pi-Wei Chen, Jerry Chun-Wei Lin, Jia Ji, Feng-Hao Yeh, Chao-Chun Chen

Abstract: Pre-trained vision-language models (VLMs) are highly adaptable to various downstream tasks through few-shot learning, making prompt-based anomaly detection a promising approach. Traditional methods depend on human-crafted prompts that require prior knowledge of specific anomaly types. Our goal is to develop a human-free prompt-based anomaly detection framework that optimally learns prompts through data-driven methods, eliminating the need for human intervention. The primary challenge in this approach is the lack of anomalous samples during the training phase. Additionally, the Vision Transformer (ViT)-based image encoder in VLMs is not ideal for pixel-wise anomaly segmentation due to a locality feature mismatch between the original image and the output feature map. To tackle the first challenge, we have developed the Object-Attention Anomaly Generation Module (OAGM) to synthesize anomaly samples for training. Furthermore, our Meta-Guiding Prompt-Tuning Scheme (MPTS) iteratively adjusts the gradient-based optimization direction of learnable prompts to avoid overfitting to the synthesized anomalies. For the second challenge, we propose Locality-Aware Attention, which ensures that each local patch feature attends only to nearby patch features, preserving the locality features corresponding to their original locations. This framework allows optimal prompt embeddings to be found by searching in the continuous latent space via backpropagation, free from human semantic constraints. Additionally, the modified locality-aware attention improves the precision of pixel-wise anomaly segmentation.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18183 [pdf, other]

doi 10.1007/JHEP07(2024)258

Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

Abstract: Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.

Submitted 28 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: 26 pages, 5 tables, 4 figures, consistent with the publication in JHEP07(2024)258

Journal ref: JHEP07(2024)258

arXiv:2406.18083 [pdf, other]

Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

Abstract: Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 19 pages, 2 figures

arXiv:2406.18069 [pdf, other]

Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals

Authors: Zengding Liu, Chen Chen, Jiannong Cao, Minglei Pan, Jikui Liu, Nan Li, Fen Miao, Ye Li

Abstract: Large language models (LLMs) have captured significant interest from both academia and industry due to their impressive performance across various textual tasks. However, the potential of LLMs to analyze physiological time-series data remains an emerging research field. Particularly, there is a notable gap in the utilization of LLMs for analyzing wearable biosignals to achieve cuffless blood pressure (BP) measurement, which is critical for the management of cardiovascular diseases. This paper presents the first work to explore the capacity of LLMs to perform cuffless BP estimation based on wearable biosignals. We extracted physiological features from electrocardiogram (ECG) and photoplethysmogram (PPG) signals and designed context-enhanced prompts by combining these features with BP domain knowledge and user information. Subsequently, we adapted LLMs to BP estimation tasks through fine-tuning. To evaluate the proposed approach, we conducted assessments of ten advanced LLMs using a comprehensive public dataset of wearable biosignals from 1,272 participants. The experimental results demonstrate that the optimally fine-tuned LLM significantly surpasses conventional task-specific baselines, achieving an estimation error of 0.00 $\pm$ 9.25 mmHg for systolic BP and 1.29 $\pm$ 6.37 mmHg for diastolic BP. Notably, the ablation studies highlight the benefits of our context enhancement strategy, leading to an 8.9% reduction in mean absolute error for systolic BP estimation. This paper pioneers the exploration of LLMs for cuffless BP measurement, providing a potential solution to enhance the accuracy of cuffless BP measurement.

Submitted 4 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17452 [pdf, ps, other]

Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (649 additional authors not shown)

Abstract: We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33 fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17100 [pdf, other]

Fine-tuning Diffusion Models for Enhancing Face Quality in Text-to-image Generation

Authors: Zhenyi Liao, Qingsong Xie, Chen Chen, Hannan Lu, Zhijie Deng

Abstract: Diffusion models (DMs) have achieved significant success in generating imaginative images given textual descriptions. However, they are likely to fall short when it comes to real-life scenarios with intricate details. The low-quality, unrealistic human faces in text-to-image generation are one of the most prominent issues, hindering the wide application of DMs in practice. To address this issue, we first assess the face quality of generations from popular pre-trained DMs with the aid of human annotators and then evaluate the alignment between existing metrics such as ImageReward, Human Preference Score, Aesthetic Score Predictor, and Face Quality Assessment, with human judgments. Observing that existing metrics can be unsatisfactory for quantifying face quality, we develop a novel metric named Face Score (FS) by fine-tuning ImageReward on a dataset of (good, bad) face pairs cheaply crafted by an inpainting pipeline of DMs. Extensive studies reveal that FS enjoys a superior alignment with humans. On the other hand, FS opens up the door for refining DMs for better face generation. To achieve this, we incorporate a guidance loss on the denoising trajectories of the aforementioned face pairs for fine-tuning pre-trained DMs such as Stable Diffusion V1.5 and Realistic Vision V5.1. Intuitively, such a loss pushes the trajectory of bad faces toward that of good ones. Comprehensive experiments verify the efficacy of our approach for improving face quality while preserving general capability.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.17006 [pdf, other]

Probing the nature of the $χ_{c1}(3872)$ state using radiative decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1094 additional authors not shown)

Abstract: The radiative decays $χ_{c1}(3872)\rightarrowψ(2S)γ$ and $χ_{c1}(3872)\rightarrow J/ψγ$ are used to probe the nature of the $χ_{c1}(3872)$ state using proton-proton collision data collected with the LHCb detector, corresponding to an integrated luminosity of 9 fb$^{-1}$. Using the $B^+\rightarrow χ_{c1}(3872)K^+$ decay, the $χ_{c1}(3872)\rightarrow ψ(2S)γ$ process is observed for the first time and the ratio of its partial width to that of the $χ_{c1}(3872)\rightarrow J/ψγ$ decay is measured to be $$ \frac{Γ_{χ_{c1}(3872)\rightarrow ψ(2S)γ}} {Γ_{χ_{c1}(3872)\rightarrow J/ψγ}} = 1.67 \pm 0.21 \pm 0.12 \pm0.04 , $$ where the first uncertainty is statistical, the second systematic and the third is due to the uncertainties on the branching fractions of the $ψ(2S)$ and $J/ψ$ mesons. The measured ratio makes the interpretation of the $χ_{c1}(3872)$ state as a pure $D^0\bar{D}^{*0}+\bar{D}^0D^{*0}$ molecule questionable and strongly indicates a sizeable compact charmonium or tetraquark component within the $χ_{c1}(3872)$ state.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 31 pages, 2 figures. All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-015.html (LHCb public pages)

Report number: LHCb-PAPER-2024-015, CERN-EP-2025-157

arXiv:2406.16910 [pdf, other]

Mind's Eye: Image Recognition by EEG via Multimodal Similarity-Keeping Contrastive Learning

Authors: Chi-Sheng Chen, Chun-Shu Wei

Abstract: Decoding images from non-invasive electroencephalographic (EEG) signals has been a grand challenge in understanding how the human brain processes visual information in real-world scenarios. To cope with the issues of signal-to-noise ratio and nonstationarity, this paper introduces a MUltimodal Similarity-keeping contrastivE learning (MUSE) framework for zero-shot EEG-based image classification. We develop a series of multivariate time-series encoders tailored for EEG signals and assess the efficacy of regularized contrastive EEG-Image pretraining using an extensive visual EEG dataset. Our method achieves state-of-the-art performance, with a top-1 accuracy of 19.3% and a top-5 accuracy of 48.8% in 200-way zero-shot image classification. Furthermore, we visualize neural patterns via model interpretation, shedding light on the visual processing dynamics in the human brain. The code repository for this work is available at: https://github.com/ChiShengChen/MUSE_EEG.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 19 pages, 14 figures

arXiv:2406.16793 [pdf, other]

Adam-mini: Use Fewer Learning Rates To Gain More

Authors: Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun

Abstract: We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., $1/\sqrt{v}$). We find that $\geq$ 90% of these learning rates in $v$ could be harmlessly removed if we (1) carefully partition the parameters into blocks following our proposed principle on Hessian structure; (2) assign a single but good learning rate to each parameter block. We further find that, for each of these parameter blocks, there exists a single high-quality learning rate that can outperform Adam, provided that sufficient resources are available to search it out. We then provide one cost-effective way to find good learning rates and propose Adam-mini. Empirically, we verify that Adam-mini performs on par or better than AdamW on various language models sized from 125M to 7B for pre-training, supervised fine-tuning, and RLHF. The reduced memory footprint of Adam-mini also alleviates communication overheads among GPUs and CPUs, thereby increasing throughput. For instance, Adam-mini achieves 49.6% higher throughput than AdamW when pre-training Llama2-7B on $2\times$ A800-80GB GPUs, which saves 33% wall-clock time for pre-training.

Submitted 3 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16169 [pdf, other]

On the origin of polar planets around single stars

Authors: Cheng Chen, Stanley A. Baronett, C. J. Nixon, Rebecca G. Martin

Abstract: The Rossiter-McLaughlin effect measures the misalignment between a planet's orbital plane and its host star's rotation plane. Around 10$\%$ of planets exhibit misalignments in the approximate range $80 - 125^\circ$, with their origin remaining a mystery. On the other hand, large misalignments may be common in eccentric circumbinary systems due to misaligned discs undergoing polar alignment. If the binary subsequently merges, a polar circumbinary disc -- along with any planets that form within it -- may remain inclined near 90$^{\circ}$ to the merged star's rotation. To test this hypothesis, we present $N$-body simulations of the evolution of a polar circumbinary debris disc comprised of test particles around an eccentric binary during a binary merger that is induced by tidal dissipation. After the merger, the disc particles remain on near-polar orbits. Interaction of the binary with the polar-aligned gas disc may be required to bring the binary to the small separations that trigger the merger by tides. Our findings imply that planets forming in discs that are polar-aligned to the orbit of a high-eccentricity binary may, following the merger of the binary, provide a possible origin for the population of near-polar planets around single stars.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 6 pages, 3 figures

arXiv:2406.15885 [pdf, other]

The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

Authors: Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, Ping Wang, Hai Zhao

Abstract: Benchmarks play a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-related capabilities of LLMs. ZIQI-Eval encompasses a wide range of questions, covering 10 major categories and 56 subcategories, resulting in over 14,000 meticulously curated data entries. By leveraging ZIQI-Eval, we conduct a comprehensive evaluation over 16 LLMs to evaluate and analyze LLMs' performance in the domain of music. Results indicate that all LLMs perform poorly on the ZIQI-Eval benchmark, suggesting significant room for improvement in their musical capabilities. With ZIQI-Eval, we aim to provide a standardized and robust evaluation framework that facilitates a comprehensive assessment of LLMs' music-related abilities. The dataset is available at GitHub (https://github.com/zcli-charlie/ZIQI-Eval) and HuggingFace (https://huggingface.co/datasets/MYTH-Lab/ZIQI-Eval).

Submitted 22 June, 2024; originally announced June 2024.

Comments: Accepted to ACL-Findings 2024

arXiv:2406.15554 [pdf, other]

Testing particle acceleration in blazar jets with continuous high-cadence optical polarization observations

Authors: Ioannis Liodakis, Sebastian Kiehlmann, Alan P. Marscher, Haocheng Zhang, Dmitry Blinov, Svetlana G. Jorstad, Iván Agudo, Erika Benítez, Andrei Berdyugin, Giacomo Bonnoli, Carolina Casadio, Chien-Ting Chen, Wen-Ping Chen, Steven R. Ehlert, Juan Escudero, Tatiana S. Grishina, David Hiriart, Angela Hsu, Ryo Imazawa, Helen E. Jermak, Jincen Jose, Philip Kaaret, Evgenia N. Kopatskaya, Bhavana Lalchand, Elena G. Larionova, et al. (22 additional authors not shown)

Abstract: Variability can be the pathway to understanding the physical processes in astrophysical jets; however, the high-cadence observations required to test particle acceleration models are still missing. Here we report on the first attempt to produce continuous, >24 hour polarization light curves of blazars using telescopes distributed across the globe and the rotation of the Earth to avoid the rising Sun. Our campaign involved 16 telescopes in Asia, Europe, and North America. We observed BL Lacertae and CGRaBS J0211+1051 for a combined 685 telescope hours. We find large variations in the polarization degree and angle for both sources on sub-hour timescales as well as a ~180 degree rotation of the polarization angle in CGRaBS J0211+1051 in less than two days. We compared our high-cadence observations to Particle-In-Cell magnetic reconnection and turbulent plasma simulations. We find that although the state-of-the-art simulation frameworks can produce a large fraction of the polarization properties, they do not account for the entirety of the observed polarization behavior in blazar jets.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 20 pages, 15 figures, 2 tables, accepted for publication in A&A. The data used in the paper are available here: https://doi.org/10.7910/DVN/IETSXS

arXiv:2406.15553 [pdf, other]

Non-Ioffe-Larkin composition rule and spinon-dictated electric transport in doped Mott insulators

Authors: Chuan Chen, Jia-Xin Zhang, Zhi-Jian Song, Zheng-Yu Weng

Abstract: The electric resistivity is examined in the constrained Hilbert space of a doped Mott insulator, which is dictated by a non-Ioffe-Larkin composition rule due to the underlying mutual Chern-Simons topological gauge structure. In the low-temperature pseudogap phase, where holons remain condensed while spinons proliferate, the charge transport is governed by a chiral spinon excitation, comprising a bosonic spin-$1/2$ at the core of a supercurrent vortex. It leads to a vanishing resistivity with the "confinement" of the spinons in the superconducting phase but a low-$T$ divergence of the resistivity once the spinon confinement is disrupted by external magnetic fields. In the latter, the chiral spinons will generate a Hall number $n_H =$ doping concentration $δ$ and a Nernst effect to signal an underlying long-range entanglement between the charge and spin degrees of freedom. Their presence is further reflected in thermodynamic quantities such as specific heat and spin susceptibility. Finally, in the high-temperature spin-disordered phase, it is shown that the holons exhibit a linear-$T$ resistivity by scattering with the spinons acting as free local moments, which generate randomized gauge fluxes as perceived by the charge degree of freedom.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 6+6 pages, 6 figures

arXiv:2406.15507 [pdf, other]

Few-shot Knowledge Graph Relational Reasoning via Subgraph Adaptation

Authors: Haochen Liu, Song Wang, Chen Chen, Jundong Li

Abstract: Few-shot Knowledge Graph (KG) Relational Reasoning aims to predict unseen triplets (i.e., query triplets) for rare relations in KGs, given only several triplets of these relations as references (i.e., support triplets). This task has gained significant traction due to the widespread use of knowledge graphs in various natural language processing applications. Previous approaches have utilized meta-training methods and manually constructed meta-relation sets to tackle this task. Recent efforts have focused on edge-mask-based methods, which exploit the structure of the contextualized graphs of target triplets (i.e., a subgraph containing relevant triplets in the KG). However, existing edge-mask-based methods extract insufficient information from the KG and are highly influenced by spurious information in the KG. To overcome these challenges, we propose SAFER (Subgraph Adaptation for Few-shot Relational Reasoning), a novel approach that effectively adapts the information in contextualized graphs to various subgraphs generated from support and query triplets to perform the prediction. Specifically, SAFER enables the extraction of more comprehensive information from support triplets while minimizing the impact of spurious information when predicting query triplets. Experimental results on three prevalent datasets demonstrate the superiority of our proposed framework SAFER.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.15486 [pdf, other]

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

Authors: Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang

Abstract: Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional pretraining or finetuning, and often sacrifice model accuracy. In this paper, we first provide both theoretical and empirical foundations for near-lossless sparse attention. We find that dynamically capturing head-specific sparse patterns at runtime with low overhead is crucial. To address this, we propose SampleAttention, an adaptive structured and near-lossless sparse attention. Leveraging observed significant sparse patterns, SampleAttention attends to a fixed percentage of adjacent tokens to capture local window patterns, and employs a two-stage query-guided key-value filtering approach, which adaptively selects a minimum set of key-values with low overhead, to capture column stripe patterns. Comprehensive evaluations show that SampleAttention can seamlessly replace vanilla attention in off-the-shelf LLMs with nearly no accuracy loss, and reduces TTFT by up to $2.42\times$ compared with FlashAttention.

Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.15396 [pdf, other]

Feature Purified Transformer With Cross-level Feature Guiding Decoder For Multi-class OOD and Anomaly Deteciton

Authors: Jerry Chun-Wei Lin, Pi-Wei Chen, Chao-Chun Chen

Abstract: Reconstruction networks are prevalently used in unsupervised anomaly and Out-of-Distribution (OOD) detection due to their independence from labeled anomaly data. However, in multi-class datasets, the effectiveness of anomaly detection is often compromised by the models' generalized reconstruction capabilities, which allow anomalies to blend within the expanded boundaries of normality resulting from the added categories, thereby reducing detection accuracy. We introduce the FUTUREG framework, which incorporates two innovative modules: the Feature Purification Module (FPM) and the CFG Decoder. The FPM constrains the normality boundary within the latent space to effectively filter out anomalous features, while the CFG Decoder uses layer-wise encoder representations to guide the reconstruction of filtered features, preserving fine-grained details. Together, these modules enhance the reconstruction error for anomalies, ensuring high-quality reconstructions for normal samples. Our results demonstrate that FUTUREG achieves state-of-the-art performance in multi-class OOD settings and remains competitive in industrial anomaly detection scenarios.

Submitted 30 April, 2024; originally announced June 2024.

Comments: 12 pages

arXiv:2406.15030 [pdf, ps, other]

Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for for the first time. No significant signal is observed and the upper limits at the 90% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information for the production of the $χ_{c1}(3872)$ at $e^+e^-$ colliders and deepen our understanding about the nature of this particle.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 11 pages, 3 figures

arXiv:2406.14900 [pdf, other]

Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation

Authors: Keqin Bao, Jizhi Zhang, Yang Zhang, Xinyue Huo, Chong Chen, Fuli Feng

Abstract: Adapting Large Language Models (LLMs) for recommendation requires careful consideration of the decoding process, given the inherent differences between generating items and natural language. Existing approaches often directly apply LLMs' original decoding methods. However, we find these methods encounter significant challenges: 1) amplification bias -- where standard length normalization inflates scores for items containing tokens with generation probabilities close to 1 (termed ghost tokens), and 2) homogeneity issue -- generating multiple similar or repetitive items for a user. To tackle these challenges, we introduce a new decoding approach named Debiasing-Diversifying Decoding (D3). D3 disables length normalization for ghost tokens to alleviate amplification bias, and it incorporates a text-free assistant model to encourage tokens less frequently generated by LLMs for counteracting recommendation homogeneity. Extensive experiments on real-world datasets demonstrate the method's effectiveness in enhancing accuracy and diversity.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14319 [pdf, other]

LiveMind: Low-latency Large Language Models with Simultaneous Inference

Authors: Chuangtao Chen, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann, Bing Li

Abstract: In this paper, we introduce a novel low-latency inference framework for large language models (LLMs) which enables LLMs to perform inferences with incomplete prompts. By reallocating computational processes to the prompt input phase, we achieve a substantial reduction in latency, thereby significantly enhancing the interactive experience for users of LLMs. The framework adeptly manages the visibility of the streaming prompt to the model, allowing it to infer from incomplete prompts or await additional prompts. Compared with traditional inference methods that utilize complete prompts, our approach demonstrates an average reduction of 59% in response latency on the MMLU-Pro dataset, while maintaining comparable accuracy. Additionally, our framework facilitates collaborative inference and output across different models. By employing an LLM for inference and a small language model (SLM) for output, we achieve an average 68% reduction in response latency, alongside a 5.5% improvement in accuracy on the MMLU-Pro dataset compared with the SLM baseline. For long prompts exceeding 20 sentences, the response latency can be reduced by up to 93%.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14130 [pdf, other]

ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning

Authors: Zhongjie Duan, Wenmeng Zhou, Cen Chen, Yaliang Li, Weining Qian

Abstract: Recently, advancements in video synthesis have attracted significant attention. Video synthesis models such as AnimateDiff and Stable Video Diffusion have demonstrated the practical applicability of diffusion models in creating dynamic visual content. The emergence of SORA has further spotlighted the potential of video generation technologies. Nonetheless, the extension of video lengths has been constrained by the limitations in computational resources. Most existing video synthesis models can only generate short video clips. In this paper, we propose a novel post-tuning methodology for video synthesis models, called ExVideo. This approach is designed to enhance the capability of current video synthesis models, allowing them to produce content over extended temporal durations while incurring lower training expenditures. In particular, we design extension strategies across common temporal model architectures respectively, including 3D convolution, temporal attention, and positional embedding. To evaluate the efficacy of our proposed post-tuning approach, we conduct extension training on the Stable Video Diffusion model. Our approach augments the model's capacity to generate up to $5\times$ its original number of frames, requiring only 1.5k GPU hours of training on a dataset comprising 40k videos. Importantly, the substantial increase in video length doesn't compromise the model's innate generalization capabilities, and the model showcases its advantages in generating videos of diverse styles and resolutions. We will release the source code and the enhanced model publicly.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 8 pages, 5 figures

arXiv:2406.14123 [pdf]

Mapping AI Ethics Narratives: Evidence from Twitter Discourse Between 2015 and 2022

Authors: Mengyi Wei, Puzhen Zhang, Chuan Chen, Dongsheng Chen, Chenyu Zuo, Liqiu Meng

Abstract: Public participation is indispensable for an insightful understanding of the ethics issues raised by AI technologies. Twitter is selected in this paper to serve as an online public sphere for exploring discourse on AI ethics, facilitating broad and equitable public engagement in the development of AI technology. A research framework is proposed to demonstrate how to transform AI ethics-related discourse on Twitter into coherent and readable narratives. It consists of two parts: 1) combining neural networks with large language models to construct a topic hierarchy that contains popular topics of public concern without ignoring small but important voices, thus allowing a fine-grained exploration of meaningful information. 2) transforming fragmented and difficult-to-understand social media information into coherent and easy-to-read stories through narrative visualization, providing a new perspective for understanding the information in Twitter data. This paper aims to advocate for policy makers to enhance public oversight of AI technologies so as to promote their fair and sustainable development.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 22 pages, 6 figures

arXiv:2406.13933 [pdf, other]

EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations

Authors: Jie Ren, Yingqian Cui, Chen Chen, Vikash Sehwag, Yue Xing, Jiliang Tang, Lingjuan Lyu

Abstract: Generative models, especially text-to-image diffusion models, have significantly advanced in their ability to generate images, benefiting from enhanced architectures, increased computational power, and large-scale datasets. While the datasets play an important role, their protection has remained an unsolved issue. Current protection strategies, such as watermarks and membership inference, either require a high poison rate, which is detrimental to image quality, or suffer from low accuracy and robustness. In this work, we introduce a novel approach, EnTruth, which Enhances Traceability of unauthorized dataset usage utilizing template memorization. By strategically incorporating the template memorization, EnTruth can trigger the specific behavior in unauthorized models as the evidence of infringement. Our method is the first to investigate the positive application of memorization and use it for copyright protection, which turns a curse into a blessing and offers a pioneering perspective for unauthorized usage detection in generative models. Comprehensive experiments are provided to demonstrate its effectiveness in terms of data-alteration rate, accuracy, robustness and generation quality.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13925 [pdf, other]

GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models

Authors: Tao Zhang, Ziqian Zeng, Yuxiang Xiao, Huiping Zhuang, Cen Chen, James Foulds, Shimei Pan

Abstract: Large Language Models (LLMs) are prone to generating content that exhibits gender biases, raising significant ethical concerns. Alignment, the process of fine-tuning LLMs to better align with desired behaviors, is recognized as an effective approach to mitigate gender biases. Although proprietary LLMs have made significant strides in mitigating gender bias, their alignment datasets are not publicly available. The commonly used and publicly available alignment dataset, HH-RLHF, still exhibits gender bias to some extent. There is a lack of publicly available alignment datasets specifically designed to address gender bias. Hence, we developed a new dataset named GenderAlign, aiming at mitigating a comprehensive set of gender biases in LLMs. This dataset comprises 8k single-turn dialogues, each paired with a "chosen" and a "rejected" response. Compared to the "rejected" responses, the "chosen" responses demonstrate lower levels of gender bias and higher quality. Furthermore, we categorized the gender biases in the "rejected" responses of GenderAlign into 4 principal categories. The experimental results show the effectiveness of GenderAlign in reducing gender bias in LLMs.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13782 [pdf, other]

Clock-line-mediated Sisyphus Cooling

Authors: Chun-Chia Chen, Jacob L. Siegel, Benjamin D. Hunt, Tanner Grogan, Youssef S. Hassan, Kyle Beloy, Kurt Gibble, Roger C. Brown, Andrew D. Ludlow

Abstract: We demonstrate sub-recoil Sisyphus cooling using the long-lived $^{3}\mathrm{P}_{0}$ clock state in alkaline-earth-like ytterbium. A 1388 nm optical standing wave nearly resonant with the $^{3}\textrm{P}_{0}$$\,\rightarrow$$\,^{3}\textrm{D}_{1}$ transition creates a spatially periodic light shift of the $^{3}\textrm{P}_{0}$ clock state. Following excitation on the ultranarrow clock transition, we observe Sisyphus cooling in this potential, as the light shift is correlated with excitation to $^{3}\textrm{D}_{1}$ and subsequent spontaneous decay to the $^{1}\textrm{S}_{0}$ ground state. We observe that cooling enhances the loading efficiency of atoms into a 759 nm magic-wavelength one-dimensional (1D) optical lattice, as compared to standard Doppler cooling on the $^{1}\textrm{S}_{0}$$\,\rightarrow\,$$^{3}\textrm{P}_{1}$ transition. Sisyphus cooling yields temperatures below 200 nK in the weakly confined, transverse dimensions of the 1D optical lattice. These lower temperatures improve optical lattice clocks by facilitating the use of shallow lattices with reduced light shifts, while retaining large atom numbers to reduce the quantum projection noise. This Sisyphus cooling can be pulsed or continuous and is applicable to a range of quantum metrology applications.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 8 pages, 6 figures

arXiv:2406.13780 [pdf, ps, other]

On the maximum $F$-free induced subgraphs in $K_t$-free graphs

Authors: József Balogh, Ce Chen, Haoran Luo

Abstract: For graphs $F$ and $H$, let $f_{F,H}(n)$ be the minimum possible size of a maximum $F$-free induced subgraph in an $n$-vertex $H$-free graph. This notion generalizes the Ramsey function and the Erdős--Rogers function. Establishing a container lemma for the $F$-free subgraphs, we give a general upper bound on $f_{F,H}(n)$, assuming the existence of certain locally dense $H$-free graphs. In particular, we prove that for every graph $F$ with $\mathrm{ex}(m,F) = O(m^{1+α})$, where $α\in [0,1/2)$, we have \[ f_{F, K_3}(n) = O\left(n^{\frac{1}{2-α}}\left(\log n\right)^{\frac{3}{2- α}}\right) \quad \textrm{and} \quad f_{F, K_4}(n) = O\left(n^{\frac{1}{3-2α}}\left(\log n\right)^{\frac{6}{3-2α}}\right). \] For the cases where $F$ is a complete multipartite graph, letting $s = \sum_{i=1}^r s_i$, we prove that \[ f_{K_{s_1,\ldots,s_r}, K_{r+2}}(n) = O \left( n^{\frac{2s -3}{4s -5}} (\log n)^{3} \right). \] We also make an observation which improves the bounds of $\mathrm{ex}(G(n,p),C_4)$ by a polylogarithmic factor.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 14 pages

MSC Class: 05D10 05C55 05C35 05C80

arXiv:2406.13603

Formation of a Magnetic Cloud from the Merging of Two Successive Coronal Mass Ejections

Authors: Chong Chen, Ying D. Liu, Bei Zhu, Huidong Hu, Rui Wang

Abstract: On 2022 March 28 two successive coronal mass ejections (CMEs) were observed by multiple spacecraft and resulted in a magnetic cloud (MC) at 1 AU. We investigate the propagation and interaction properties of the two CMEs correlated with the MC using coordinated multi-point remote sensing and in situ observations from Solar Orbiter, STEREO A, SOHO, and Wind. The first CME was triggered by a filament eruption with a high inclination angle. Roughly 9 hr later, the second CME originating from the same active region erupted with a smaller tilt angle and faster speed compared to the first one. The second CME overtook the preceding CME and formed a merged front at approximately 75 $R_\odot$, which developed into a complex ejecta at 1 AU. The descending speed and low proton temperature inside the complex ejecta suggest that the two CMEs have fully merged before reaching 1 AU, leading them to begin expanding rather than compressing against each other. The complex ejecta appears to have the magnetic field and plasma signatures of an MC, although there is a discontinuity in the magnetic field implying previous interactions. The cross section of the complex ejecta, reconstructed from in situ data using a Grad-Shafranov technique, exhibits a right-handed flux rope structure. These results highlight that an MC-like complex ejecta lacking interaction features could arise from the complete merging of two CMEs.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12779

Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition

Authors: Xingming Liao, Nankai Lin, Haowen Li, Lianglun Cheng, Zhuowei Wang, Chong Chen

Abstract: Nested Named Entity Recognition (NNER) focuses on addressing overlapped entity recognition. Compared to Flat Named Entity Recognition (FNER), annotated resources are scarce in the corpus for NNER. Data augmentation is an effective approach to address the insufficient annotated corpus. However, there is a significant lack of exploration in data augmentation methods for NNER. Due to the presence of nested entities in NNER, existing data augmentation methods cannot be directly applied to NNER tasks. Therefore, in this work, we focus on data augmentation for NNER and resort to more expressive structures, Composited-Nested-Label Classification (CNLC) in which constituents are combined by nested-word and nested-label, to model nested entities. The dataset is augmented using the Composited-Nested-Learning (CNL). In addition, we propose the Confidence Filtering Mechanism (CFM) for a more efficient selection of generated data. Experimental results demonstrate that this approach results in improvements in ACE2004 and ACE2005 and alleviates the impact of sample imbalance.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted by CSCWD 2024

Showing 101–150 of 7,020 results for author: Chen, C