-
SMILES-Mamba: Chemical Mamba Foundation Models for Drug ADMET Prediction
Authors:
Bohao Xu,
Yingzhou Lu,
Chenhao Li,
Ling Yue,
Xiao Wang,
Nan Hao,
Tianfan Fu,
Jim Chen
Abstract:
In drug discovery, predicting the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of small-molecule drugs is critical for ensuring safety and efficacy. However, the process of accurately predicting these properties is often resource-intensive and requires extensive experimental data. To address this challenge, we propose SMILES-Mamba, a two-stage model that leverages both unlabeled and labeled data through a combination of self-supervised pretraining and fine-tuning strategies. The model first pre-trains on a large corpus of unlabeled SMILES strings to capture the underlying chemical structure and relationships, before being fine-tuned on smaller, labeled datasets specific to ADMET tasks. Our results demonstrate that SMILES-Mamba exhibits competitive performance across 22 ADMET datasets, achieving the highest score in 14 tasks, highlighting the potential of self-supervised learning in improving molecular property prediction. This approach not only enhances prediction accuracy but also reduces the dependence on large, labeled datasets, offering a promising direction for future research in drug discovery.
Submitted 11 August, 2024;
originally announced August 2024.
-
In-Context Exploiter for Extensive-Form Games
Authors:
Shuxin Li,
Chang Yang,
Youzhi Zhang,
Pengdeng Li,
Xinrun Wang,
Xiao Huang,
Hau Chan,
Bo An
Abstract:
Nash equilibrium (NE) is a widely adopted solution concept in game theory due to its stability property. However, we observe that the NE strategy might not always yield the best results, especially against opponents who do not adhere to NE strategies. Based on this observation, we pose a new game-solving question: Can we learn a model that can exploit any, even NE, opponent to maximize their own utility? In this work, we make the first attempt to investigate this problem through in-context learning. Specifically, we introduce a novel method, In-Context Exploiter (ICE), to train a single model that can act as any player in the game and adaptively exploit opponents entirely by in-context learning. Our ICE algorithm involves generating diverse opponent strategies, collecting interactive history training data by a reinforcement learning algorithm, and training a transformer-based agent within a well-designed curriculum learning framework. Finally, comprehensive experimental results validate the effectiveness of our ICE algorithm, showcasing its in-context learning ability to exploit any unknown opponent, thereby positively answering our initial game-solving question.
Submitted 10 August, 2024;
originally announced August 2024.
-
Context-Driven Index Trimming: A Data Quality Perspective to Enhancing Precision of RALMs
Authors:
Kexin Ma,
Ruochun Jin,
Xi Wang,
Huan Chen,
Jing Ren,
Yuhua Tang
Abstract:
Retrieval-Augmented Large Language Models (RALMs) have made significant strides in enhancing the accuracy of generated responses. However, existing research often overlooks data quality issues in retrieval results, which are frequently caused by inaccurate vector-distance-based retrieval methods. We propose to boost the precision of RALMs' answers from a data quality perspective through the Context-Driven Index Trimming (CDIT) framework, where Context Matching Dependencies (CMDs) are employed as logical data quality rules to capture and regulate the consistency between retrieved contexts. Based on the semantic comprehension capabilities of Large Language Models (LLMs), CDIT can effectively identify and discard retrieval results that are inconsistent with the query context and further modify indexes in the database, thereby improving answer quality. The framework is evaluated on challenging question-answering tasks. The flexibility of CDIT is also verified through its compatibility with various language models and indexing methods, which offers a promising approach to jointly bolster RALMs' data quality and retrieval precision.
Submitted 10 August, 2024;
originally announced August 2024.
-
SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning
Authors:
Yuze Zhao,
Jintao Huang,
Jinghan Hu,
Xingjun Wang,
Yunlin Mao,
Daoze Zhang,
Zeyinzi Jiang,
Zhikai Wu,
Baole Ai,
Ang Wang,
Wenmeng Zhou,
Yingda Chen
Abstract:
Recent developments in Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs) have leveraged attention-based Transformer architectures and achieved superior performance and generalization capabilities. These models have since covered extensive areas of traditional learning tasks. For instance, text-based tasks such as text classification and sequence labeling, as well as multi-modal tasks like Visual Question Answering (VQA) and Optical Character Recognition (OCR), which were previously addressed using different models, can now be tackled with one foundation model. Consequently, the training and lightweight fine-tuning of LLMs and MLLMs, especially those based on the Transformer architecture, have become particularly important. In recognition of these overwhelming needs, we develop SWIFT, a customizable one-stop infrastructure for large models. With support for 300+ LLMs and 50+ MLLMs, SWIFT stands as the open-source framework that provides the most comprehensive support for fine-tuning large models. In particular, it is the first training framework that provides systematic support for MLLMs. In addition to the core functionalities of fine-tuning, SWIFT also integrates post-training processes such as inference, evaluation, and model quantization, to facilitate fast adoption of large models in various application scenarios. With a systematic integration of various training techniques, SWIFT offers helpful utilities such as benchmark comparisons among different training techniques for large models. For fine-tuning models specialized in agent frameworks, we show that notable improvements on the ToolBench leaderboard can be achieved by training with customized datasets on SWIFT, with an increase of 5.2%-21.8% in the Act.EM metric over various baseline models, a reduction in hallucination of 1.6%-14.1%, and an average performance improvement of 8%-17%.
Submitted 18 August, 2024; v1 submitted 10 August, 2024;
originally announced August 2024.
-
Trajectory Planning for Teleoperated Space Manipulators Using Deep Reinforcement Learning
Authors:
Bo Xia,
Xianru Tian,
Bo Yuan,
Zhiheng Li,
Bin Liang,
Xueqian Wang
Abstract:
Trajectory planning for teleoperated space manipulators involves challenges such as accurately modeling system dynamics, particularly in free-floating modes with non-holonomic constraints, and managing time delays that increase model uncertainty and affect control precision. Traditional teleoperation methods rely on precise dynamic models requiring complex parameter identification and calibration, while data-driven methods do not require prior knowledge but struggle with time delays. A novel framework utilizing deep reinforcement learning (DRL) is introduced to address these challenges. The framework incorporates three methods: Mapping, Prediction, and State Augmentation, to handle delays when delayed state information is received at the master end. The Soft Actor Critic (SAC) algorithm processes the state information to compute the next action, which is then sent to the remote manipulator for environmental interaction. Four environments are constructed using the MuJoCo simulation platform to account for variations in base and target fixation: fixed base and target, fixed base with rotated target, free-floating base with fixed target, and free-floating base with rotated target. Extensive experiments with both constant and random delays are conducted to evaluate the proposed methods. Results demonstrate that all three methods effectively address trajectory planning challenges, with State Augmentation showing superior efficiency and robustness.
Submitted 10 August, 2024;
originally announced August 2024.
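The abstract above names State Augmentation as one way to handle delayed state information, without giving details. A minimal sketch of the common formulation follows, assuming a constant delay of `delay_steps` control steps: the agent's input is the delayed observation concatenated with the actions issued since that observation was taken. The class name `AugmentedState` and its methods are hypothetical and are not taken from the paper, whose exact construction may differ.

```python
from collections import deque
from typing import Deque, List, Sequence


class AugmentedState:
    """Builds the policy input from a delayed observation plus recent actions.

    Hypothetical helper: assumes a constant delay measured in control steps.
    """

    def __init__(self, delay_steps: int, action_dim: int) -> None:
        # Buffer of the last `delay_steps` actions, initialised with zeros.
        self.actions: Deque[List[float]] = deque(
            [[0.0] * action_dim for _ in range(delay_steps)],
            maxlen=delay_steps,
        )

    def build(self, delayed_obs: Sequence[float]) -> List[float]:
        """Concatenate the delayed observation with the buffered actions."""
        flat_actions = [a for action in self.actions for a in action]
        return list(delayed_obs) + flat_actions

    def record(self, action: Sequence[float]) -> None:
        """Store the newest action; the oldest one drops out of the buffer."""
        self.actions.append(list(action))


aug = AugmentedState(delay_steps=2, action_dim=1)
first_input = aug.build([1.0, 2.0])   # observation + two zero actions
aug.record([0.5])
second_input = aug.build([3.0, 4.0])  # observation + [0.0, 0.5]
```

In a training loop the augmented vector would be passed to the policy (SAC in the abstract) in place of the raw delayed state, so that the input carries the information needed to account for actions still in flight.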
-
HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou
Authors:
Xu Wang,
Jiangxia Cao,
Zhiyi Fu,
Kun Gai,
Guorui Zhou
Abstract:
In this paper, we present the practical problems and the lessons learned from short-video services at Kuaishou. In industry, a widely-used multi-task framework is the Mixture-of-Experts (MoE) paradigm, which always introduces some shared and specific experts for each task and then uses gate networks to measure related experts' contributions. Although the MoE achieves remarkable improvements, we still observe three anomalies that seriously affect model performance in our iteration: (1) Expert Collapse: We found that experts' output distributions are significantly different, and some experts have over 90% zero activations with ReLU, making it hard for gate networks to assign fair weights to balance experts. (2) Expert Degradation: Ideally, the shared-expert aims to provide predictive information for all tasks simultaneously. Nevertheless, we find that some shared-experts are occupied by only one task, which indicates that these shared-experts have lost their ability to serve all tasks and degenerated into specific-experts. (3) Expert Underfitting: In our services, we have dozens of behavior tasks that need to be predicted, but we find that some data-sparse prediction tasks tend to ignore their specific-experts and assign large weights to shared-experts. The reason might be that the shared-experts can perceive more gradient updates and knowledge from dense tasks, while specific-experts easily fall into underfitting due to their sparse behaviors. Motivated by these observations, we propose HoME to achieve a simple, efficient and balanced MoE system for multi-task learning.
Submitted 10 August, 2024;
originally announced August 2024.
-
PersonViT: Large-scale Self-supervised Vision Transformer for Person Re-Identification
Authors:
Bin Hu,
Xinggang Wang,
Wenyu Liu
Abstract:
Person Re-Identification (ReID) aims to retrieve relevant individuals in non-overlapping camera images and has a wide range of applications in the field of public safety. In recent years, with the development of Vision Transformer (ViT) and self-supervised learning techniques, the performance of person ReID based on self-supervised pre-training has been greatly improved. Person ReID requires extracting highly discriminative local fine-grained features of the human body, while traditional ViT is good at extracting context-related global features, making it difficult to focus on local human body features. To this end, this article introduces the recently emerged Masked Image Modeling (MIM) self-supervised learning method into person ReID. It extracts high-quality global and local features through large-scale unsupervised pre-training that combines masked image modeling with discriminative contrastive learning, and then performs supervised fine-tuning on the person ReID task. This person feature extraction method based on ViT with masked image modeling (PersonViT) is unsupervised and scalable and has strong generalization capabilities. It overcomes the problem of difficult annotation in supervised person ReID and achieves state-of-the-art results on publicly available benchmark datasets, including MSMT17, Market1501, DukeMTMC-reID, and Occluded-Duke. The code and pre-trained models of the PersonViT method are released at https://github.com/hustvl/PersonViT to promote further research in the person ReID field.
Submitted 20 August, 2024; v1 submitted 9 August, 2024;
originally announced August 2024.
-
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Authors:
Chaoyou Fu,
Haojia Lin,
Zuwei Long,
Yunhang Shen,
Meng Zhao,
Yifan Zhang,
Xiong Wang,
Di Yin,
Long Ma,
Xiawu Zheng,
Ran He,
Rongrong Ji,
Yunsheng Wu,
Caifeng Shan,
Xing Sun
Abstract:
The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM) adept at simultaneous processing and analysis of Video, Image, Text, and Audio modalities, while also offering an advanced multimodal interactive experience. Starting from Mixtral 8x7B as a language foundation, we expand its Chinese vocabulary and then apply bilingual instruction tuning. We further endow the language model with visual and audio capabilities through two-stage multi-task learning of multimodal alignment and instruction tuning. VITA demonstrates robust foundational capabilities of multilingual, vision, and audio understanding, as evidenced by its strong performance across a range of both unimodal and multimodal benchmarks. Beyond foundational capabilities, we have made considerable progress in enhancing the natural multimodal human-computer interaction experience. To the best of our knowledge, we are the first to exploit non-awakening interaction and audio interrupt in MLLM. VITA is the first step for the open-source community to explore the seamless integration of multimodal understanding and interaction. While there is still a lot of work to be done on VITA to get close to closed-source counterparts, we hope that its role as a pioneer can serve as a cornerstone for subsequent research. Project Page: https://vita-home.github.io.
Submitted 9 August, 2024;
originally announced August 2024.
-
Observation of muonic Dalitz decays of $χ_{b}$ mesons and precise spectroscopy of hidden-beauty states
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1114 additional authors not shown)
Abstract:
The decays of the $χ_{b1}(1P)$, $χ_{b2}(1P)$, $χ_{b1}(2P)$ and $χ_{b2}(2P)$ mesons into the $Υ(1S)μ^+μ^-$ final state are observed with a high significance using proton-proton collision data collected with the LHCb detector and corresponding to an integrated luminosity of 9 fb$^{-1}$. The newly observed decays together with the $Υ(2S)\rightarrow Υ(1S)π^+π^-$ and $Υ(3S)\rightarrow Υ(2S)π^+π^-$ decay modes are used for precision measurements of the mass and mass splittings for the hidden-beauty states.
Submitted 9 August, 2024;
originally announced August 2024.
-
ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Authors:
Mengcheng Lan,
Chaofeng Chen,
Yiping Ke,
Xinjiang Wang,
Litong Feng,
Wayne Zhang
Abstract:
Open-vocabulary semantic segmentation requires models to effectively integrate visual representations with open-vocabulary semantic labels. While Contrastive Language-Image Pre-training (CLIP) models shine in recognizing visual concepts from text, they often struggle with segment coherence due to their limited localization ability. In contrast, Vision Foundation Models (VFMs) excel at acquiring spatially consistent local visual representations, yet they fall short in semantic understanding. This paper introduces ProxyCLIP, an innovative framework designed to harmonize the strengths of both CLIP and VFMs, facilitating enhanced open-vocabulary semantic segmentation. ProxyCLIP leverages the spatial feature correspondence from VFMs as a form of proxy attention to augment CLIP, thereby inheriting the VFMs' robust local consistency and maintaining CLIP's exceptional zero-shot transfer capacity. We propose an adaptive normalization and masking strategy to get the proxy attention from VFMs, allowing for adaptation across different VFMs. Remarkably, as a training-free approach, ProxyCLIP significantly improves the average mean Intersection over Union (mIoU) across eight benchmarks from 40.3 to 44.4, showcasing its exceptional efficacy in bridging the gap between spatial precision and semantic richness for the open-vocabulary segmentation task.
Submitted 9 August, 2024;
originally announced August 2024.
-
Automation Configuration in Smart Home Systems: Challenges and Opportunities
Authors:
Sheik Murad Hassan Anik,
Xinghua Gao,
Hao Zhong,
Xiaoyin Wang,
Na Meng
Abstract:
With innovation in smart devices and the internet of things (IoT), smart homes have become prevalent. People tend to transform residences into smart homes by customizing off-the-shelf smart home platforms, instead of creating IoT systems from scratch. Among the alternatives, Home Assistant (HA) is one of the most popular platforms. It allows end-users (i.e., home residents) to smartify homes by (S1) integrating selected devices into the system, and (S2) creating YAML files to control those devices. Unfortunately, due to the diversity of devices and the complexity of automation configurations, many users have difficulty correctly creating YAML files. Consequently, their smart homes may not work as expected, causing frustration and concern among users.
This paper presents a novel study on issues of YAML-based automation configuration in smart homes (issues related to S2). We mined the online forum Home Assistant Community for discussion threads related to automation configuration. By manually inspecting 190 threads, we revealed 3 categories of concerns: implementation, optimization, and debugging. Under each category, we classified discussions based on the issue locations and technical concepts involved. Among debugging discussions, we further classified discussions based on users' resolution strategies; we also applied existing analysis tools to buggy YAML files to assess the tools' effectiveness. Our study reveals the common challenges faced by users and frequently applied resolution strategies. Of the examined issues, 129 (68%) concern debugging, but existing tools can detect at most 14 issues and fix none. This implies that existing tools provide limited assistance in automation configuration. Our research sheds light on future directions in smart home development.
Submitted 8 August, 2024;
originally announced August 2024.
-
New Structures in the $J/ψ$ $J/ψ$ Mass Spectrum at CMS
Authors:
Xining Wang,
Kai Yi
Abstract:
A search is reported for structures near the $J/ψ$ $J/ψ$ mass threshold using a dataset of proton-proton collisions at $\sqrt{s} \: =13 \: \mathrm{TeV} $ recorded with the CMS detector at the LHC, corresponding to an integrated luminosity of about $135 \: \mathrm{fb^{-1}}$. Two structures are observed with a significance exceeding $5σ$ and evidence of an additional structure is reported with a local significance of $4.7σ$.
Submitted 8 August, 2024;
originally announced August 2024.
-
Analysis of the dynamics of the decay $D^{+}\to K_{S}^{0} π^{0} e^{+}ν_{e}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
The branching fraction of $D^+\to K_{S}^{0} π^{0}e^+ν_e$ is measured for the first time using $7.93~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at the center-of-mass energy $\sqrt{s}=3.773$ GeV with the BESIII detector operating at the BEPCII collider, and is determined to be ${\mathcal B}$($D^+\to K_S^0π^0e^+ν_e$) = $(0.881~\pm~0.017_{\rm stat.}~\pm~0.016_{\rm syst.})$%. Based on an analysis of the $D^+\to K_S^0π^0e^+ν_e$ decay dynamics, we observe the $S$-wave and $P$-wave components with fractions of $f_{S\text{-}{\rm wave}}$ = $(6.13~\pm~0.27_{\rm stat.}~\pm~0.30_{\rm syst.})$% and $f_{\bar K^{*}(892)^0}$ = $(93.88~\pm~0.27_{\rm stat.}~\pm~0.29_{\rm syst.})$%, respectively. From these results, we obtain the branching fractions ${\mathcal B}$($D^+\to (K_S^0π^0)_{S\text{-}{\rm wave}}~e^+ν_e$) = $(5.41~\pm~0.35_{\rm stat.}~\pm~0.37_{\rm syst.})\times10^{-4}$ and ${\mathcal B}$($D^+\to \bar K^{*}(892)^0e^+ν_e$) = $(4.97~\pm~0.11_{\rm stat.}~\pm~0.12_{\rm syst.})$%. In addition, the hadronic form-factor ratios of $D^{+} \to \bar {K}^{*}(892)^0e^+ν_e$ at $q^2=0$, assuming a single-pole dominance parameterization, are determined to be $r_V=\frac{V(0)}{A_1(0)}= 1.43~\pm~0.07_{\rm stat.}~\pm~0.03_{\rm syst.}$ and $r_2=\frac{A_2(0)}{A_1(0)}=0.72~\pm~0.06_{\rm stat.}~\pm~0.02_{\rm syst.}$.
Submitted 8 August, 2024;
originally announced August 2024.
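As a rough cross-check of how the component branching fractions in the abstract above relate to the total, the central values can be combined as follows. The $S$-wave line uses only numbers quoted in the abstract. The $\bar K^{*}(892)^0$ line additionally assumes an isospin factor of $1/3$ for $\bar K^{*0}\to\bar K^0π^0$ and a factor of $1/2$ for $\bar K^0\to K_S^0$, which the abstract does not state; uncertainties are ignored.

```latex
\begin{align*}
\mathcal{B}\big(D^+\to (K_S^0\pi^0)_{S\text{-wave}}\, e^+\nu_e\big)
  &\approx f_{S\text{-wave}} \times \mathcal{B}(D^+\to K_S^0\pi^0 e^+\nu_e) \\
  &= 0.0613 \times 0.881\% \approx 5.40\times 10^{-4}, \\[4pt]
\mathcal{B}\big(D^+\to \bar K^{*}(892)^0 e^+\nu_e\big)
  &\approx \frac{f_{\bar K^{*}(892)^0} \times \mathcal{B}(D^+\to K_S^0\pi^0 e^+\nu_e)}
               {\tfrac{1}{3}\times\tfrac{1}{2}} \\
  &= 6 \times 0.9388 \times 0.881\% \approx 4.96\%.
\end{align*}
```

Both estimates agree with the quoted $5.41\times10^{-4}$ and $4.97$% to within the rounding of the inputs.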
-
TupleChain: Fast Lookup of OpenFlow Table with Multifaceted Scalability
Authors:
Yanbiao Li,
Neng Ren,
Xin Wang,
Yuxuan Chen,
Xinyi Zhang,
Lingbo Guo,
Gaogang Xie
Abstract:
OpenFlow switches are fundamental components of software-defined networking, where the key operation is to look up flow tables to determine which flow an incoming packet belongs to. This needs to address the same multi-field rule-matching problem as legacy packet classification, but faces more serious scalability challenges. The demand for fast on-line updates makes most existing solutions unfit, while the rest still lack scalability to either large data sets or a large number of match fields per rule. In this work, we propose TupleChain for fast OpenFlow table lookup with multifaceted scalability. We group rules based on their masks, each group being maintained with a hash table, and explore the connections among rule groups to skip unnecessary hash probes for fast search. We show via theoretical analysis and extensive experiments that the proposed scheme not only has competitive computational complexity, but is also scalable and can achieve high performance in both search and update. It can process multiple millions of packets per second while handling millions of on-line updates per second at the same time, and its lookup speed stays at the same level whether it handles a large flow table with 10 million rules or a flow table in which every entry has as many as 100 match fields.
Submitted 8 August, 2024;
originally announced August 2024.
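The grouping step described in the TupleChain abstract (rules grouped by mask, one hash table per group) can be illustrated with a plain tuple-space lookup. This is a minimal sketch of that baseline idea only: it probes every group and does not reproduce the paper's chaining of groups to skip hash probes. The class `TupleSpaceTable`, the two-field header layout, and the action strings are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Mask = Tuple[int, ...]  # one bitmask per header field
Key = Tuple[int, ...]   # header field values after masking


class TupleSpaceTable:
    """Rules grouped by mask tuple, with one hash table (dict) per group."""

    def __init__(self) -> None:
        self.groups: Dict[Mask, Dict[Key, Tuple[int, str]]] = {}

    @staticmethod
    def _apply(mask: Mask, fields: Tuple[int, ...]) -> Key:
        # Bitwise-AND each field with the corresponding mask.
        return tuple(f & m for f, m in zip(fields, mask))

    def insert(self, mask: Mask, values: Tuple[int, ...],
               priority: int, action: str) -> None:
        """Add a rule to the hash table of its mask group."""
        self.groups.setdefault(mask, {})[self._apply(mask, values)] = (priority, action)

    def lookup(self, packet: Tuple[int, ...]) -> Optional[str]:
        """Probe every group once and return the highest-priority match."""
        best: Optional[Tuple[int, str]] = None
        for mask, table in self.groups.items():
            hit = table.get(self._apply(mask, packet))
            if hit is not None and (best is None or hit[0] > best[0]):
                best = hit
        return best[1] if best is not None else None


table = TupleSpaceTable()
# Fields: (IPv4 destination, TCP destination port); values are illustrative.
table.insert((0xFFFFFF00, 0xFFFF), (0x0A000100, 80), priority=10, action="forward:1")
table.insert((0xFF000000, 0x0000), (0x0A000000, 0), priority=1, action="drop")
```

Here a lookup costs one hash probe per distinct mask, which is the overhead the abstract says TupleChain reduces by exploiting connections among rule groups.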
-
An Explainable Non-local Network for COVID-19 Diagnosis
Authors:
Jingfu Yang,
Peng Huang,
Jing Hu,
Shu Hu,
Siwei Lyu,
Xin Wang,
Jun Guo,
Xi Wu
Abstract:
Convolutional neural networks (CNNs) have achieved excellent results in the automatic classification of medical images. In this study, we propose a novel deep residual 3D attention non-local network (NL-RAN) that classifies CT images as COVID-19, common pneumonia, or normal, in order to perform rapid and explainable COVID-19 diagnosis. The network supports end-to-end training. It embeds a non-local module to capture global information and a 3D attention module to focus on the details of the lesion, so that it can directly analyze the 3D lung CT and output the classification results. The output of the attention module can be used as a heat map to increase the interpretability of the model. A total of 4079 3D CT scans were included in this study. Each scan had a unique label (novel coronavirus pneumonia, common pneumonia, or normal). The cohort of CT scans was randomly split into a training set of 3263 scans, a validation set of 408 scans, and a testing set of 408 scans. We compare NL-RAN with existing mainstream classification methods, such as CovNet, CBAM, and ResNet, and compare its visualization results with those of visualization methods such as CAM. Model performance was evaluated using the Area Under the ROC Curve (AUC), precision, and F1-score. NL-RAN achieved an AUC of 0.9903, a precision of 0.9473, and an F1-score of 0.9462, surpassing all the classification methods compared. The heat map output by the attention module is also clearer than the heat map output by CAM. Our experimental results indicate that the proposed method performs significantly better than existing methods. In addition, the first attention module outputs a heat map containing detailed outline information, which increases the interpretability of the model. Our experiments also indicate that inference with our model is fast, so it can provide real-time assistance with diagnosis.
Submitted 8 August, 2024;
originally announced August 2024.
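The split sizes reported in the abstract above (3263, 408, and 408 out of 4079 scans) correspond to roughly 80%, 10%, and 10%. A random split of that shape can be sketched as below. The function name `split_indices` and the fixed seed are assumptions for illustration; the abstract does not say how the authors performed the split or whether it was stratified by label.

```python
import random
from typing import List, Tuple


def split_indices(n_total: int, n_train: int, n_val: int,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle scan indices and cut them into train/validation/test parts."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # whatever remains
    return train, val, test


# Sizes taken from the abstract: 4079 scans in total.
train, val, test = split_indices(n_total=4079, n_train=3263, n_val=408)
```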
-
Quantum-Enhanced Polarimetric Imaging
Authors:
Meng-Yu Xie,
Su-Jian Niu,
Zhao-Qi-Zhi Han,
Yin-Hai Li,
Ren-Hui Chen,
Xiao-Hua Wang,
Ming-Yuan Gao,
Li Chen,
Yue-Wei Song,
Zhi-Yuan Zhou,
Bao-Sen Shi
Abstract:
Polarimetric imaging, a technique that captures the invisible polarization-related properties of given materials, has broad applications from fundamental physics to advanced fields such as target recognition, stress detection, biomedical diagnosis and remote sensing. The introduction of quantum sources into classical imaging systems has demonstrated distinct advantages, yet few studies have explored their combination with polarimetric imaging. In this study, we present a quantum polarimetric imaging system that integrates polarization-entangled photon pairs into a polarizer-sample-compensator-analyzer (PSRA)-type polarimeter. Our system visualizes the birefringence properties of a periodically distributed anisotropic material under decreasing illumination levels and diverse disturbing light sources. Compared to the classical system, the quantum approach shows superior sensitivity and robustness in low-light conditions, which is particularly useful in biomedical studies where low illumination and non-destructive detection are urgently needed. The study also highlights the nonlocality of entangled photons in birefringence measurement, indicating the potential of quantum polarimetric systems in the remote sensing domain.
Submitted 7 August, 2024;
originally announced August 2024.
-
From Black Box to Clarity: AI-Powered Smart Grid Optimization with Kolmogorov-Arnold Networks
Authors:
Xiaoting Wang,
Yuzhuo Li,
Yunwei Li,
Gregory Kish
Abstract:
This work is the first to adopt Kolmogorov-Arnold Networks (KAN), a recent breakthrough in artificial intelligence, for smart grid optimizations. To fully leverage KAN's interpretability, a general framework is proposed considering complex uncertainties. The stochastic optimal power flow problem in hybrid AC/DC systems is chosen as a particularly tough case study for demonstrating the effectiveness of this framework.
Submitted 7 August, 2024;
originally announced August 2024.
-
Global-Local Progressive Integration Network for Blind Image Quality Assessment
Authors:
Xiaoqi Wang,
Yun Zhang
Abstract:
Vision transformers (ViTs) excel in computer vision for modeling long-term dependencies, yet face two key challenges for image quality assessment (IQA): discarding fine details during patch embedding, and requiring extensive training data due to lack of inductive biases. In this study, we propose a Global-Local progressive INTegration network for IQA, called GlintIQA, to address these issues through three key components: 1) Hybrid feature extraction combines a ViT-based global feature extractor (VGFE) and a convolutional neural network (CNN)-based local feature extractor (CLFE) to capture global coarse-grained features and local fine-grained features, respectively. The incorporation of CNNs mitigates the patch-level information loss and inductive bias constraints inherent to ViT architectures. 2) Progressive feature integration leverages diverse kernel sizes in embedding to spatially align coarse- and fine-grained features, and progressively aggregates these features by interactively stacking channel-wise attention and spatial enhancement modules to build effective quality-aware representations. 3) A content similarity-based labeling approach automatically assigns quality labels to images with diverse content based on subjective quality scores. This addresses the scarcity of labeled training data in synthetic datasets and bolsters model generalization. The experimental results demonstrate the efficacy of our approach, yielding 5.04% average SROCC gains on cross-authentic dataset evaluations. Moreover, our model and its counterpart pre-trained on the proposed dataset exhibited 5.40% and 13.23% improvements, respectively, on cross-synthetic dataset evaluations. The codes and proposed dataset will be released at https://github.com/XiaoqiWang/GlintIQA.
Submitted 7 August, 2024;
originally announced August 2024.
-
Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency
Authors:
Xijun Wang,
Dongshan Ye,
Chenyuan Feng,
Howard H. Yang,
Xiang Chen,
Tony Q. S. Quek
Abstract:
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission. However, existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility. To address these limitations, we propose a novel trustworthy ISC framework. This approach leverages text extraction and segmentation mapping techniques to convert images into explainable semantics, while employing Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks. We also introduce a multi-rate ISC transmission protocol that dynamically adapts to both the received explainable semantic content and specific task requirements at the receiver. Simulation results demonstrate that our framework achieves explainable learning, decoupled training, and compatible transmission in various application scenarios. Finally, some intriguing research directions and application scenarios are identified.
Submitted 7 August, 2024;
originally announced August 2024.
-
Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis
Authors:
Zebin Yao,
Fangxiang Feng,
Ruifan Li,
Xiaojie Wang
Abstract:
The customization of text-to-image models has seen significant advancements, yet generating multiple personalized concepts remains a challenging task. Current methods struggle with attribute leakage and layout confusion when handling multiple concepts, leading to reduced concept fidelity and semantic consistency. In this work, we introduce a novel training-free framework, Concept Conductor, designed to ensure visual fidelity and correct layout in multi-concept customization. Concept Conductor isolates the sampling processes of multiple custom models to prevent attribute leakage between different concepts and corrects erroneous layouts through self-attention-based spatial guidance. Additionally, we present a concept injection technique that employs shape-aware masks to specify the generation area for each concept. This technique injects the structure and appearance of personalized concepts through feature fusion in the attention layers, ensuring harmony in the final image. Extensive qualitative and quantitative experiments demonstrate that Concept Conductor can consistently generate composite images with accurate layouts while preserving the visual details of each concept. Compared to existing baselines, Concept Conductor shows significant performance improvements. Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts. The code and models are available at https://github.com/Nihukat/Concept-Conductor.
Submitted 22 August, 2024; v1 submitted 7 August, 2024;
originally announced August 2024.
-
Measurement of the Branching Fraction of \boldmath{$ψ(2S) \to γπ^0$}
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Based on $(2712.4\pm14.1)\times10^{6}~ψ(2S)$ events, 7.9 fb$^{-1}$ $ψ(3773)$ data, and 0.8 fb$^{-1}$ off-resonance data samples collected with the BESIII detector, we measure the branching fraction of $ψ(2S)\rightarrowγπ^{0}$ and the $e^{+}e^{-}\rightarrowγπ^{0}$ form factor at momentum transfers $Q^{2}\sim13$ GeV$^{2}$. The $e^{+}e^{-}\rightarrowγπ^{0}$ cross section is fitted taking into account the interference between the $ψ(2S)$ and continuum amplitudes, and two solutions are found: ${\cal B}=3.74\times10^{-7}$ with $φ=3.93$ rad and ${\cal B}=7.87\times10^{-7}$ with $φ=2.08$ rad. Here, ${\cal B}$ is the branching fraction of $ψ(2S)\rightarrowγπ^{0}$ and $φ$ is the relative phase angle between the $ψ(2S)$ and continuum amplitudes. Due to insufficient off-resonance data, the branching fraction ${\cal B}(ψ(2S)\rightarrowγπ^{0})$ is determined to be in the range $[2.7, 9.7]\times10^{-7}$ within one standard deviation of the contour region.
Submitted 7 August, 2024; v1 submitted 7 August, 2024;
originally announced August 2024.
-
AI Foundation Models in Remote Sensing: A Survey
Authors:
Siqi Lu,
Junlin Guo,
James R Zimmer-Dauphinee,
Jordan M Nieusma,
Xiao Wang,
Parker VanValkenburgh,
Steven A Wernke,
Yuankai Huo
Abstract:
Artificial Intelligence (AI) technologies have profoundly transformed the field of remote sensing, revolutionizing data collection, processing, and analysis. Traditionally reliant on manual interpretation and task-specific models, remote sensing has been significantly enhanced by the advent of foundation models--large-scale, pre-trained AI models capable of performing a wide array of tasks with unprecedented accuracy and efficiency. This paper provides a comprehensive survey of foundation models in the remote sensing domain, covering models released between June 2021 and June 2024. We categorize these models based on their applications in computer vision and domain-specific tasks, offering insights into their architectures, pre-training datasets, and methodologies. Through detailed performance comparisons, we highlight emerging trends and the significant advancements achieved by these foundation models. Additionally, we discuss the technical challenges, practical implications, and future research directions, addressing the need for high-quality data, computational resources, and improved model generalization. Our research also finds that pre-training methods, particularly self-supervised learning techniques like contrastive learning and masked autoencoders, significantly enhance the performance and robustness of foundation models in remote sensing tasks such as scene classification, object detection, and other applications. This survey aims to serve as a resource for researchers and practitioners by providing a panorama of advances and promising pathways for continued development and application of foundation models in remote sensing.
Submitted 6 August, 2024;
originally announced August 2024.
-
LAMPO: Large Language Models as Preference Machines for Few-shot Ordinal Classification
Authors:
Zhen Qin,
Junru Wu,
Jiaming Shen,
Tianqi Liu,
Xuanhui Wang
Abstract:
We introduce LAMPO, a novel paradigm that leverages Large Language Models (LLMs) for solving few-shot multi-class ordinal classification tasks. Unlike conventional methods, which concatenate all demonstration examples with the test instance and prompt LLMs to produce a pointwise prediction, our framework uses the LLM as a preference machine that makes a relative comparative decision between the test instance and each demonstration. A self-supervised method is then introduced to aggregate these binary comparisons into the final ordinal decision. LAMPO addresses several limitations inherent in previous methods, including context length constraints, ordering biases, and challenges associated with absolute point-wise estimation. Extensive experiments on seven public datasets demonstrate LAMPO's remarkably competitive performance across a diverse spectrum of applications (e.g., movie review analysis and hate speech detection). Notably, in certain applications, the improvement can be substantial, exceeding 20% in absolute terms. Moreover, we believe LAMPO represents an interesting addition to the non-parametric applications layered on top of LLMs, as it supports black-box LLMs without requiring access to the LLM's internal states (e.g., embeddings), as previous approaches do.
Submitted 6 August, 2024;
originally announced August 2024.
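To make the idea of turning pairwise comparisons into an ordinal label concrete, here is a minimal sketch. It is not the self-supervised aggregation described in the LAMPO abstract: it swaps in a simple agreement count, and the function name `aggregate_comparisons`, its arguments, and the boolean comparator output are all hypothetical.

```python
def aggregate_comparisons(demo_labels, test_is_higher, num_classes):
    """Pick the ordinal label most consistent with pairwise comparisons.

    demo_labels[i]    -- known ordinal label (0..num_classes-1) of demonstration i
    test_is_higher[i] -- True if the comparator judged the test instance to rank
                         above demonstration i, False otherwise (a hypothetical
                         binary verdict, e.g. parsed from an LLM response)
    Ties between candidate labels are broken toward the lowest label.
    """
    best_label, best_score = 0, -1
    for candidate in range(num_classes):
        score = 0
        for label, higher in zip(demo_labels, test_is_higher):
            if label < candidate and higher:
                score += 1  # agrees: test ranks above a lower-labeled demo
            elif label > candidate and not higher:
                score += 1  # agrees: test ranks below a higher-labeled demo
            # a demo whose label equals the candidate carries no evidence here
        if score > best_score:
            best_label, best_score = candidate, score
    return best_label


# Four demonstrations with labels 0, 1, 3, 4; the test instance was judged
# higher than the first two and lower than the last two.
print(aggregate_comparisons([0, 1, 3, 4], [True, True, False, False], 5))
```

Because each comparison involves only the test instance and one demonstration, every prompt stays short and the result does not depend on the order in which demonstrations are listed, which is the property the abstract highlights.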
-
Two-color Ytterbium MOT in a compact dual-chamber setup
Authors:
Xin Wang,
Thilina Muthu-Arachchige,
Tangi Legrand,
Ludwig Müller,
Wolfgang Alt,
Sebastian Hofferberth,
Eduardo Uruñuela
Abstract:
We present an experimental scheme for producing ultracold Ytterbium atoms in a compact dual-chamber setup. A dispenser-loaded two-dimensional (2D) magneto-optical trap (MOT) using permanent magnets and operating on the broad $^1S_0\to {}^1P_1$ singlet transition delivers over $10^7$ atoms per second through a differential pumping stage into a three-dimensional (3D) MOT. The two-color 3D MOT uses the broad singlet transition to accumulate $\sim\!2\times 10^7$ atoms of $^{174}\text{Yb}$ within $2.5~\text{s}$ and subsequently the narrow $^1S_0\to {}^3P_1$ intercombination line to cool the atomic cloud to below $10~\mathrm{μK}$. We report optimized parameters for each stage of the atom collection sequence, achieving high transfer efficiency. We find that shelving into the triplet state during the broad-transition MOT almost doubles the number of trapped atoms.
Submitted 6 August, 2024;
originally announced August 2024.
-
Measurement of $Σ^+$ transverse polarization in $e^+e^-$ collisions at $\sqrt{s} = 3.68-3.71$ GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at seven energy points ranging from 3.68 to 3.71 GeV and corresponding to an integrated luminosity of $652.1~{\rm pb^{-1}}$, we present an energy-dependent measurement of the transverse polarization, relative phase and modulus ratio of the electromagnetic form factors of the $Σ^+$ hyperon in the $e^+e^- \to Σ^+ \barΣ^-$ reaction. These results are helpful to understand the production mechanism of the $Σ^+$-$\barΣ^-$ pairs.
Submitted 7 August, 2024; v1 submitted 6 August, 2024;
originally announced August 2024.
-
High-dimensional quantum XYZ product codes for biased noise
Authors:
Zhipeng Liang,
Zhengzhong Yi,
Fusheng Yang,
Jiahan Chen,
Zicheng Wang,
Xuan Wang
Abstract:
The quantum XYZ product can construct a class of non-CSS codes by using three classical codes. However, before this work, their error-correcting performance had not been studied in depth, and whether this code construction method can be generalized to higher dimensions was an open question. In this paper, we first study the error-correcting performance of the 3D Chamon code, which can be seen as a non-CSS variant of the 3D toric code and a special instance of the XYZ product of three repetition codes. Second, we show that the XYZ product can be generalized to four dimensions and propose the four-dimensional (4D) XYZ product code construction, which can be seen as a variant of the 4D homological product and constructs a class of non-CSS codes by using four classical codes or two CSS codes. Compared with the 4D homological product, we show that the 4D XYZ product can construct non-CSS codes with higher dimension or code distance. Third, we consider two special instances of the 4D XYZ product, which we name the 4D Chamon code and the 4D XYZ concatenated code. Exploiting fully decoupled binary belief propagation combined with ordered statistics decoding, our simulation results show that, using the same two CSS codes, the 4D XYZ product can construct non-CSS codes with better error-correcting performance for $Z$-biased noise than the CSS codes constructed by the 4D homological product, which is more meaningful for practical quantum computing systems.
Submitted 2 September, 2024; v1 submitted 6 August, 2024;
originally announced August 2024.
-
Characterizing the current systems in the Martian ionosphere
Authors:
Jiawei Gao,
Shibang Li,
Anna Mittelholz,
Zhaojin Rong,
Moa Persson,
Zhen Shi,
Haoyu Lu,
Chi Zhang,
Xiaodong Wang,
Chuanfei Dong,
Lucy Klinger,
Jun Cui,
Yong Wei,
Yongxin Pan
Abstract:
When the solar wind interacts with the ionosphere of an unmagnetized planet, it induces currents that form an induced magnetosphere. These currents and their associated magnetic fields play a pivotal role in controlling the movement of charged particles, which is essential for understanding the escape of planetary ions. Unlike the well-documented magnetospheric current systems, the ionospheric current systems on unmagnetized planets remain less understood, which constrains the quantification of electrodynamic energy transfer from stars to these planets. Here, utilizing eight years of data from the Mars Atmosphere and Volatile EvolutioN (MAVEN) mission, we investigate the global distribution of ionospheric currents on Mars. We have identified two distinct current systems in the ionosphere: one aligns with the solar wind electric field yet exhibits hemispheric asymmetry perpendicular to the electric field direction; the other corresponds to the flow pattern of annually-averaged neutral winds. We propose that these two current systems are driven by the solar wind and atmospheric neutral winds, respectively. Our findings reveal that Martian ionospheric dynamics are influenced by the neutral winds from below and the solar wind from above, highlighting the complex and intriguing nature of current systems on unmagnetized planets.
Submitted 6 August, 2024;
originally announced August 2024.
-
Valence Quark Distributions in Pions: Insights from Tsallis Entropy
Authors:
Jingxuan Chen,
Xiaopeng Wang,
Yanbing Cai,
Xurong Chen,
Qian Wang
Abstract:
We investigate the valence quark distributions of pions at a low initial scale ($Q^2_0$) through the application of Tsallis entropy, a non-extensive measure adept at encapsulating long-range correlations among internal constituents. Utilizing the maximum entropy approach, we derive the valence quark distributions at elevated resolution scales via a modified DGLAP equation, which integrates GLR-MQ-ZRS corrections for the $Q^2$ evolution. Our findings indicate that the resulting $Q^2$-dependent valence quark distributions yield an optimal fit to experimental data, with an inferred parameter value of $q$ ($q = 0.91$), diverging from unity. This deviation highlights the significant role that correlations among valence quarks play in shaping our understanding of pion internal structure. Additionally, our computations of the first three moments of pion quark distributions at $ Q^2 = 4 \, \mathrm{GeV}^2$ display consistency with alternative theoretical models, thereby reinforcing the importance of incorporating valence quark correlations within this analytical framework.
Submitted 6 August, 2024;
originally announced August 2024.
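For reference, the standard discrete form of the Tsallis entropy mentioned in the abstract above is given below, with the Boltzmann constant set to 1. The abstract does not give the functional the authors actually maximize over the quark distributions, so this is only the textbook definition, with $p_i$ denoting generic probabilities.

```latex
\begin{align}
  S_q &= \frac{1 - \sum_i p_i^{\,q}}{q - 1}, \\
  \lim_{q \to 1} S_q &= -\sum_i p_i \ln p_i , \\
  S_q(A + B) &= S_q(A) + S_q(B) + (1 - q)\, S_q(A)\, S_q(B)
  \quad \text{for independent } A,\ B .
\end{align}
```

The last line shows why $q \neq 1$ signals non-extensivity: the cross term vanishes only at $q = 1$, where the Boltzmann-Gibbs-Shannon entropy is recovered. The fitted value $q = 0.91$ quoted in the abstract therefore corresponds to a nonzero cross term.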
-
MGFs: Masked Gaussian Fields for Meshing Building based on Multi-View Images
Authors:
Tengfei Wang,
Zongqian Zhan,
Rui Xia,
Linxia Ji,
Xin Wang
Abstract:
Over the last few decades, image-based building surface reconstruction has garnered substantial research interest and has been applied across various fields, such as heritage preservation and architectural planning. Compared to traditional photogrammetric and NeRF-based solutions, Gaussian fields-based methods have recently exhibited significant potential in generating surface meshes due to their time-efficient training and detailed 3D information preservation. However, most Gaussian fields-based methods are trained with all image pixels, encompassing building and non-building areas, which results in significant noise in building meshes and degraded time efficiency. This paper proposes a novel framework, Masked Gaussian Fields (MGFs), designed to generate accurate surface reconstructions of buildings in a time-efficient way. The framework first applies EfficientSAM and COLMAP to generate multi-level masks of buildings and the corresponding masked point clouds. Subsequently, the masked Gaussian fields are trained by integrating two innovative losses: a multi-level perceptual masked loss focused on constructing building regions and a boundary loss aimed at enhancing the details of the boundaries between different masks. Finally, we improve the tetrahedral surface mesh extraction method based on the masked Gaussian spheres. Comprehensive experiments on UAV images demonstrate that, compared to the traditional method and several NeRF-based and Gaussian-based SOTA solutions, our approach significantly improves both the accuracy and efficiency of building surface reconstruction. Notably, as a byproduct, there is an additional gain in the novel view synthesis of buildings.
Submitted 6 August, 2024;
originally announced August 2024.
-
Observation of $η_{c}(2S) \to K^{+}K^{-}η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
By analyzing $(27.12 \pm 0.14)\times10^{8}$ $ψ(3686)$ events accumulated with the BESIII detector, the decay $η_{c}(2S) \to K^{+} K^{-} η$ is observed for the first time with a significance of $6.2σ$ after considering systematic uncertainties. The product of the branching fractions of $ψ(3686) \to γη_{c}(2S)$ and $η_{c}(2S) \to K^{+} K^{-} η$ is measured to be $\mathcal{B}(ψ(3686) \toγη_{c}(2S))\times \mathcal{B}(η_{c}(2S)\to K^{+} K^{-}η)=(2.39 \pm 0.32 \pm 0.34) \times 10^{-6}$, where the first uncertainty is statistical, and the second one is systematic. The branching fraction of $η_{c}(2S)\to K^{+} K^{-}η$ is determined to be $\mathcal{B}(η_{c}(2S)\to K^{+} K^{-}η) = (3.42 \pm 0.46 \pm 0.48 \pm 2.44) \times 10^{-3}$, where the third uncertainty is due to the branching fraction of $ψ(3686) \to γη_{c}(2S)$. Using a recent BESIII measurement of $\mathcal{B} (η_{c}(2S) \to K^{+} K^{-}π^{0})$, we also determine the ratio between the branching fractions of $η_{c}(2S) \to K^{+} K^{-}η$ and $η_{c}(2S) \to K^{+} K^{-}π^{0}$ to be $1.49 \pm 0.22 \pm 0.25$, which is consistent with the previous result of BaBar at a comparable precision level.
Submitted 5 August, 2024;
originally announced August 2024.
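The three numbers quoted in the abstract above can be checked against each other. The worked arithmetic below uses only the central values and the third uncertainty from the abstract; the input branching fraction it implies is an inference from those numbers, not a value stated in the abstract.

```latex
\begin{align}
  \mathcal{B}\bigl(\psi(3686) \to \gamma\eta_{c}(2S)\bigr)
    &\approx \frac{2.39 \times 10^{-6}}{3.42 \times 10^{-3}}
     \approx 7.0 \times 10^{-4}, \\
  \frac{\delta\mathcal{B}}{\mathcal{B}}\bigg|_{\text{third}}
    &= \frac{2.44}{3.42} \approx 0.71 .
\end{align}
```

A relative uncertainty of about 71% on the input branching fraction propagates unchanged to the quotient, which is why the third uncertainty dominates the statistical and systematic ones in the quoted result.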
-
Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection
Authors:
Sen Nie,
Zhuo Wang,
Xinxin Wang,
Kun He
Abstract:
Recent studies emphasize the crucial role of data augmentation in enhancing the performance of object detection models. However, existing methodologies often struggle to effectively harmonize dataset diversity with semantic coordination. To bridge this gap, we introduce an innovative augmentation technique leveraging pre-trained conditional diffusion models to mediate this balance. Our approach encompasses the development of a Category Affinity Matrix, meticulously designed to enhance dataset diversity, and a Surrounding Region Alignment strategy, which ensures the preservation of semantic coordination in the augmented images. Extensive experimental evaluations confirm the efficacy of our method in enriching dataset diversity while seamlessly maintaining semantic coordination. Our method yields substantial average improvements of +1.4AP, +0.9AP, and +3.4AP over existing alternatives on three distinct object detection models, respectively.
Submitted 5 August, 2024;
originally announced August 2024.
-
Gaussian Mixture based Evidential Learning for Stereo Matching
Authors:
Weide Liu,
Xingxing Wang,
Lu Wang,
Jun Cheng,
Fayao Liu,
Xulei Yang
Abstract:
In this paper, we introduce a novel Gaussian mixture based evidential learning solution for robust stereo matching. Diverging from previous evidential deep learning approaches that rely on a single Gaussian distribution, our framework posits that individual image data adheres to a mixture-of-Gaussian distribution in stereo matching. This assumption yields more precise pixel-level predictions and more accurately mirrors the real-world image distribution. By further employing the inverse-Gamma distribution as an intermediary prior for each mixture component, our probabilistic model achieves improved depth estimation compared to its counterpart with the single Gaussian and effectively captures the model uncertainty, which enables a strong cross-domain generalization ability. We evaluated our method for stereo matching by training the model using the Scene Flow dataset and testing it on KITTI 2015 and Middlebury 2014. The experimental results consistently show that our method brings improvements over the baseline methods in a trustworthy manner. Notably, our approach achieved new state-of-the-art results on both the in-domain validation data and the cross-domain datasets, demonstrating its effectiveness and robustness in stereo matching tasks.
Submitted 5 August, 2024;
originally announced August 2024.
-
Symmetric Graph Contrastive Learning against Noisy Views for Recommendation
Authors:
Chu Zhao,
Enneng Yang,
Yuliang Liang,
Jianzhe Zhao,
Guibing Guo,
Xingwei Wang
Abstract:
Graph Contrastive Learning (GCL) leverages data augmentation techniques to produce contrasting views, enhancing the accuracy of recommendation systems through learning the consistency between contrastive views. However, existing augmentation methods, such as directly perturbing the interaction graph (e.g., node/edge dropout), may interfere with the original connections and generate poor contrasting views, resulting in sub-optimal performance. In this paper, we define the views that share only a small amount of information with the original graph due to poor data augmentation as noisy views (i.e., the last 20% of the views with a cosine similarity value less than 0.1 to the original view). We demonstrate through detailed experiments that noisy views significantly degrade recommendation performance. Further, we propose a model-agnostic Symmetric Graph Contrastive Learning (SGCL) method with theoretical guarantees to address this issue. Specifically, we introduce symmetry theory into graph contrastive learning, based on which we propose a symmetric form and a contrast loss resistant to noisy interference. We provide a theoretical proof that our proposed SGCL method has a high tolerance to noisy views. Further demonstration is given by conducting extensive experiments on three real-world datasets. The experimental results demonstrate that our approach substantially increases recommendation accuracy, with relative improvements reaching as high as 12.25% over nine other competing models. These results highlight the efficacy of our method.
Submitted 3 August, 2024;
originally announced August 2024.
-
SDSS-IV MaNGA: Stellar rotational support in disk galaxies vs. central surface density and stellar population age
Authors:
Xiaohan Wang,
Yifei Luo,
S. M. Faber,
David C. Koo,
Shude Mao,
Kyle B. Westfall,
Shengdong Lu,
Weichen Wang,
Kevin Bundy,
N. Boardman,
Vladimir Avila-Reese,
José G. Fernández-Trincado,
Richard R. Lane
Abstract:
We investigate how the stellar rotational support changes as a function of spatially resolved stellar population age ($\rm D_n4000$) and relative central stellar surface density ($ΔΣ_1$) for MaNGA isolated/central disk galaxies. We find that the galaxy rotational support $λ_{R_\mathrm{e}}$ varies smoothly as a function of $ΔΣ_1$ and $\rm D_n4000$. $\rm D_n4000$ vs. $ΔΣ_1$ follows a "J-shape", with $λ_{R_\mathrm{e}}$ contributing to the scatter. In this "J-shaped" pattern, rotational support increases with central $\rm D_n4000$ when $ΔΣ_1$ is low but decreases with $ΔΣ_1$ when $ΔΣ_1$ is high. Restricting attention to low-$ΔΣ_1$ (i.e., large-radius) galaxies, we suggest that the trend of increasing rotational support with $\rm D_n4000$ for these objects is produced by a mix of two different processes: a primary trend characterized by growth in $λ_{R_\mathrm{e}}$ along with mass through gas accretion, on top of which disturbance episodes are overlaid, which reduce rotational support and trigger increased star formation. An additional finding is that star-forming galaxies with low $ΔΣ_1$ have relatively larger radii than galaxies with higher $ΔΣ_1$ at fixed stellar mass. Assuming that these relative radii rankings are preserved while galaxies are star forming then implies clear evolutionary paths in central $\rm D_n4000$ vs. $ΔΣ_1$. The paper closes with comments on the implications that these paths have for the evolution of pseudo-bulges vs. classical bulges. The utility of using $\rm D_n4000$-$ΔΣ_1$ to study $λ_{R_\mathrm{e}}$ reinforces the notion that galaxy kinematics correlate both with structure and with stellar-population state, and indicates the importance of a multi-dimensional description for understanding bulge and galaxy evolution.
Submitted 5 August, 2024;
originally announced August 2024.
-
Interactive 3D Medical Image Segmentation with SAM 2
Authors:
Chuyun Shen,
Wenhao Li,
Yuhang Shi,
Xiangfeng Wang
Abstract:
Interactive medical image segmentation (IMIS) has shown significant potential in enhancing segmentation accuracy by integrating iterative feedback from medical professionals. However, the limited availability of 3D medical data restricts the generalization and robustness of most IMIS methods. The Segment Anything Model (SAM), though effective for 2D images, requires expensive semi-automatic slice-by-slice annotations for 3D medical images. In this paper, we explore the zero-shot capabilities of SAM 2, the next-generation Meta SAM model trained on videos, for 3D medical image segmentation. By treating sequential 2D slices of 3D images as video frames, SAM 2 can fully automatically propagate annotations from a single frame to the entire 3D volume. We propose a practical pipeline for using SAM 2 in 3D medical image segmentation and present key findings highlighting its efficiency and potential for further optimization. Concretely, numerical experiments on the BraTS2020 and Medical Segmentation Decathlon datasets demonstrate that SAM 2 still has a gap with supervised methods but can narrow the gap in specific settings and organ types, significantly reducing the annotation burden on medical professionals. Our code will be open-sourced and available at https://github.com/Chuyun-Shen/SAM_2_Medical_3D.
Submitted 5 August, 2024;
originally announced August 2024.
-
Bidirectional classical communication cost of a bipartite quantum channel assisted by non-signalling correlations
Authors:
Chengkai Zhu,
Xuanqiang Zhao,
Xin Wang
Abstract:
Understanding the classical communication cost of simulating a quantum channel is a fundamental problem in quantum information theory, which becomes even more intriguing when considering the role of non-locality in quantum information processing. This paper investigates the bidirectional classical communication cost of simulating a bipartite quantum channel assisted by non-signalling correlations. Such non-signalling correlations are permitted not only across spatial dimension between the two parties but also along the temporal dimension of the channel simulation protocol. By introducing non-signalling superchannels, we derive semidefinite programming (SDP) formulations for the one-shot exact bidirectional classical communication cost via non-signalling bipartite superchannels. We further introduce a channel's bipartite conditional min-entropy as an efficiently computable lower bound on the asymptotic cost of bidirectional classical communication. Our results in both one-shot and asymptotic settings provide lower bounds on the entanglement-assisted simulation cost in scenarios where entanglement is available to the two parties and can be utilized across the timeline of the protocol. Numerical experiments demonstrate the effectiveness of our bounds in estimating communication costs for various quantum channels, showing that our bounds can be tight in different scenarios. Our results elucidate the role of non-locality in quantum communication and pave the way for exploring quantum reverse Shannon theory in bipartite scenarios.
Submitted 5 August, 2024;
originally announced August 2024.
-
Moire exciton polaritons in twisted photonic lattices at room temperature
Authors:
Chunzi Xing,
Yu Wang,
Tobias Schneider,
Xiaokun Zhai,
Xinzheng Zhang,
Zhenyu Xiong,
Hao Wu,
Yuan Ren,
Haitao Dai,
Xiao Wang,
Anlian Pan,
Stefan Schumacher,
Xuekai Ma,
Tingge Gao
Abstract:
Moiré lattices attract intensive attention in double graphene/TMD layers and photonic crystals due to the exotic physics within these structures. However, precise measurement of the moiré ground states, excited states and Bloch bands in twisted photonic lattices is still elusive. In this work we report the strong coupling between the excitons of CsPbBr3 microplates and the photonic modes of the moiré lattice at room temperature. Depending on the coupling strength between the nearest potential sites, we observe staggered moiré polariton ground states, excited states trapped in the potential sites and moiré polariton bands across the twisted photonic lattice. In addition, the phase locking of moiré zero (stable in-phase) states and moiré pi (metastable antiphase) states with different spatial distributions is measured. The moiré polariton distribution can be tuned in the shape of a parallelogram by controlling the depth and width of the potential in one photonic lattice with the other one fixed. Our work lays the foundation to study moiré exciton polariton Wigner crystals and Luttinger liquids in twisted photonic lattices at room temperature.
Submitted 5 August, 2024;
originally announced August 2024.
-
Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models
Authors:
Tongtong Feng,
Qing Li,
Xin Wang,
Mingzi Wang,
Guangyao Li,
Wenwu Zhu
Abstract:
Cross-view geo-localization in GNSS-denied environments aims to determine an unknown location by matching drone-view images with the correct geo-tagged satellite-view images from a large gallery. Recent research shows that learning discriminative image representations under specific weather conditions can significantly enhance performance. However, the frequent occurrence of unseen extreme weather conditions hinders progress. This paper introduces MCGF, a Multi-weather Cross-view Geo-localization Framework designed to dynamically adapt to unseen weather conditions. MCGF establishes a joint optimization between image restoration and geo-localization using denoising diffusion models. For image restoration, MCGF incorporates a shared encoder and a lightweight restoration module to help the backbone eliminate weather-specific information. For geo-localization, MCGF uses EVA-02 as a backbone for feature extraction, with cross-entropy loss for training and cosine distance for testing. Extensive experiments on University160k-WX demonstrate that MCGF achieves competitive results for geo-localization in varying weather conditions.
Submitted 27 August, 2024; v1 submitted 5 August, 2024;
originally announced August 2024.
-
Enhanced Equilibria-Solving via Private Information Pre-Branch Structure in Adversarial Team Games
Authors:
Chen Qiu,
Haobo Fu,
Kai Li,
Weixin Huang,
Jiajia Zhang,
Xuan Wang
Abstract:
In ex ante coordinated adversarial team games (ATGs), a team competes against an adversary, and the team members are only allowed to coordinate their strategies before the game starts. The team-maxmin equilibrium with correlation (TMECor) is a suitable solution concept for ATGs. One class of TMECor-solving methods transforms the problem into solving NE in two-player zero-sum games, leveraging well-established tools for the latter. However, existing methods are fundamentally action-based, resulting in poor generalizability and low solving efficiency due to the exponential growth in the size of the transformed game. To address the above issues, we propose an efficient game transformation method based on private information, where all team members are represented by a single coordinator. We designed a structure called private information pre-branch, which makes decisions considering all possible private information from teammates. We prove that the size of the game transformed by our method is exponentially reduced compared to the current state-of-the-art. Moreover, we demonstrate equilibria equivalence. Experimentally, our method achieves a significant speedup of 182.89$\times$ to 694.44$\times$ in scenarios where the current state-of-the-art method can work, such as small-scale Kuhn poker and Leduc poker. Furthermore, our method is applicable to larger games and those with dynamically changing private information, such as Goofspiel.
Submitted 5 August, 2024;
originally announced August 2024.
-
Source-Free Domain-Invariant Performance Prediction
Authors:
Ekaterina Khramtsova,
Mahsa Baktashmotlagh,
Guido Zuccon,
Xi Wang,
Mathieu Salzmann
Abstract:
Accurately estimating model performance poses a significant challenge, particularly in scenarios where the source and target domains follow different data distributions. Most existing performance prediction methods heavily rely on the source data in their estimation process, limiting their applicability in a more realistic setting where only the trained model is accessible. The few methods that do not require source data exhibit considerably inferior performance. In this work, we propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data. We establish connections between our approach for unsupervised calibration and temperature scaling. We then employ a gradient-based strategy to evaluate the correctness of the calibrated predictions. Our experiments on benchmark object recognition datasets reveal that existing source-based methods fall short with limited source sample availability. Furthermore, our approach significantly outperforms the current state-of-the-art source-free and source-based methods, affirming its effectiveness in domain-invariant performance estimation.
Submitted 6 August, 2024; v1 submitted 4 August, 2024;
originally announced August 2024.
-
Noise Suppression for CRP Gathers Based on Self2Self with Dropout
Authors:
Fei Li,
Zhenbin Xia,
Dawei Liu,
Xiaokai Wang,
Wenchao Chen,
Juan Chen,
Leiming Xu
Abstract:
Noise suppression in seismic data processing is a crucial research focus for enhancing subsequent imaging and reservoir prediction. Deep learning has shown promise in computer vision and holds significant potential for seismic data processing. However, supervised learning, which relies on clean labels to train network prediction models, faces challenges due to the unavailability of clean labels for seismic exploration data. In contrast, self-supervised learning substitutes traditional supervised learning with surrogate tasks by different auxiliary means, exploiting internal input data information. Inspired by Self2Self with Dropout, this paper presents a self-supervised learning-based noise suppression method called Self-Supervised Deep Convolutional Networks (SSDCN), specifically designed for Common Reflection Point (CRP) gathers. We utilize pairs of Bernoulli-sampled instances of the input noisy image as surrogate tasks to leverage its inherent structure. Furthermore, SSDCN incorporates geological knowledge through the normal moveout correction technique, which capitalizes on the approximately horizontal behavior and strong self-similarity observed in useful signal events within CRP gathers. By exploiting the discrepancy in self-similarity between the useful signals and noise in CRP gathers, SSDCN effectively extracts self-similarity features during training iterations, prioritizing the extraction of useful signals to achieve noise suppression. Experimental results on synthetic and actual CRP gathers demonstrate that SSDCN achieves high-fidelity noise suppression.
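The Bernoulli-sampled pairs mentioned in this abstract can be illustrated with a minimal sketch of the general Self2Self idea: one random mask splits a single noisy gather into the samples the network sees and the held-out samples it is scored on. This is not the authors' SSDCN code; the function names, the `keep_prob` value of 0.7 and the array shape are assumptions chosen only for illustration.

```python
import numpy as np

def bernoulli_pair(noisy, keep_prob=0.7, rng=None):
    """Split one noisy gather into (network input, target, loss mask).

    keep_prob is a hypothetical value, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Keep each sample independently with probability keep_prob.
    keep = rng.random(noisy.shape) < keep_prob
    net_input = np.where(keep, noisy, 0.0)  # samples shown to the network
    target = np.where(keep, 0.0, noisy)     # held-out samples to predict
    loss_mask = ~keep                       # loss is evaluated only here
    return net_input, target, loss_mask

def masked_mse(pred, target, loss_mask):
    """Mean squared error restricted to the held-out samples."""
    n = int(loss_mask.sum())
    if n == 0:
        return 0.0
    return float((((pred - target) ** 2) * loss_mask).sum() / n)

# Usage on a synthetic stand-in for a CRP gather (time samples x traces).
rng = np.random.default_rng(0)
gather = rng.normal(size=(8, 16))
net_input, target, loss_mask = bernoulli_pair(gather, keep_prob=0.7, rng=rng)
```

Because the loss is computed only on samples the network never saw, the network cannot reduce it by copying its input, which is what lets a single noisy gather supervise itself.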
Submitted 4 August, 2024;
originally announced August 2024.
-
StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion
Authors:
Zhichao Wang,
Yuanzhe Chen,
Xinsheng Wang,
Lei Xie,
Yuping Wang
Abstract:
StreamVoice has recently pushed the boundaries of zero-shot voice conversion (VC) in the streaming domain. It uses a streamable language model (LM) with a context-aware approach to convert semantic features from automatic speech recognition (ASR) into acoustic features with the desired speaker timbre. Despite its innovations, StreamVoice faces challenges due to its dependency on a streaming ASR within a cascaded framework, which complicates system deployment and optimization, affects VC system's design and performance based on the choice of ASR, and struggles with conversion stability when faced with low-quality semantic inputs. To overcome these limitations, we introduce StreamVoice+, an enhanced LM-based end-to-end streaming framework that operates independently of streaming ASR. StreamVoice+ integrates a semantic encoder and a connector with the original StreamVoice framework, now trained using a non-streaming ASR. This model undergoes a two-stage training process: initially, the StreamVoice backbone is pre-trained for voice conversion and the semantic encoder for robust semantic extraction. Subsequently, the system is fine-tuned end-to-end, incorporating a LoRA matrix to activate comprehensive streaming functionality. Furthermore, StreamVoice+ mainly introduces two strategic enhancements to boost conversion quality: a residual compensation mechanism in the connector to ensure effective semantic transmission and a self-refinement strategy that leverages pseudo-parallel speech pairs generated by the conversion backbone to improve speech decoupling. Experiments demonstrate that StreamVoice+ not only achieves higher naturalness and speaker similarity in voice conversion than its predecessor but also provides versatile support for both streaming and non-streaming conversion scenarios.
Submitted 4 August, 2024;
originally announced August 2024.
-
Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process
Authors:
Peng Wang,
Xiaobin Wang,
Chao Lou,
Shengyu Mao,
Pengjun Xie,
Yong Jiang
Abstract:
In-context learning (ICL) is a few-shot learning paradigm that involves learning mappings through input-output pairs and appropriately applying them to new instances. Despite the remarkable ICL capabilities demonstrated by Large Language Models (LLMs), existing works are highly dependent on large-scale labeled support sets, which are not always feasible in practical scenarios. To refine this approach, we focus primarily on an innovative selective annotation mechanism, which precedes the standard demonstration retrieval. We introduce the Language Model-based Determinantal Point Process (LM-DPP) that simultaneously considers the uncertainty and diversity of unlabeled instances for optimal selection. Consequently, this yields a subset for annotation that strikes a trade-off between the two factors. We apply LM-DPP to various language models, including GPT-J, LLaMA, and GPT-3. Experimental results on 9 NLU and 2 generation datasets demonstrate that LM-DPP can effectively select canonical examples. Further analysis reveals that LLMs benefit most significantly from subsets that have both low uncertainty and high diversity.
Submitted 4 August, 2024;
originally announced August 2024.
-
Evolutionary dynamics in stochastic nonlinear public goods games
Authors:
Wenqiang Zhu,
Xin Wang,
Chaoqian Wang,
Longzhao Liu,
Jiaxin Hu,
Zhiming Zheng,
Shaoting Tang,
Hongwei Zheng,
Jin Dong
Abstract:
Understanding the evolution of cooperation in multiplayer games is of vital significance for natural and social systems. An important challenge is that group interactions often lead to nonlinear synergistic effects. However, previous models mainly focus on deterministic nonlinearity where the emergence of synergy or discounting effects is determined by certain conditions, ignoring uncertainty and stochasticity in real-world systems. Here, we develop a probabilistic framework to study cooperative behavior in stochastic nonlinear public goods games. Through both analytical treatment and Monte Carlo simulations, we provide a comprehensive understanding of social dilemmas with stochastic nonlinearity in both well-mixed and structured populations. We find that increasing the degree of nonlinearity makes synergy more advantageous when competing with discounting, thereby promoting cooperation. Interestingly, we show that network reciprocity loses effectiveness when the probability of synergy is small. Moreover, group size exhibits nonlinear effects on group cooperation regardless of the underlying structure. Our findings thus provide novel insights into how stochastic nonlinearity influences the emergence of prosocial behavior.
Submitted 4 August, 2024;
originally announced August 2024.
-
Switchable anomalous Hall effect by selective mirror symmetry breaking in a kagome magnet GdMn6Ge6
Authors:
Zicheng Tao,
Tianye Yu,
Jianyang Ding,
Zhicheng Jiang,
Zhenhai Yu,
Wei Xia,
Xia Wang,
Xuerong Liu,
Yulin Chen,
Dawei Shen,
Yan Sun,
Yanfeng Guo
Abstract:
The crystal symmetry plays a pivotal role in protecting the nontrivial electronic states in a topological phase. Manipulation of the crystal symmetry and hence the nontrivial topological states would serve as a fertile ground to explore exotic topological properties. Combining experimental and theoretical investigations, we demonstrate herein the flexible switch of nontrivial topological states in the single phase of kagome magnet GdMn6Ge6. The intrinsic anomalous Hall effect caused by distinct Berry curvatures along different crystallographic directions is realized through selectively breaking the mirror symmetries in these directions by external magnetic field, which is fully supported by the first-principles calculations. Our results set an explicit example demonstrating the strong correlation between structure symmetry and nontrivial topological states, as well as the switchable topological properties in a single magnetic topological phase.
Submitted 6 August, 2024; v1 submitted 4 August, 2024;
originally announced August 2024.
-
AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning
Authors:
Xin Wang,
Kai Chen,
Xingjun Ma,
Zhineng Chen,
Jingjing Chen,
Yu-Gang Jiang
Abstract:
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks even under a black-box setting where the adversary can only query the model. Particularly, query-based black-box adversarial attacks estimate adversarial gradients based on the returned probability vectors of the target model for a sequence of queries. During this process, the queries made to the target model are intermediate adversarial examples crafted at the previous attack step, which share high similarities in the pixel space. Motivated by this observation, stateful detection methods have been proposed to detect and reject query-based attacks. While demonstrating promising results, these methods either have been evaded by more advanced attacks or suffer from low efficiency in terms of the number of shots (queries) required to detect different attacks. Arguably, the key challenge here is to assign high similarity scores for any two intermediate adversarial examples perturbed from the same clean image. To address this challenge, we propose a novel Adversarial Contrastive Prompt Tuning (ACPT) method to robustly fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries. With ACPT, we further introduce a detection framework AdvQDet that can detect 7 state-of-the-art query-based attacks with $>99\%$ detection rate within 5 shots. We also show that ACPT is robust to 3 types of adaptive attacks. Code is available at https://github.com/xinwong/AdvQDet.
Submitted 4 August, 2024;
originally announced August 2024.
-
Restriction of Schrödinger eigenfunctions to submanifolds
Authors:
Xiaoqi Huang,
Xing Wang,
Cheng Zhang
Abstract:
Burq-Gérard-Tzvetkov and Hu established $L^p$ estimates for the restriction of Laplace-Beltrami eigenfunctions to submanifolds. We investigate the eigenfunctions of the Schrödinger operators with critically singular potentials, and estimate the $L^p$ norms and period integrals for their restriction to submanifolds. Recently, Blair-Sire-Sogge obtained global $L^p$ bounds for Schrödinger eigenfunctions by the resolvent method. Due to the Sobolev trace inequalities, the resolvent method cannot work for submanifolds of all dimensions. We get around this difficulty and establish spectral projection bounds by the wave kernel techniques and the bootstrap argument involving an induction on the dimensions of the submanifolds.
Submitted 4 August, 2024;
originally announced August 2024.
-
Inflight Performance and Calibrations of the Lyman-alpha Solar Telescope on board the Advanced Space-based Solar Observatory
Authors:
Bo Chen,
Li Feng,
Guang Zhang,
Hui Li,
Lingping He,
Kefei Song,
Quanfeng Guo,
Ying Li,
Yu Huang,
Jingwei Li,
Jie Zhao,
Jianchao Xue,
Gen Li,
Guanglu Shi,
Dechao Song,
Lei Lu,
Beili Ying,
Haifeng Wang,
Shuang Dai,
Xiaodong Wang,
Shilei Mao,
Peng Wang,
Kun Wu,
Shuai Ren,
Liang Sun
, et al. (18 additional authors not shown)
Abstract:
The Lyman-alpha Solar Telescope (LST) on board the Advanced Space-based Solar Observatory (ASO-S) is the first payload to image the full solar disk and the solar corona in both white-light (WL) and ultraviolet (UV) H I Lya, extending up to 2.5 solar radii (Rs). Since the launch of the ASO-S on 9 October 2022, LST has captured various significant solar activities including flares, prominences, and coronal mass ejections (CMEs). LST covers three passbands: 121.6 nm, 360 nm and 700 nm. The Lya Solar Disk Imager (SDI) has a field of view (FOV) of 38.4 arcmin and a spatial resolution of around 9.5 arcsec, while the White-Light Solar Telescope (WST) has a FOV of 38.43 arcmin and a spatial resolution of around 3.0 arcsec. The FOV of the Lya Solar Corona Imager (SCI) reaches 81.1 arcmin and its spatial resolution is 4.3 arcsec. The stray-light level in the 700 nm waveband is about 7.8e-6 MSB (mean solar brightness) at 1.1 Rs and 7.6e-7 MSB at 2.5 Rs, and in the Lya waveband it is around 4.3e-3 MSB at 1.1 Rs and 4.1e-4 MSB at 2.5 Rs. This article details the results from on-orbit tests and calibrations.
Submitted 4 August, 2024;
originally announced August 2024.
-
Invariant Graph Learning Meets Information Bottleneck for Out-of-Distribution Generalization
Authors:
Wenyu Mao,
Jiancan Wu,
Haoyang Liu,
Yongduo Sui,
Xiang Wang
Abstract:
Graph out-of-distribution (OOD) generalization remains a major challenge in graph learning since graph neural networks (GNNs) often suffer from severe performance degradation under distribution shifts. Invariant learning, aiming to extract invariant features across varied distributions, has recently emerged as a promising approach for OOD generalization. Despite the great success of invariant learning in OOD problems for Euclidean data (e.g., images), the exploration within graph data remains constrained by the complex nature of graphs. Existing studies, such as data augmentation or causal intervention, either suffer from disruptions to invariance during the graph manipulation process or face reliability issues due to a lack of supervised signals for causal parts. In this work, we propose a novel framework, called Invariant Graph Learning based on Information bottleneck theory (InfoIGL), to extract the invariant features of graphs and enhance models' generalization ability to unseen distributions. Specifically, InfoIGL introduces a redundancy filter to compress task-irrelevant information related to environmental factors. Cooperating with our designed multi-level contrastive learning, we maximize the mutual information among graphs of the same class in the downstream classification tasks, preserving invariant features for prediction to a great extent. An appealing feature of InfoIGL is its strong generalization ability without depending on supervised signals of invariance. Experiments on both synthetic and real-world datasets demonstrate that our method achieves state-of-the-art performance under OOD generalization for graph classification tasks. The source code is available at https://github.com/maowenyu-11/InfoIGL.
Submitted 3 August, 2024;
originally announced August 2024.
-
Jet-Induced Enhancement of Deuteron Production in $pp$ and $p$-Pb Collisions at the LHC
Authors:
Yi-Heng Feng,
Che Ming Ko,
Yu-Gang Ma,
Kai-Jia Sun,
Xin-Nian Wang,
Zhong Yang,
Song Zhang
Abstract:
Jet-associated deuteron production in $pp$ collisions at $\sqrt{s}=13$ TeV and $p$-Pb collisions at $\sqrt{s_{NN}}=5.02$ TeV is studied in the coalescence model by using the phase-space information of proton and neutron pairs from a multiphase transport (AMPT) model at the kinetic freezeout. In the low transverse momentum ($p_T$) region $p_T/A < 1.5$ GeV/$c$, where $A$ is the mass number of a nucleus, the in-jet coalescence factor $B_2^\text{In-jet}$ for deuteron production, given by the ratio of the in-jet deuteron number to the square of the in-jet proton number, is found to be larger than the coalescence factor $B_2$ in the medium perpendicular to the jet by a factor of about 10 in $pp$ collisions and of 25 in $p$-Pb collisions, consistent with the ALICE measurements at the LHC. Such large low-momentum enhancements mainly come from coalescence of nucleons inside the jet with the medium nucleons. Coalescence of nucleons inside the jet dominates deuteron production only in the higher $p_T$ region of $p_T/A\gtrsim 4$ GeV/$c$, where both the yield ratio $d/p$ of deuteron to proton numbers and the $B_2$ are also significantly larger in the jet direction than in the direction perpendicular to the jet due to the strong collinear correlation among particles produced from jet fragmentation. Studying jet-associated deuteron production in relativistic nuclear collisions thus opens up a new window to probe the phase-space structure of nucleons inside jets.
Submitted 2 August, 2024;
originally announced August 2024.