-
Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers
Authors:
Longkun Zou,
Wanru Zhu,
Ke Chen,
Lihua Guo,
Kailing Guo,
Kui Jia,
Yaowei Wang
Abstract:
The semantic pattern of an object point cloud is determined by its topological configuration of local geometries. Learning discriminative representations can be challenging due to large shape variations of point sets in local regions and incomplete surfaces in a global perspective, which can be made even more severe in the context of unsupervised domain adaptation (UDA). Specifically, traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries, which greatly limits their cross-domain generalization. Recently, transformer-based models have achieved impressive performance gains in a range of image-based tasks, benefiting from their strong generalization capability and scalability stemming from capturing long-range correlations across local patches. Inspired by such successes of visual transformers, we propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well trained on massive images, which can significantly empower cross-domain representations with consistent topological priors of objects. To this end, we establish a parameter-frozen pre-trained transformer module shared between 2D teacher and 3D student models, complemented by an online knowledge distillation strategy for semantically regularizing the 3D student model. Furthermore, we introduce a novel self-supervised task centered on reconstructing masked point cloud patches using corresponding masked multi-view image features, thereby empowering the model to incorporate 3D geometric information. Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art performance of UDA for point cloud classification. The source code of this work is available at https://github.com/zou-longkun/RPD.git.
Submitted 26 July, 2024;
originally announced July 2024.
-
Large Language Model Enabled Semantic Communication Systems
Authors:
Zhenyi Wang,
Li Zou,
Shengyun Wei,
Feifan Liao,
Jia Zhuo,
Haibo Mi,
Rongxuan Lai
Abstract:
Large language models (LLMs) have recently demonstrated state-of-the-art performance across various natural language processing (NLP) tasks, achieving near-human levels in multiple language understanding challenges and aligning closely with the core principles of semantic communication. Inspired by LLMs' advancements in semantic processing, we propose an innovative LLM-enabled semantic communication system framework, named LLM-SC, that applies LLMs directly to the physical layer coding and decoding for the first time. By analyzing the relationship between the training process of LLMs and the optimization objectives of semantic communication, we propose training a semantic encoder through LLMs' tokenizer training and establishing a semantic knowledge base via the LLMs' unsupervised pre-training process. This knowledge base aids in constructing the optimal decoder by providing the prior probability of the transmitted language sequence. Based on this foundation, we derive the optimal decoding criterion for the receiver and introduce the beam search algorithm to further reduce the complexity. Furthermore, we assert that existing LLMs can be employed directly for LLM-SC without additional re-training or fine-tuning. Simulation results demonstrate that LLM-SC outperforms classical DeepSC at signal-to-noise ratios (SNR) exceeding 3 dB, enabling error-free transmission of semantic information under high SNR, which is unattainable by DeepSC. In addition to semantic-level performance, LLM-SC demonstrates compatibility with technical-level performance, achieving approximately 8 dB coding gain for a bit error ratio (BER) of $10^{-3}$ without any channel coding while maintaining the same joint source-channel coding rate as traditional communication systems.
Submitted 19 July, 2024;
originally announced July 2024.
-
Few-Shot Bioacoustic Event Detection with Frame-Level Embedding Learning System
Authors:
PengYuan Zhao,
ChengWei Lu,
Liang Zou
Abstract:
This technical report presents our frame-level embedding learning system for the DCASE2024 challenge for few-shot bioacoustic event detection (Task 5). In this work, we used log-mel and PCEN for feature extraction of the input audio, the Netmamba Encoder as the information interaction network, and adopted data augmentation strategies to improve the generalizability of the trained model, as well as multiple post-processing methods. Our final system achieved an F-measure score of 56.4%, securing the 2nd rank in the few-shot bioacoustic event detection category of the Detection and Classification of Acoustic Scenes and Events Challenge 2024.
Submitted 14 July, 2024;
originally announced July 2024.
-
Generalized Gouy Rotation of Electron Vortex beams in uniform magnetic fields
Authors:
Qi Meng,
Xuan Liu,
Wei Ma,
Zhen Yang,
Liang Lu,
Alexander J. Silenko,
Pengming Zhang,
Liping Zou
Abstract:
The rotation of electron vortex beams (EVBs) presents a complex interplay of the Gouy phase characterizing free-space behavior and Landau states or Larmor rotation observed in magnetic fields. Despite being studied separately, these phenomena manifest within a single beam during its propagation in magnetic fields, lacking a comprehensive description. We address this by utilizing exact solutions of the relativistic paraxial equation in magnetic fields, termed "paraxial Landau modes". The paraxial Landau modes describe the quantum states of EVBs in magnetic fields. Our study of rotation angles demonstrates consistency with experimental data, supporting the practical presence of these modes. We provide a unified description of different regimes under generalized Gouy rotation, linking the Gouy phase to EVB rotation angles. This connection enhances our understanding of the Gouy phase and can be extended to nonuniform magnetic fields. Our theoretical analysis is validated through numerical simulations using the Chebyshev method. This work offers new insights into the dynamics of EVBs in magnetic fields and suggests practical applications in beam manipulation and beam optics of vortex particles.
Submitted 2 July, 2024;
originally announced July 2024.
-
Efficient Sparse Attention needs Adaptive Token Release
Authors:
Chaoran Zhang,
Lixin Zou,
Dan Luo,
Min Tang,
Xiangyang Luo,
Zihao Li,
Chenliang Li
Abstract:
In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide array of text-centric tasks. However, their "large" scale introduces significant computational and storage challenges, particularly in managing the key-value states of the transformer, which limits their wider applicability. Therefore, we propose to adaptively release resources from caches and rebuild the necessary key-value states. Specifically, we accomplish this with a lightweight controller module that approximates an ideal top-$K$ sparse attention. This module retains the tokens with the top-$K$ attention weights and simultaneously rebuilds discarded but necessary tokens, which may become essential for future decoding. Comprehensive experiments in natural language generation and modeling reveal that our method is not only competitive with full attention in terms of performance but also achieves a significant throughput improvement of up to 221.8%. The code for replication is available at https://github.com/WHUIR/ADORE.
Submitted 2 July, 2024;
originally announced July 2024.
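As a rough illustration of the "ideal top-$K$ sparse attention" that the abstract above says its controller approximates, the following sketch computes attention for one query over only the K highest-scoring keys. It is not the authors' method or code: the function name `topk_sparse_attention`, the scaled dot-product scoring, and the plain-list data layout are all assumptions made for this example, and the controller and the token-rebuilding step are not modeled.

```python
import math


def topk_sparse_attention(query, keys, values, k):
    """Attend only to the k keys with the highest scaled dot-product scores.

    Hypothetical helper for illustration; inputs are plain lists of floats.
    Returns the attended output vector and the sorted indices that were kept.
    """
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, key)) / math.sqrt(d) for key in keys]
    # Indices of the k largest scores; all other tokens are dropped.
    kept = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax restricted to the kept indices (subtract the max for stability).
    top = max(scores[i] for i in kept)
    weights = {i: math.exp(scores[i] - top) for i in kept}
    total = sum(weights.values())
    dim = len(values[0])
    output = [sum(weights[i] / total * values[i][j] for i in kept) for j in range(dim)]
    return output, sorted(kept)


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    out, kept = topk_sparse_attention([1.0, 0.0], keys, values, k=2)
    print(kept, out)
```

With `k` equal to the number of keys this reduces to ordinary softmax attention, which makes the sparsity a single tunable knob in the sketch.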
-
Combining Optimal Transport and Embedding-Based Approaches for More Expressiveness in Unsupervised Graph Alignment
Authors:
Songyang Chen,
Yu Liu,
Lei Zou,
Zexuan Wang,
Youfang Lin,
Yuxing Chen,
Anqun Pan
Abstract:
Unsupervised graph alignment finds the one-to-one node correspondence between a pair of attributed graphs by only exploiting graph structure and node features. One category of existing works first computes the node representation and then matches nodes with close embeddings, which is intuitive but lacks a clear objective tailored for graph alignment in the unsupervised setting. The other category reduces the problem to optimal transport (OT) via Gromov-Wasserstein (GW) learning with a well-defined objective but leaves a large room for exploring the design of transport cost. We propose a principled approach to combine their advantages motivated by theoretical analysis of model expressiveness. By noticing the limitation of discriminative power in separating matched and unmatched node pairs, we improve the cost design of GW learning with feature transformation, which enables feature interaction across dimensions. Besides, we propose a simple yet effective embedding-based heuristic inspired by the Weisfeiler-Lehman test and add its prior knowledge to OT for more expressiveness when handling non-Euclidean data. Moreover, we are the first to guarantee the one-to-one matching constraint by reducing the problem to maximum weight matching. The algorithm design effectively combines our OT and embedding-based predictions via stacking, an ensemble learning strategy. We propose a model framework named \texttt{CombAlign} integrating all the above modules to refine node alignment progressively. Through extensive experiments, we demonstrate significant improvements in alignment accuracy compared to state-of-the-art approaches and validate the effectiveness of the proposed modules.
Submitted 19 June, 2024;
originally announced June 2024.
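The abstract above mentions enforcing one-to-one matching by reducing alignment to maximum weight matching. The sketch below shows that idea on a tiny, made-up similarity matrix using brute-force search over permutations; it is not the CombAlign implementation, the name `max_weight_matching` and the example scores are hypothetical, and a practical system would use a polynomial-time assignment solver instead of enumeration.

```python
from itertools import permutations


def max_weight_matching(sim):
    """Return the one-to-one assignment of rows to columns with maximum total weight.

    `sim[i][j]` is an assumed similarity between node i of one graph and node j
    of the other. Brute force, so only suitable for very small square matrices.
    """
    n = len(sim)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(sim[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm), best_score


if __name__ == "__main__":
    # Picking the best column per row independently would map both rows to
    # column 0, which is not one-to-one; the matching resolves the conflict.
    sim = [[0.9, 0.8],
           [0.85, 0.1]]
    print(max_weight_matching(sim))
```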
-
Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
S. Afanasiev,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
H. Al-Bataineh,
J. Alexander,
M. Alfred,
K. Aoki,
N. Apadula,
L. Aphecetche,
J. Asai,
H. Asano,
E. T. Atomssa,
R. Averbeck,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
G. Baksay,
L. Baksay,
A. Baldisseri
, et al. (510 additional authors not shown)
Abstract:
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.
Submitted 12 June, 2024;
originally announced June 2024.
-
Lieb-Schultz-Mattis theorems and generalizations in long-range interacting systems
Authors:
Ruizhi Liu,
Jinmin Yi,
Shiyu Zhou,
Liujun Zou
Abstract:
In a unified fashion, we establish Lieb-Schultz-Mattis (LSM) theorems and their generalizations in systems with long-range interactions. We show that, for a quantum spin chain, if the interactions decay fast enough as their ranges increase and the Hamiltonian has an anomalous symmetry, the Hamiltonian cannot have a unique gapped symmetric ground state. If the Hamiltonian contains only 2-spin interactions, these theorems hold when the interactions decay faster than $1/r^2$, with $r$ the distance between the two interacting spins. Moreover, any pure state with an anomalous symmetry, which may not be a ground state of any natural Hamiltonian, must be long-range entangled. The symmetries we consider include on-site internal symmetries combined with lattice translation symmetries, and they can also extend to purely internal but non-on-site symmetries. Moreover, these internal symmetries can be discrete or continuous. We explore the applications of the theorems through various examples.
Submitted 23 May, 2024;
originally announced May 2024.
-
Anomalous Phonon in Charge-Density-Wave Phase of Kagome Metal CsV3Sb5
Authors:
Han-Yu Wang,
Xiao-Cheng Bai,
Wen-Feng Wu,
Zhi Zeng,
Da-Yong Liu,
Liang-Jian Zou
Abstract:
CsV$_3$Sb$_5$, a notable compound within the kagome family, is renowned for its topological and superconducting properties, as well as for the local magnetic field and anomalous Hall effect detected in experiments. However, the origin of this local magnetic field is still veiled. In this study, we employ first-principles calculations to investigate the atomic vibrations in both the pristine and the charge-density-wave phases of CsV$_3$Sb$_5$. Our analysis reveals the presence of "anomalous phonons" in these structures. These phonons induce circular vibrations of atoms, contributing to the phonon magnetic moments and subsequently to the observed local magnetic fields. Additionally, we observe that lattice distortion in the charge-density-wave phase amplifies these circular vibrations, resulting in a stronger local magnetic field, particularly from the vanadium atoms. This investigation not only reveals the potential relation between lattice distortion and atomic polarization but also offers a novel idea for understanding the origin of the local magnetic moment in CsV$_3$Sb$_5$.
Submitted 28 April, 2024;
originally announced April 2024.
-
Leveraging Large Language Model as Simulated Patients for Clinical Education
Authors:
Yanzeng Li,
Cheng Zeng,
Jialun Zhong,
Ruoyu Zhang,
Minhao Zhang,
Lei Zou
Abstract:
Simulated Patients (SPs) play a crucial role in clinical medical education by providing realistic scenarios for student practice. However, the high cost of training and hiring qualified SPs, along with the heavy workload and potential risks they face in consistently portraying actual patients, limit students' access to this type of clinical training. Consequently, the integration of computer program-based simulated patients has emerged as a valuable educational tool in recent years. With the rapid development of Large Language Models (LLMs), their exceptional capabilities in conversational artificial intelligence and role-playing have been demonstrated, making them a feasible option for implementing Virtual Simulated Patient (VSP). In this paper, we present an integrated model-agnostic framework called CureFun that harnesses the potential of LLMs in clinical medical education. This framework facilitates natural conversations between students and simulated patients, evaluates their dialogue, and provides suggestions to enhance students' clinical inquiry skills. Through comprehensive evaluations, our approach demonstrates more authentic and professional SP-scenario dialogue flows compared to other LLM-based chatbots, thus proving its proficiency in simulating patients. Additionally, leveraging CureFun's evaluation ability, we assess several medical LLMs and discuss the possibilities and limitations of using LLMs as virtual doctors from the perspective of their diagnostic abilities.
Submitted 24 April, 2024; v1 submitted 13 April, 2024;
originally announced April 2024.
-
Nucleon microscopy in proton-nucleus scattering via analysis of bremsstrahlung emission: role of incoherent emission
Authors:
Sergei P. Maydanyuk,
Li-Ping Zou,
Peng-Ming Zhang
Abstract:
We study electromagnetic form factors of protons in proton-nucleus scattering by analysing experimental cross-sections of the accompanying bremsstrahlung photons. A new bremsstrahlung model for proton-nucleus scattering is developed, with the main focus on incoherent bremsstrahlung, which has not been considered previously. For the analysis we choose experimental bremsstrahlung data for $p$ + $^{197}$Au scattering at a proton beam energy of 190 MeV obtained by the TAPS collaboration. We find the following. (1) Including incoherent emission in the calculations substantially improves agreement with experimental data; the contribution of incoherent bremsstrahlung is substantially larger than the coherent one. (2) Including form factors of the scattered proton improves agreement with experimental data in comparison with calculations with coherent and incoherent contributions without form factors. (3) The sensitivity of the model in the study of form factors of the scattered proton is high. This demonstrates a new opportunity to study the internal structure of protons under the influence of nuclear forces in nuclear scattering.
Submitted 19 April, 2024;
originally announced April 2024.
-
Understanding Human-COVID-19 Dynamics using Geospatial Big Data: A Systematic Literature Review
Authors:
Binbin Lin,
Lei Zou,
Mingzheng Yang,
Bing Zhou,
Debayan Mandal,
Joynal Abedin,
Heng Cai,
Ning Ning
Abstract:
The COVID-19 pandemic has changed human life. To mitigate the pandemic's impacts, different regions implemented various policies to contain COVID-19 and residents showed diverse responses. These human responses in turn shaped the uneven spatial-temporal spread of COVID-19. Consequently, the human-pandemic interaction is complex, dynamic, and interconnected. Delineating the reciprocal effects between human society and the pandemic is imperative for mitigating risks from future epidemics. Geospatial big data acquired through mobile applications and sensor networks have facilitated near-real-time tracking and assessment of human responses to the pandemic, enabling a surge in researching human-pandemic interactions. However, these investigations involve inconsistent data sources, human activity indicators, relationship detection models, and analysis methods, leading to a fragmented understanding of human-pandemic dynamics. To assess the current state of human-pandemic interactions research, we conducted a synthesis study based on 67 selected publications between March 2020 and January 2023. We extracted key information from each article across six categories, e.g., research area and time, data, methodological framework, and results and conclusions. Results reveal that regression models were predominant in relationship detection, featured in 67.16% of papers. Only two papers employed spatial-temporal models, notably underrepresented in the existing literature. Studies examining the effects of policies and human mobility on the pandemic's health impacts were the most prevalent, each comprising 12 articles (17.91%). Only 3 papers (4.48%) delved into bidirectional interactions between human responses and the COVID-19 spread. These findings shed light on the need for future research to spatially and temporally model the long-term, bidirectional causal relationships within human-pandemic systems.
Submitted 12 April, 2024;
originally announced April 2024.
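The percentages quoted in the abstract above can be cross-checked against the stated total of 67 publications. The short sketch below does that arithmetic; the helper names are hypothetical, and the figure of 45 regression-model papers is an inference from 67.16% of 67, not a number stated in the abstract.

```python
TOTAL_PAPERS = 67  # number of selected publications stated in the abstract


def share(count, total=TOTAL_PAPERS):
    """Percentage of the corpus represented by `count` papers, to two decimals."""
    return round(100 * count / total, 2)


def count_from_share(percent, total=TOTAL_PAPERS):
    """Nearest whole number of papers corresponding to a reported percentage."""
    return round(percent / 100 * total)


if __name__ == "__main__":
    print(share(12))                # policy / mobility studies
    print(share(3))                 # bidirectional-interaction studies
    print(count_from_share(67.16))  # inferred count of regression-model papers
```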
-
PRIME: A CyberGIS Platform for Resilience Inference Measurement and Enhancement
Authors:
Debayan Mandal,
Dr. Lei Zou,
Rohan Singh Wilkho,
Joynal Abedin,
Bing Zhou,
Dr. Heng Cai,
Dr. Furqan Baig,
Dr. Nasir Gharaibeh,
Dr. Nina Lam
Abstract:
In an era of increased climatic disasters, there is an urgent need to develop reliable frameworks and tools for evaluating and improving community resilience to climatic hazards at multiple geographical and temporal scales. Defining and quantifying resilience in the social domain is relatively subjective due to the intricate interplay of socioeconomic factors with disaster resilience. Meanwhile, there is a lack of computationally rigorous, user-friendly tools that can support customized resilience assessment considering local conditions. This study aims to address these gaps through the power of CyberGIS with three objectives: 1) To develop an empirically validated disaster resilience model - Customized Resilience Inference Measurement designed for multi-scale community resilience assessment and influential socioeconomic factors identification, 2) To implement a Platform for Resilience Inference Measurement and Enhancement module in the CyberGISX platform backed by high-performance computing, 3) To demonstrate the utility of PRIME through a representative study. CRIM generates vulnerability, adaptability, and overall resilience scores derived from empirical hazard parameters. Computationally intensive Machine Learning methods are employed to explain the intricate relationships between these scores and socioeconomic driving factors. PRIME provides a web-based notebook interface guiding users to select study areas, configure parameters, calculate and geo-visualize resilience scores, and interpret socioeconomic factors shaping resilience capacities. A representative study showcases the efficiency of the platform while explaining how the visual results obtained may be interpreted. The essence of this work lies in its comprehensive architecture that encapsulates the requisite data, analytical and geo-visualization functions, and ML models for resilience assessment.
Submitted 15 April, 2024;
originally announced April 2024.
-
The magnetism measurements of the two-dimensional van der Waals antiferromagnet CrPS4 using dynamic cantilever magnetometry
Authors:
Qi Li,
Weili Zhen,
Ning Wang,
Yang Yu,
Senyang Pan,
Lin Deng,
Jiaqiang Cai,
Kang Wang,
Lvkuan Zou,
Zhongming Zeng,
Jinglei Zhang,
Haifeng Du
Abstract:
The exploration of van der Waals (vdWs) magnetic materials has sparked great interest in spintronics. However, conventional methods often face challenges in characterizing the magnetic properties of small-sized vdWs materials, especially for antiferromagnets with extremely small magnetic moments. Here, we demonstrate the efficacy of dynamic cantilever magnetometry (DCM) in characterizing the magnetic properties of vdWs magnets, using an antiferromagnetic semiconductor CrPS4. We observe continuous spin axis rotation under a magnetic field, accurately modelled by considering the existence of marked magnetic anisotropies. Furthermore, the dominance of out-of-plane magnetic anisotropy in spin reorientation behavior at low temperatures transitions to the prevalence of in-plane anisotropy with increasing temperature, leading to a sign reversal of the frequency shift in measurements. The peculiar magnetic phase transitions make CrPS4 an intriguing platform for studying two-dimensional magnetism. Our findings underscore the effectiveness of DCM in characterizing magnetic anisotropies and phase transitions in vdWs magnets.
Submitted 12 April, 2024;
originally announced April 2024.
-
A Multi-Level Framework for Accelerating Training Transformer Models
Authors:
Longwei Zou,
Han Zhang,
Yangdong Deng
Abstract:
The fast-growing capabilities of large-scale deep learning models, such as BERT, GPT and ViT, are revolutionizing the landscape of NLP, CV and many other domains. Training such models, however, poses an unprecedented demand for computing power, which incurs exponentially increasing energy cost and carbon dioxide emissions. It is thus critical to develop efficient training solutions to reduce the training costs. Motivated by a set of key observations of inter- and intra-layer similarities among feature maps and attentions that can be identified from typical training processes, we propose a multi-level framework for training acceleration. Specifically, the framework is based on three basic operators, Coalescing, De-coalescing and Interpolation, which can be orchestrated to build a multi-level training framework. The framework consists of a V-cycle training process, which progressively down- and up-scales the model size and projects the parameters between adjacent levels of models via coalescing and de-coalescing. The key idea is that a smaller model can be trained for fast convergence, and its trained parameters provide high-quality intermediate solutions for the next-level larger network. The interpolation operator is designed to break the symmetry of neurons incurred by de-coalescing for better convergence performance. Our experiments on transformer-based language models (e.g. BERT, GPT) as well as a vision model (e.g. DeiT) show that the proposed framework reduces the computational cost by about 20% on training BERT/GPT-Base models and up to 51.6% on training the BERT-Large model while preserving the performance.
Submitted 6 April, 2024;
originally announced April 2024.
-
I-mode Plasma Confinement Improvement by Real-time Lithium Injection and its Classification on EAST Tokamak
Authors:
X. M. Zhong,
X. L. Zou,
A. D. Liu,
Y. T. Song,
G. Zhuang,
H. Q. Liu,
L. Q. Xu,
E. Z. Li,
B. Zhang,
G. Z. Zuo,
Z. Wang,
C. Zhou,
J. Zhang,
W. X. Shi,
L. T. Gao,
S. F. Wang,
W. Gao,
T. Q. Jia,
Q. Zang,
H. L. Zhao,
M. Wang,
H. D. Xu,
X. J. Wang,
X. Gao,
X. D. Lin
, et al. (3 additional authors not shown)
Abstract:
I-mode is a promising regime for future fusion reactors due to the high energy confinement and the moderate particle confinement. However, the effect of lithium, which has been widely applied for particle recycling and impurity control, on I-mode plasma is still unclear. Recently, experiments of real-time lithium powder injection on I-mode plasma have been carried out in EAST Tokamak. It was found that the confinement performance of the I-mode can be improved by the lithium powder injection, which can strongly reduce electron turbulence (ET) and then trigger ion turbulence (IT). Four different regimes of I-mode have been identified in EAST. The Type I I-mode plasma is characterized by the weakly coherent mode (WCM) and the geodesic-acoustic mode (GAM). The Type II I-mode is featured as the WCM and the edge temperature ring oscillation (ETRO). The Type III I-mode corresponds to the plasma with the co-existence of ETRO, GAM, and WCM. The Type IV I-mode denotes the plasma with only WCM but without ETRO and GAM. It has been observed that WCM and ETRO are increased with lithium powder injection due to the reduction of ion and electron turbulence, and the enhancement of the pedestal electron temperature gradient. EAST experiments demonstrate that lithium powder injection is an effective tool for real-time control and confinement improvement of I-mode plasma.
Submitted 10 April, 2024;
originally announced April 2024.
-
CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers
Authors:
Longwei Zou,
Qingyang Wang,
Han Zhao,
Jiangang Kong,
Yi Yang,
Yangdong Deng
Abstract:
The fast-growing large-scale language models are delivering unprecedented performance on almost all natural language processing tasks. However, the effectiveness of large language models is reliant on an exponentially increasing number of parameters. The overwhelming computation complexity incurs a high inference latency that negatively affects user experience. Existing methods to improve inference efficiency, such as tensor parallelism and quantization, aim to reduce per-layer computing latency, yet overlook the cumulative latency due to the number of layers. Recent works on reducing the cumulative latency through layer removal, however, lead to a significant performance drop. Motivated by the similarity of inputs among adjacent layers, we propose to identify quasi-independent layers, which can be concurrently computed to significantly decrease inference latency. We also introduce a bypassing technique to mitigate the effect of information loss. Empirical experiments of the proposed approach on the LLaMA models confirm that Concurrent Computation of Quasi-Independent Layers (CQIL) can reduce latency by up to 48.3% on LLaMA-33B, while maintaining a close level of performance.
Submitted 4 July, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
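As a rough illustration of the idea in the CQIL abstract above, the sketch below contrasts sequential execution of two residual layers with a concurrent approximation in which both layers read the same input. The toy layer functions, the residual form, and the names `sequential_pair` and `concurrent_pair` are assumptions made for illustration only; they are not taken from the paper, and the bypassing technique is not modeled.

```python
import math

def layer_a(x):
    # Toy residual branch: a small elementwise nonlinearity (hypothetical).
    return [0.1 * math.tanh(v) for v in x]

def layer_b(x):
    # Second toy residual branch (hypothetical).
    return [0.05 * v for v in x]

def add(x, y):
    # Elementwise sum of two equally sized vectors.
    return [a + b for a, b in zip(x, y)]

def sequential_pair(x):
    # Standard execution: layer_b sees the output of layer_a.
    h = add(x, layer_a(x))
    return add(h, layer_b(h))

def concurrent_pair(x):
    # Approximation: both branches read the same input x, so they could
    # run in parallel; their residual updates are then summed.
    return add(x, add(layer_a(x), layer_b(x)))

x = [0.5, -1.0, 2.0]
seq = sequential_pair(x)
con = concurrent_pair(x)
# Largest elementwise difference between the two execution orders.
max_gap = max(abs(s - c) for s, c in zip(seq, con))
print(max_gap)
```

In this toy setting the gap between the two outputs stays small because the second branch changes little when its input shifts from the first layer's output back to the shared input, which is the kind of input similarity the abstract appeals to.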
-
Pressure-tuning topological phase transitions in Kagome superconductor CsTi$_3$Bi$_5$
Authors:
Wenfeng Wu,
Xiaocheng Bai,
Xianlong Wang,
Dayong Liu,
Zhi Zeng,
Liangjian Zou
Abstract:
Recently, the Kagome metal CsTi$_3$Bi$_5$ has exhibited several novel quantum properties similar to CsV$_3$Sb$_5$, such as nontrivial topology, double-dome superconductivity, and flat band features. However, CsTi$_3$Bi$_5$ lacks the charge-density wave (CDW) present in CsV$_3$Sb$_5$, making the origin of its double-dome superconductivity a focus of research. In this work, we have identified an order parameter, the three-band Z$_2$ topological index, that can describe the superconducting phase diagram of CsTi$_3$Bi$_5$ under pressure. Its evolution with pressure follows the expected behavior for superconductivity. Furthermore, the results of the Fermi surface under pressure reveal the potential presence of a Lifshitz transition in the vicinity of the vanishing point of the superconducting temperature change with pressure in CsTi$_3$Bi$_5$. These results indicate that the superconducting behavior of CsTi$_3$Bi$_5$ under pressure is caused by changes in the electronic structure leading to alterations in the topological properties, and they provide new insights and approaches for understanding the superconducting phenomenon in Kagome metals.
Submitted 24 March, 2024;
originally announced March 2024.
-
Kernel Multigrid: Accelerate Back-fitting via Sparse Gaussian Process Regression
Authors:
Lu Zou,
Liang Ding
Abstract:
Additive Gaussian Processes (GPs) are popular approaches for nonparametric feature selection. The common training method for these models is Bayesian Back-fitting. However, the convergence rate of Back-fitting in training additive GPs is still an open problem. By utilizing a technique called Kernel Packets (KP), we prove that the convergence rate of Back-fitting is no faster than $(1-\mathcal{O}(\frac{1}{n}))^t$, where $n$ and $t$ denote the data size and the iteration number, respectively. Consequently, Back-fitting requires a minimum of $\mathcal{O}(n\log n)$ iterations to achieve convergence. Based on KPs, we further propose an algorithm called Kernel Multigrid (KMG). This algorithm enhances Back-fitting by incorporating a sparse Gaussian Process Regression (GPR) to process the residuals after each Back-fitting iteration. It is applicable to additive GPs with both structured and scattered data. Theoretically, we prove that KMG reduces the required iterations to $\mathcal{O}(\log n)$ while preserving the time and space complexities at $\mathcal{O}(n\log n)$ and $\mathcal{O}(n)$ per iteration, respectively. Numerically, by employing a sparse GPR with merely 10 inducing points, KMG can produce accurate approximations of high-dimensional targets within 5 iterations.
Submitted 30 March, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
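The step from the rate bound to the iteration count in the Kernel Multigrid abstract above can be sketched as follows. The constant $c$, the tolerance $\varepsilon$, and the choice $\varepsilon = n^{-a}$ are assumptions introduced here for illustration; the abstract does not state them.

```latex
% Assumption: the error after t iterations is bounded below by (1 - c/n)^t
% for some constant c > 0, and the target tolerance is \varepsilon.
\[
\Bigl(1-\frac{c}{n}\Bigr)^{t} \le \varepsilon
\iff
t \ge \frac{\log(1/\varepsilon)}{-\log\bigl(1-\frac{c}{n}\bigr)}
= \frac{n}{c}\,\log(1/\varepsilon)\,\bigl(1+\mathcal{O}(1/n)\bigr).
\]
% With a tolerance that shrinks polynomially, \varepsilon = n^{-a}, a > 0:
\[
t \gtrsim \frac{a}{c}\, n \log n .
\]
```

Under these assumptions the iteration count grows like $n\log n$, which is consistent with the figure quoted in the abstract.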
-
Multitask frame-level learning for few-shot sound event detection
Authors:
Liang Zou,
Genwei Yan,
Ruoyu Wang,
Jun Du,
Meng Lei,
Tian Gao,
Xin Fang
Abstract:
This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples. However, prevailing methods in few-shot SED predominantly rely on segment-level predictions, which often struggle to provide detailed, fine-grained predictions, particularly for events of brief duration. Although frame-level prediction strategies have been proposed to overcome these limitations, these strategies commonly face difficulties with prediction truncation caused by background noise. To alleviate this issue, we introduce an innovative multitask frame-level SED framework. In addition, we introduce TimeFilterAug, a linear timing mask for data augmentation, to increase the model's robustness and adaptability to diverse acoustic environments. The proposed method achieves an F-score of 63.8%, securing the 1st rank in the few-shot bioacoustic event detection category of the Detection and Classification of Acoustic Scenes and Events Challenge 2023.
Submitted 17 March, 2024;
originally announced March 2024.
-
Asymptotic Theory for Linear Functionals of Kernel Ridge Regression
Authors:
Rui Tuo,
Lu Zou
Abstract:
An asymptotic theory is established for linear functionals of the predictive function given by kernel ridge regression, when the reproducing kernel Hilbert space is equivalent to a Sobolev space. The theory covers a wide variety of linear functionals, including point evaluations, evaluation of derivatives, $L_2$ inner products, etc. We establish the upper and lower bounds of the estimates and their asymptotic normality. It is shown that $λ\sim n^{-1}$ is the universal optimal order of magnitude for the smoothing parameter to balance the variance and the worst-case bias. The theory also implies that the optimal $L_\infty$ error of kernel ridge regression can be attained under the optimal smoothing parameter $λ\sim n^{-1}\log n$. These optimal rates for the smoothing parameter differ from the known optimal rate $λ\sim n^{-\frac{2m}{2m+d}}$ that minimizes the $L_2$ error of the kernel ridge regression.
Submitted 7 March, 2024;
originally announced March 2024.
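For readers less familiar with the setting of the kernel ridge regression abstract above, the estimator and the rate comparison can be written out as below. The $1/n$ normalization of the loss and the symbols $k$, $K$, and $\mathcal{H}$ are a common convention assumed here; the paper may scale $λ$ differently.

```latex
% Kernel ridge regression with kernel k, RKHS \mathcal{H}, data (x_i, y_i), i = 1..n
% (normalization assumed, not taken from the abstract):
\[
\hat f = \arg\min_{f\in\mathcal{H}}
  \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2 + \lambda\,\|f\|_{\mathcal{H}}^2,
\qquad
\hat f(x) = k(x)^{\top}\bigl(K + n\lambda I\bigr)^{-1} y ,
\]
% where K_{ij} = k(x_i, x_j) and k(x) = (k(x, x_1), \dots, k(x, x_n))^{\top}.
% Comparing the two orders quoted in the abstract: for any m, d > 0,
\[
\frac{2m}{2m+d} < 1
\quad\Longrightarrow\quad
n^{-1} \ll n^{-\frac{2m}{2m+d}} \quad (n\to\infty),
\]
% so the order n^{-1} corresponds to a smaller smoothing parameter than the L_2-optimal one.
```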
-
Metasurface spectrometers beyond resolution-sensitivity constraints
Authors:
Feng Tang,
Jingjun Wu,
Tom Albrow-Owen,
Hanxiao Cui,
Fujia Chen,
Yaqi Shi,
Lan Zou,
Jun Chen,
Xuhan Guo,
Yijun Sun,
Jikui Luo,
Bingfeng Ju,
Jing Huang,
Shuangli Liu,
Bo Li,
Liming Yang,
Eric Anthony Munro,
Wanguo Zheng,
Hannah J. Joyce,
Hongsheng Chen,
Lufeng Che,
Shurong Dong,
Tawfique Hasan,
Xin Ye,
Yihao Yang
, et al. (1 additional authors not shown)
Abstract:
Optical spectroscopy plays an essential role across scientific research and industry for non-contact materials analysis [1-3], increasingly through in-situ or portable platforms [4-6]. However, when considering low-light-level applications, conventional spectrometer designs necessitate a compromise between their resolution and sensitivity [7,8], especially as device and detector dimensions are scaled down. Here, we report on a miniaturizable spectrometer platform where light throughput onto the detector is instead enhanced as the resolution is increased. This planar, CMOS-compatible platform is based around metasurface encoders designed to exhibit photonic bound states in the continuum [9], where operational range can be altered or extended simply through adjusting geometric parameters. This system can enhance photon collection efficiency by up to two orders of magnitude versus conventional designs; we demonstrate this sensitivity advantage through ultra-low-intensity fluorescent and astrophotonic spectroscopy. This work represents a step forward for the practical utility of spectrometers, affording a route to integrated, chip-based devices that maintain high resolution and SNR without requiring prohibitively long integration times.
Submitted 1 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Authors:
Ziheng Jiang,
Haibin Lin,
Yinmin Zhong,
Qi Huang,
Yangrui Chen,
Zhi Zhang,
Yanghua Peng,
Xiang Li,
Cong Xie,
Shibiao Nong,
Yulu Jia,
Sun He,
Hongmin Chen,
Zhihao Bai,
Qi Hou,
Shipeng Yan,
Ding Zhou,
Yiyao Sheng,
Zhuo Jiang,
Haohan Xu,
Haoran Wei,
Zhang Zhang,
Pengfei Nie,
Leqi Zou,
Sida Zhao
, et al. (7 additional authors not shown)
Abstract:
We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 GPUs. Training LLMs at this scale brings unprecedented challenges to training efficiency and stability. We take a full-stack approach that co-designs the algorithmic and system components across model block and optimizer design, computation and communication overlapping, operator optimization, data pipeline, and network performance tuning. Maintaining high efficiency throughout the training process (i.e., stability) is an important consideration in production given the long extent of LLM training jobs. Many hard stability issues only emerge at large scale, and in-depth observability is the key to address them. We develop a set of diagnosis tools to monitor system components and events deep in the stack, identify root causes, and derive effective techniques to achieve fault tolerance and mitigate stragglers. MegaScale achieves 55.2% Model FLOPs Utilization (MFU) when training a 175B LLM model on 12,288 GPUs, improving the MFU by 1.34x compared to Megatron-LM. We share our operational experience in identifying and fixing failures and stragglers. We hope by articulating the problems and sharing our experience from a systems perspective, this work can inspire future LLM systems research.
Submitted 23 February, 2024;
originally announced February 2024.
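The two figures quoted in the MegaScale abstract above imply a baseline utilization, which the short calculation below recovers. It assumes the 1.34x figure is the ratio of MegaScale MFU to Megatron-LM MFU in the same setting; the variable names are illustrative.

```python
# Figures quoted in the abstract.
megascale_mfu = 55.2   # percent, 175B model on 12,288 GPUs
improvement = 1.34     # ratio over Megatron-LM

# Implied baseline MFU, assuming the ratio is MegaScale MFU / baseline MFU.
baseline_mfu = megascale_mfu / improvement
print(f"implied Megatron-LM MFU: {baseline_mfu:.1f}%")
```

Under that assumption the baseline comes out at roughly 41% MFU.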
-
Evolutionary Reinforcement Learning: A Systematic Review and Future Directions
Authors:
Yuanguo Lin,
Fan Lin,
Guorong Cai,
Hong Chen,
Lixin Zou,
Pengcheng Wu
Abstract:
In response to the limitations of reinforcement learning and evolutionary algorithms (EAs) in complex problem-solving, Evolutionary Reinforcement Learning (EvoRL) has emerged as a synergistic solution. EvoRL integrates EAs and reinforcement learning, presenting a promising avenue for training intelligent agents. This systematic review first navigates through the technological background of EvoRL, examining the symbiotic relationship between EAs and reinforcement learning algorithms. We then delve into the challenges faced by both EAs and reinforcement learning, exploring their interplay and impact on the efficacy of EvoRL. Furthermore, the review underscores the need for addressing open issues related to scalability, adaptability, sample efficiency, adversarial robustness, ethics, and fairness within the current landscape of EvoRL. Finally, we propose future directions for EvoRL, emphasizing research avenues that strive to enhance self-adaptation and self-improvement, generalization, interpretability, and explainability. Serving as a comprehensive resource for researchers and practitioners, this systematic review provides insights into the current state of EvoRL and offers a guide for advancing its capabilities in the ever-evolving landscape of artificial intelligence.
Submitted 19 February, 2024;
originally announced February 2024.
-
Nuclear mass table in deformed relativistic Hartree-Bogoliubov theory in continuum, II: Even-$Z$ nuclei
Authors:
DRHBc Mass Table Collaboration,
Peng Guo,
Xiaojie Cao,
Kangmin Chen,
Zhihui Chen,
Myung-Ki Cheoun,
Yong-Beom Choi,
Pak Chung Lam,
Wenmin Deng,
Jianmin Dong,
Pengxiang Du,
Xiaokai Du,
Kangda Duan,
Xiaohua Fan,
Wei Gao,
Lisheng Geng,
Eunja Ha,
Xiao-Tao He,
Jinniu Hu,
Jingke Huang,
Kun Huang,
Yanan Huang,
Zidan Huang,
Kim Da Hyung,
Hoi Yat Chan
, et al. (58 additional authors not shown)
Abstract:
The mass table in the deformed relativistic Hartree-Bogoliubov theory in continuum (DRHBc) with the PC-PK1 density functional has been established for even-$Z$ nuclei with $8\le Z\le120$, extended from the previous work for even-even nuclei [Zhang et al. (DRHBc Mass Table Collaboration), At. Data Nucl. Data Tables 144, 101488 (2022)]. The calculated binding energies, two-nucleon and one-neutron separation energies, root-mean-square (rms) radii of neutron, proton, matter, and charge distributions, quadrupole deformations, and neutron and proton Fermi surfaces are tabulated and compared with available experimental data. A total of 4829 even-$Z$ nuclei are predicted to be bound, with an rms deviation of 1.477 MeV from the 1244 mass data. Good agreement with the available experimental odd-even mass differences, $α$ decay energies, and charge radii is also achieved. The description accuracy for nuclear masses and nucleon separation energies as well as the prediction for drip lines is compared with the results obtained from other relativistic and nonrelativistic density functionals. The comparison shows that the DRHBc theory with PC-PK1 provides an excellent microscopic description for the masses of even-$Z$ nuclei. The systematics of the nucleon separation energies, odd-even mass differences, pairing energies, two-nucleon gaps, $α$ decay energies, rms radii, quadrupole deformations, potential energy curves, neutron density distributions, and neutron mean-field potentials are discussed.
Submitted 10 June, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Almost global existence and radiative decay of three dimensional spherical gas bubble inside inviscid compressible liquid
Authors:
Liangchen Zou
Abstract:
The present paper considers the model of a homogeneous bubble inside an unbounded isentropic compressible inviscid liquid. The exterior liquid is governed by the Euler equation while the free bubble surface is determined by the kinematic and dynamic boundary conditions on the bubble-liquid interface. We first proved the local existence and uniqueness of the complete nonlinear system using energy methods under an iteration scheme. Then we proved the almost global existence of the solution and the radiative decay of bubble oscillation through a bootstrap argument. Besides the energy estimate, this bootstrap argument encompasses a generalized KSS (Keel-Smith-Sogge) estimate and the analysis of the backward pressure wave using the method of characteristics, which constitute the novelty of the present paper.
We developed a generalized weighted $L^2_tH^j_x$-estimate, or the so-called KSS estimate, which extends the KSS estimate \cite{MR2015331} to nonlinear wave equations in exterior domains regardless of the boundary conditions, at the cost of only the appearance of an $L_t^2$ norm of the boundary value. To handle this boundary value, we establish a method of characteristics to study the backward pressure wave, which is then used to decouple the ODE of the boundary value from the hyperbolic system of backward and forward pressure waves. The analysis of the backward pressure wave takes advantage of a change of variable between the backward and forward characteristics generated by the sound speed field in a geometric way. These two methods are not limited to the bubble-liquid model studied in this paper, and are expected to apply to other problems regarding nonlinear wave equations with complex boundary conditions.
Submitted 29 January, 2024;
originally announced January 2024.
-
Statistical Machine Learning Meets High-Dimensional Spatiotemporal Challenges -- A Case Study of COVID-19 Modeling
Authors:
Binbin Lin,
Yimin Dai,
Lei Zou,
Ning Ning
Abstract:
Diverse non-pharmacological interventions (NPIs), serving as the primary approach for COVID-19 control prior to pharmaceutical interventions, showed heterogeneous spatiotemporal effects on pandemic management. Investigating the dynamic compounding impacts of NPIs on pandemic spread is imperative. However, the challenges posed by data availability of high-dimensional human behaviors and the complexity of modeling changing and interrelated factors are substantial. To address these challenges, this study analyzed social media data, COVID-19 case rates, Apple mobility data, and the stringency of stay-at-home policies in the United States throughout the year 2020, aiming to (1) uncover the spatiotemporal variations in NPIs during the COVID-19 pandemic utilizing geospatial big data; (2) develop a statistical machine learning model that incorporates spatiotemporal dependencies and temporal lag effects for the detection of relationships; (3) dissect the impacts of NPIs on the pandemic across space and time. Three indices were computed based on Twitter (currently known as X) data: the Negative and Positive Sentiments Adjusted by Demographics (N-SAD and P-SAD) and the Ratio Adjusted by Demographics (RAD), representing negative sentiment, positive sentiment, and public awareness of COVID-19, respectively. The Multivariate Bayesian Structural Time Series Time Lagged model (MBSTS-TL) was proposed to investigate the effects of NPIs, accounting for spatial dependencies and temporal lag effects. The developed MBSTS-TL model exhibited a high degree of accuracy. Determinants of COVID-19 health impacts transitioned from an emphasis on human mobility during the initial outbreak period to a combination of human mobility and stay-at-home policies during the rapid spread phase, and ultimately to the compound of human mobility, stay-at-home policies, and public awareness of COVID-19.
Submitted 28 November, 2023;
originally announced December 2023.
-
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Authors:
Xueyao Zhang,
Liumeng Xue,
Yicheng Gu,
Yuancheng Wang,
Haorui He,
Chaoren Wang,
Xi Chen,
Zihao Fang,
Haopeng Chen,
Junan Zhang,
Tze Ying Tang,
Lexiao Zou,
Mingxuan Wang,
Jun Han,
Kai Chen,
Haizhou Li,
Zhizheng Wu
Abstract:
Amphion is an open-source toolkit for Audio, Music, and Speech Generation, aiming to ease the way for junior researchers and engineers into these fields. It presents a unified framework that is inclusive of diverse generation tasks and models, with the added bonus of being easily extendable to incorporate new ones. The toolkit is designed with beginner-friendly workflows and pre-trained models, allowing both beginners and seasoned researchers to kick-start their projects with relative ease. Additionally, it provides interactive visualizations and demonstrations of classic models for educational purposes. The initial release of Amphion v0.1 supports a range of tasks including Text to Speech (TTS), Text to Audio (TTA), and Singing Voice Conversion (SVC), supplemented by essential components like data preprocessing, state-of-the-art vocoders, and evaluation metrics. This paper presents a high-level overview of Amphion.
Submitted 22 February, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Identified charged-hadron production in $p$$+$Al, $^3$He$+$Au, and Cu$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV and in U$+$U collisions at $\sqrt{s_{_{NN}}}=193$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
J. Alexander,
M. Alfred,
V. Andrieux,
K. Aoki,
N. Apadula,
H. Asano,
E. T. Atomssa,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
X. Bai,
N. S. Bandara,
B. Bannier,
K. N. Barish,
S. Bathe,
V. Baublis
, et al. (456 additional authors not shown)
Abstract:
The PHENIX experiment has performed a systematic study of identified charged-hadron ($π^\pm$, $K^\pm$, $p$, $\bar{p}$) production at midrapidity in $p$$+$Al, $^3$He$+$Au, Cu$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV and U$+$U collisions at $\sqrt{s_{_{NN}}}=193$ GeV. Identified charged-hadron invariant transverse-momentum ($p_T$) and transverse-mass ($m_T$) spectra are presented and interpreted in terms of radially expanding thermalized systems. The particle ratios of $K/π$ and $p/π$ have been measured in different centrality ranges of large (Cu$+$Au, U$+$U) and small ($p$$+$Al, $^3$He$+$Au) collision systems. The values of $K/π$ ratios measured in all considered collision systems were found to be consistent with those measured in $p$$+$$p$ collisions. However, the values of $p/π$ ratios measured in large collision systems reach $\approx0.6$, which is $\approx2$ times larger than in $p$$+$$p$ collisions. These results can be qualitatively understood in terms of the baryon enhancement expected from hadronization by recombination. Identified charged-hadron nuclear-modification factors ($R_{AB}$) are also presented. Enhancement of proton $R_{AB}$ values over meson $R_{AB}$ values was observed in central $^3$He$+$Au, Cu$+$Au, and U$+$U collisions. The proton $R_{AB}$ values measured in the $p$$+$Al collision system were found to be consistent with $R_{AB}$ values of $φ$, $π^\pm$, $K^\pm$, and $π^0$ mesons, which may indicate that the size of the system produced in $p$$+$Al collisions is too small for recombination to cause a noticeable increase in proton production.
Submitted 22 May, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Multilingual large language models leak human stereotypes across language boundaries
Authors:
Yang Trista Cao,
Anna Sotnikova,
Jieyu Zhao,
Linda X. Zou,
Rachel Rudinger,
Hal Daume III
Abstract:
Multilingual large language models have become increasingly popular for their proficiency in processing and generating text across various languages. Previous research has shown that the presence of stereotypes and biases in monolingual large language models can be attributed to the nature of their training data, which is collected from humans and reflects societal biases. Multilingual language models undergo the same training procedure as monolingual ones, albeit with training data sourced from various languages. This raises the question: do stereotypes present in one social context leak across languages within the model? In our work, we first define the term "stereotype leakage" and propose a framework for its measurement. With this framework, we investigate how stereotypical associations leak across four languages: English, Russian, Chinese, and Hindi. To quantify the stereotype leakage, we employ an approach from social psychology, measuring stereotypes via group-trait associations. We evaluate human stereotypes and stereotypical associations manifested in multilingual large language models such as mBERT, mT5, and GPT-3.5. Our findings show a noticeable leakage of positive, negative, and non-polar associations across all languages. Notably, Hindi within multilingual models appears to be the most susceptible to influence from other languages, while Chinese is the least. Additionally, GPT-3.5 exhibits a better alignment with human scores than other models. WARNING: This paper contains model outputs which could be offensive in nature.
Submitted 8 May, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Regret Optimality of GP-UCB
Authors:
Wenjia Wang,
Xiaowei Zhang,
Lu Zou
Abstract:
Gaussian Process Upper Confidence Bound (GP-UCB) is one of the most popular methods for optimizing black-box functions with noisy observations, due to its simple structure and superior performance. Its empirical successes lead to a natural, yet unresolved question: Is GP-UCB regret optimal? In this paper, we offer the first generally affirmative answer to this important open question in the Bayesian optimization literature. We establish new upper bounds on both the simple and cumulative regret of GP-UCB when the objective function to optimize admits a certain smoothness property. These upper bounds match the known minimax lower bounds (up to logarithmic factors independent of the feasible region's dimensionality) for optimizing functions with the same smoothness. Intriguingly, our findings indicate that, with the same level of exploration, GP-UCB can simultaneously achieve optimality in both simple and cumulative regret. The crux of our analysis hinges on a refined uniform error bound for online estimation of functions in reproducing kernel Hilbert spaces. This error bound, which we derive from empirical process theory, is of independent interest, and its potential applications may reach beyond the scope of this study.
Submitted 3 December, 2023;
originally announced December 2023.
-
Electroweak Strings in the Standard Model
Authors:
Liping Zou,
Pengming Zhang,
Yongmin Cho
Abstract:
We argue that the existence of the electroweak monopole predicts the existence of the electroweak string in the standard model, made of a monopole-antimonopole pair separated infinitely far apart, which carries the quantized magnetic flux $4 πn/e$. We show how to construct such a quantized magnetic flux string solution. Our result strongly indicates that a genuine fundamental electromagnetic string could exist in nature and could actually be detected. We discuss the physical implications of our result in cosmology.
Submitted 29 November, 2023;
originally announced November 2023.
-
LLMaAA: Making Large Language Models as Active Annotators
Authors:
Ruoyu Zhang,
Yanzeng Li,
Yongliang Ma,
Ming Zhou,
Lei Zou
Abstract:
Prevalent supervised learning methods in natural language processing (NLP) are notoriously data-hungry, which demand large amounts of high-quality annotated data. In practice, acquiring such data is a costly endeavor. Recently, the superior few-shot performance of large language models (LLMs) has propelled the development of dataset generation, where the training data are solely synthesized from LLMs. However, such an approach usually suffers from low-quality issues, and requires orders of magnitude more labeled data to achieve satisfactory performance. To fully exploit the potential of LLMs and make use of massive unlabeled data, we propose LLMaAA, which takes LLMs as annotators and puts them into an active learning loop to determine what to annotate efficiently. To learn robustly with pseudo labels, we optimize both the annotation and training processes: (1) we draw k-NN examples from a small demonstration pool as in-context examples, and (2) we adopt the example reweighting technique to assign training samples with learnable weights. Compared with previous approaches, LLMaAA features both efficiency and reliability. We conduct experiments and analysis on two classic NLP tasks, named entity recognition and relation extraction. With LLMaAA, task-specific models trained from LLM-generated labels can outperform the teacher within only hundreds of annotated examples, which is much more cost-effective than other baselines.
Submitted 31 October, 2023; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Symmetries and anomalies of Kitaev spin-$S$ models: Identifying symmetry-enforced exotic quantum matter
Authors:
Ruizhi Liu,
Ho Tat Lam,
Han Ma,
Liujun Zou
Abstract:
We analyze the internal symmetries and their anomalies in the Kitaev spin-$S$ models. Importantly, these models have a lattice version of a $\mathbb{Z}_2$ 1-form symmetry, denoted by $\mathbb{Z}_2^{[1]}$. There is also an ordinary 0-form $\mathbb{Z}_2^{(x)}\times\mathbb{Z}_2^{(y)}\times\mathbb{Z}_2^T$ symmetry, where $\mathbb{Z}_2^{(x)}\times\mathbb{Z}_2^{(y)}$ are $π$ spin rotations around two orthogonal axes, and $\mathbb{Z}_2^T$ is the time reversal symmetry. The anomalies associated with the full $\mathbb{Z}_2^{(x)}\times\mathbb{Z}_2^{(y)}\times\mathbb{Z}_2^T\times\mathbb{Z}_2^{[1]}$ symmetry are classified by $\mathbb{Z}_2^{17}$. We find that for $S\in\mathbb{Z}$ the model is anomaly-free, while for $S\in\mathbb{Z}+\frac{1}{2}$ there is an anomaly purely associated with the 1-form symmetry, but there is no anomaly purely associated with the ordinary symmetry or mixed anomaly between the 0-form and 1-form symmetries. The consequences of these symmetries and anomalies apply to not only the Kitaev spin-$S$ models, but also any of their perturbed versions, assuming that the perturbations are local and respect the symmetries. If these local perturbations are weak, generically these consequences still apply even if the perturbations break the 1-form symmetry. A notable consequence is that there should generically be a deconfined fermionic excitation carrying no fractional quantum number under the $\mathbb{Z}_2^{(x)}\times\mathbb{Z}_2^{(y)}\times\mathbb{Z}_2^T$ symmetry if $S\in\mathbb{Z}+\frac{1}{2}$, which implies symmetry-enforced exotic quantum matter. We also discuss the consequences for $S\in\mathbb{Z}$.
Submitted 15 April, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
RDBench: ML Benchmark for Relational Databases
Authors:
Zizhao Zhang,
Yi Yang,
Lutong Zou,
He Wen,
Tao Feng,
Jiaxuan You
Abstract:
Benefiting from high-quality datasets and standardized evaluation metrics, machine learning (ML) has achieved sustained progress and widespread applications. However, when applying machine learning to relational databases (RDBs), the absence of a well-established benchmark remains a significant obstacle to the development of ML. To address this issue, we introduce ML Benchmark For Relational Databases (RDBench), a standardized benchmark that aims to promote reproducible ML research on RDBs that include multiple tables. RDBench offers diverse RDB datasets of varying scales, domains, and relational structures, organized into 4 levels. Notably, to simplify the adoption of RDBench for diverse ML domains, for any given database, RDBench exposes three types of interfaces including tabular data, homogeneous graphs, and heterogeneous graphs, sharing the same underlying task definition. For the first time, RDBench enables meaningful comparisons between ML methods from diverse domains, ranging from XGBoost to Graph Neural Networks, under RDB prediction tasks. We design multiple classification and regression tasks for each RDB dataset and report averaged results over the same dataset, further enhancing the robustness of the experimental findings. RDBench is implemented with DBGym, a user-friendly platform for ML research and application on databases, enabling benchmarking new ML methods with RDBench with ease.
Submitted 30 October, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Improved Convergence Rate of Nested Simulation with LSE on Sieve
Authors:
Ruoxue Liu,
Liang Ding,
Wenjia Wang,
Lu Zou
Abstract:
Nested simulation encompasses the estimation of functionals linked to conditional expectations through simulation techniques. In this paper, we treat conditional expectation as a function of the multidimensional conditioning variable and provide asymptotic analyses of general Least Squared Estimators on sieve, without imposing specific assumptions on the function's form. Our study explores scenarios in which the convergence rate surpasses that of the standard Monte Carlo method and the one recently proposed based on kernel ridge regression. We also delve into the conditions that allow for achieving the best possible square root convergence rate among all methods. Numerical experiments are conducted to support our statements.
Submitted 18 October, 2023;
originally announced October 2023.
-
Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion
Authors:
Xueyao Zhang,
Yicheng Gu,
Haopeng Chen,
Zihao Fang,
Lexiao Zou,
Junan Zhang,
Liumeng Xue,
Jinchao Zhang,
Jie Zhou,
Zhizheng Wu
Abstract:
Singing Voice Conversion (SVC) is a technique that enables any singer to perform any song. To achieve this, it is essential to obtain speaker-agnostic representations from the source audio, which poses a significant challenge. A common solution involves utilizing a semantic-based audio pretrained model as a feature extractor. However, the degree to which the extracted features can meet the SVC requirements remains an open question. This includes their capability to accurately model melody and lyrics, the speaker-independency of their underlying acoustic information, and their robustness for in-the-wild acoustic environments. In this study, we investigate the knowledge within classical semantic-based pretrained models in much detail. We discover that the knowledge of different models is diverse and can be complementary for SVC. To jointly utilize the diverse pretrained models with mismatched time resolutions, we propose an efficient ReTrans strategy to address the feature fusion problem. Based on the above, we design a Singing Voice Conversion framework based on Diverse Semantic-based Feature Fusion (DSFF-SVC). Experimental results demonstrate that DSFF-SVC can be generalized and improve various existing SVC models, particularly in challenging real-world conversion tasks.
Submitted 27 May, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.
-
Explainable machine learning-based prediction model for diabetic nephropathy
Authors:
Jing-Mei Yin,
Yang Li,
Jun-Tang Xue,
Guo-Wei Zong,
Zhong-Ze Fang,
Lang Zou
Abstract:
The aim of this study is to analyze the effect of serum metabolites on diabetic nephropathy (DN) and predict the prevalence of DN through a machine learning approach. The dataset consists of 548 patients from April 2018 to April 2019 in Second Affiliated Hospital of Dalian Medical University (SAHDMU). We select the optimal 38 features through a Least absolute shrinkage and selection operator (LASSO) regression model and a 10-fold cross-validation. We compare four machine learning algorithms, including eXtreme Gradient Boosting (XGB), random forest, decision tree and logistic regression, by AUC-ROC curves, decision curves, calibration curves. We quantify feature importance and interaction effects in the optimal predictive model by Shapley Additive exPlanations (SHAP) method. The XGB model has the best performance to screen for DN with the highest AUC value of 0.966. The XGB model also gains more clinical net benefits than others and the fitting degree is better. In addition, there are significant interactions between serum metabolites and duration of diabetes. We develop a predictive model by XGB algorithm to screen for DN. C2, C5DC, Tyr, Ser, Met, C24, C4DC, and Cys have great contribution in the model, and can possibly be biomarkers for DN.
Submitted 24 October, 2023; v1 submitted 27 September, 2023;
originally announced September 2023.
-
Classification of symmetry-enriched topological quantum spin liquids
Authors:
Weicheng Ye,
Liujun Zou
Abstract:
We present a systematic framework to classify symmetry-enriched topological quantum spin liquids in two spatial dimensions. This framework can deal with all topological quantum spin liquids, which may be either Abelian or non-Abelian, chiral or non-chiral. It can systematically treat a general symmetry, which may include both lattice symmetry and internal symmetry, may contain anti-unitary symmetry, and may permute anyons. The framework applies to all types of lattices, and can systematically distinguish different lattice systems with the same symmetry group using their Lieb-Schultz-Mattis anomalies. We apply this framework to classify $U(1)_{2N}$ chiral states and non-Abelian Ising$^{(ν)}$ states enriched by a $p6\times SO(3)$ or $p4\times SO(3)$ symmetry, and $\mathbb{Z}_N$ topological orders and $U(1)_{2N}\times U(1)_{-2N}$ topological orders enriched by a $p6m\times SO(3)\times\mathbb{Z}_2^T$, $p4m\times SO(3)\times\mathbb{Z}_2^T$, $p6m\times\mathbb{Z}_2^T$ or $p4m\times\mathbb{Z}_2^T$ symmetry, where $p6$, $p4$, $p6m$ and $p4m$ are lattice symmetries, while $SO(3)$ and $\mathbb{Z}_2^T$ are spin rotation and time reversal symmetries, respectively. In particular, we identify symmetry-enriched topological quantum spin liquids that are not easily captured by the usual parton-mean-field approach, including examples with the familiar $\mathbb{Z}_2$ topological order.
Submitted 5 June, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
A class of elliptic mixed boundary value problems with $(p,q)$-Laplacian: existence, comparison and optimal control
Authors:
Shengda Zeng,
Stanislaw Migorski,
Domingo A. Tarzia,
Lang Zou,
Van Thien Nguyen
Abstract:
The paper deals with two nonlinear elliptic equations with $(p,q)$-Laplacian and the Dirichlet-Neumann-Dirichlet (DND) boundary conditions, and Dirichlet-Neumann-Neumann (DNN) boundary conditions, respectively. Under mild hypotheses, we prove the unique weak solvability of the elliptic mixed boundary value problems. Then, comparison and monotonicity results for the solutions of elliptic mixed boundary value problems are established. We examine a convergence result which shows that the solution of (DND) can be approached by the solution of (DNN). Moreover, two optimal control problems governed by (DND) and (DNN), respectively, are considered, and an existence result for optimal control problems is obtained. Finally, we provide a result on the asymptotic behavior of optimal controls and system states when a parameter tends to infinity.
Submitted 15 September, 2023;
originally announced September 2023.
-
Two is Better Than One: Answering Complex Questions by Multiple Knowledge Sources with Generalized Links
Authors:
Minhao Zhang,
Yongliang Ma,
Yanzeng Li,
Ruoyu Zhang,
Lei Zou,
Ming Zhou
Abstract:
Incorporating multiple knowledge sources is proven to be beneficial for answering complex factoid questions. To utilize multiple knowledge bases (KB), previous works merge all KBs into a single graph via entity alignment and reduce the problem to question-answering (QA) over the fused KB. In reality, various link relations between KBs might be adopted in QA over multi-KBs. In addition to the identity between the alignable entities (i.e. full link), unalignable entities expressing the different aspects or types of an abstract concept may also be treated as identical in a question (i.e. partial link). Hence, the KB fusion in prior works fails to represent all types of links, restricting their ability to comprehend multi-KBs for QA. In this work, we formulate the novel Multi-KB-QA task that leverages the full and partial links among multiple KBs to derive correct answers. A benchmark with diversified link and query types is also constructed to efficiently evaluate Multi-KB-QA performance. Finally, we propose a method for Multi-KB-QA that encodes all link relations in the KB embedding to score and rank candidate answers. Experiments show that our method markedly surpasses conventional KB-QA systems in Multi-KB-QA, justifying the necessity of devising this task.
Submitted 10 September, 2023;
originally announced September 2023.
-
ADMUS: A Progressive Question Answering Framework Adaptable to Multiple Knowledge Sources
Authors:
Yirui Zhan,
Yanzeng Li,
Minhao Zhang,
Lei Zou
Abstract:
With the introduction of deep learning models, semantic parsing-based knowledge base question answering (KBQA) systems have achieved high performance in handling complex questions. However, most existing approaches primarily focus on enhancing the model's effectiveness on individual benchmark datasets, disregarding the high costs of adapting the system to disparate datasets in real-world scenarios (e.g., multi-tenant platform). Therefore, we present ADMUS, a progressive knowledge base question answering framework designed to accommodate a wide variety of datasets, including multiple languages, diverse backbone knowledge bases, and disparate question answering datasets. To accomplish this, we decouple the architecture of conventional KBQA systems and propose this dataset-independent framework. Our framework supports the seamless integration of new datasets with minimal effort, only requiring the creation of a dataset-related micro-service at a negligible cost. To enhance the usability of ADMUS, we design a progressive framework consisting of three stages, ranging from executing exact queries, through generating approximate queries, to retrieving open-domain knowledge from large language models. An online demonstration of ADMUS is available at: https://answer.gstore.cn/pc/index.html
Submitted 9 August, 2023;
originally announced August 2023.
-
A Weakly Supervised Segmentation Network Embedding Cross-scale Attention Guidance and Noise-sensitive Constraint for Detecting Tertiary Lymphoid Structures of Pancreatic Tumors
Authors:
Bingxue Wang,
Liwen Zou,
Jun Chen,
Yingying Cao,
Zhenghua Cai,
Yudong Qiu,
Liang Mao,
Zhongqiu Wang,
Jingya Chen,
Luying Gui,
Xiaoping Yang
Abstract:
The presence of tertiary lymphoid structures (TLSs) on pancreatic pathological images is an important prognostic indicator of pancreatic tumors. Therefore, TLSs detection on pancreatic pathological images plays a crucial role in diagnosis and treatment for patients with pancreatic tumors. However, fully supervised detection algorithms based on deep learning usually require a large number of manual annotations, which is time-consuming and labor-intensive. In this paper, we aim to detect the TLSs in a few-shot learning manner by proposing a weakly supervised segmentation network. We first obtain the lymphocyte density maps by combining a pretrained model for nuclei segmentation and a domain adversarial network for lymphocyte nuclei recognition. Then, we establish a cross-scale attention guidance mechanism by jointly learning the coarse-scale features from the original histopathology images and fine-scale features from our designed lymphocyte density attention. A noise-sensitive constraint is introduced by embedding a signed distance function loss in the training procedure to reduce tiny prediction errors. Experimental results on two collected datasets demonstrate that our proposed method significantly outperforms the state-of-the-art segmentation-based algorithms in terms of TLSs detection accuracy. Additionally, we apply our method to study the congruent relationship between the density of TLSs and peripancreatic vascular invasion and obtain some clinically statistical results.
Submitted 26 July, 2023;
originally announced July 2023.
-
Characteristics of the edge temperature ring oscillation during stationary improved confinement mode in EAST
Authors:
A. D. Liu,
X. L. Zou,
X. M. Zhong,
Y. T. Song,
M. K. Han,
Y. M. Duan,
H. Q. Liu,
T. B. Wang,
E. Z. Li,
L. Zhang,
X. Feng,
G. Zhuang,
EAST I-mode working group
Abstract:
I-mode is a natural ELMy-free regime with H-mode like improved energy confinement and L-mode like particle confinement, making it an attractive scenario for future tokamak based fusion reactors. A kind of low frequency oscillation was widely found and appeared to be unique in I-mode, with the frequency between stationary zonal flow and geodesic-acoustic mode (GAM) zonal flow. In EAST, 90 percent of I-mode shots have such a mode, called edge temperature ring oscillation (ETRO). The mode probably plays an important role during I-mode development and sustainment, while investigations are needed to clarify the differences between ETRO and the similar mode named low frequency edge oscillation (LFEO) in AUG and C-Mod, especially whether it is still GAM. In the paper, the ETRO characteristics in EAST were investigated in detail and most do not agree with GAM, including that 1) during the L-I transition with edge Te and Ti both increasing, ETRO has a smaller frequency than GAM; 2) ETRO has distinct harmonics in various diagnostics; 3) the magnetic component of ETRO is dominated by an m = 1 structure; 4) ETRO is accompanied by a turbulence transition between electron-scale and ion-scale; 5) as I-mode approaches H-mode, the ETRO frequency decreases rapidly with increasing Te. These features imply that ETRO is probably caused by the stationary zonal flow with finite frequency. Moreover, other damping mechanisms need to be involved besides collision in the I-mode edge region. It was found that modest fueling could decrease the ETRO intensity while sustaining the I-mode confinement, suggesting that supersonic molecular beam injection (SMBI) could be used as an effective tool to control ETRO.
Submitted 14 June, 2023;
originally announced June 2023.
-
Unconfounded Propensity Estimation for Unbiased Ranking
Authors:
Dan Luo,
Lixin Zou,
Qingyao Ai,
Zhiyu Chen,
Chenliang Li,
Dawei Yin,
Brian D. Davison
Abstract:
The goal of unbiased learning to rank (ULTR) is to leverage implicit user feedback for optimizing learning-to-rank systems. Among existing solutions, automatic ULTR algorithms that jointly learn user bias models (i.e., propensity models) with unbiased rankers have received a lot of attention due to their superior performance and low deployment cost in practice. Despite their theoretical soundness, the effectiveness is usually justified under a weak logging policy, where the ranking model can barely rank documents according to their relevance to the query. However, when the logging policy is strong, e.g., an industry-deployed ranking policy, the reported effectiveness cannot be reproduced. In this paper, we first investigate ULTR from a causal perspective and uncover a negative result: existing ULTR algorithms fail to address the issue of propensity overestimation caused by the query-document relevance confounder. Then, we propose a new learning objective based on backdoor adjustment and highlight its differences from conventional propensity models, which reveal the prevalence of propensity overestimation. On top of that, we introduce a novel propensity model called Logging-Policy-aware Propensity (LPP) model and its distinctive two-step optimization strategy, which allows for the joint learning of LPP and ranking models within the automatic ULTR framework, and actualize the unconfounded propensity estimation for ULTR. Extensive experiments on two benchmarks demonstrate the effectiveness and generalizability of the proposed method.
Submitted 8 July, 2023; v1 submitted 16 May, 2023;
originally announced May 2023.
-
Electronic instability in pressured black phosphorus under strong magnetic field
Authors:
Zhong-Yi Wang,
Da-Yong Liu,
Liang-Jian Zou
Abstract:
In this paper we have systematically studied the electronic instability of pressured black phosphorus (BP) under a strong magnetic field. We first present an effective model Hamiltonian for pressured BP near the Lifshitz point. We show that when the magnetic field exceeds a certain critical value, the nodal-line semimetal (NLSM) state of BP with a small band overlap re-enters a semiconducting phase by re-opening a small gap. This results in a narrow-band semiconductor with a partially flat valence band edge. We show that above this critical magnetic field, two possible instabilities, i.e., a charge density wave (CDW) phase or an excitonic insulator (EI) phase, are predicted as the ground state for high and low doping concentrations, respectively. By comparing our results with experiment, we suggest that the field-induced instability observed in a recent experiment is an EI. Furthermore, we propose that semimetallic BP under pressure with small band overlaps may provide a good platform to study magneto-exciton insulators. Our findings bring the first insight into the electronic instability of topological NLSM in the quantum limit.
Submitted 14 May, 2023;
originally announced May 2023.
-
Representing Additive Gaussian Processes by Sparse Matrices
Authors:
Lu Zou,
Haoyuan Chen,
Liang Ding
Abstract:
Among generalized additive models, additive Matérn Gaussian Processes (GPs) are one of the most popular for scalable high-dimensional problems. Thanks to their additive structure and stochastic differential equation representation, back-fitting-based algorithms can reduce the time complexity of computing the posterior mean from $O(n^3)$ to $O(n\log n)$ time where $n$ is the data size. However, generalizing these algorithms to efficiently compute the posterior variance and maximum log-likelihood remains an open problem. In this study, we demonstrate that for Additive Matérn GPs, not only the posterior mean, but also the posterior variance, log-likelihood, and gradient of these three functions can be represented by formulas involving only sparse matrices and sparse vectors. We show how to use these sparse formulas to generalize back-fitting-based algorithms to efficiently compute the posterior mean, posterior variance, log-likelihood, and gradient of these three functions for additive GPs, all in $O(n \log n)$ time. We apply our algorithms to Bayesian optimization and propose efficient algorithms for posterior updates, hyperparameters learning, and computations of the acquisition function and its gradient in Bayesian optimization. Given the posterior, our algorithms significantly reduce the time complexity of computing the acquisition function and its gradient from $O(n^2)$ to $O(\log n)$ for general learning rate, and even to $O(1)$ for small learning rate.
Submitted 29 April, 2023;
originally announced May 2023.
-
Possible high-temperature magnetically topological material Mn$_{3}$Bi$_{2}$Te$_{6}$
Authors:
Wen-Feng Wu,
Han-Yu Wang,
Wei-Hua Wang,
Da-Yong Liu,
Xiang-Long Yu,
Zhi Zeng,
Liang-Jian Zou
Abstract:
The Mn-Bi-Te family displaying magnetism and non-trivial topological properties has received extensive attention. Here, we predict that the antiferromagnetic structure of Mn$_{3}$Bi$_{2}$Te$_{6}$ with three MnTe layers is energetically stable and the magnetic coupling strength of Mn-Mn is enhanced four times compared with that in the single MnTe layer of MnBi$_{2}$Te$_{4}$. The predicted Néel transition point is higher than 77 K, the liquid-nitrogen temperature. The topological properties show that with the variation of the MnTe layer from a single layer to three layers, the system transforms from a nontrivial topological phase to a trivial topological phase. Interestingly, the ferromagnetic state of Mn$_{3}$Bi$_{2}$Te$_{6}$ is a topological semimetal and it exhibits a topological transition from trivial to nontrivial induced by the magnetic transition. Our results enrich the Mn-Bi-Te family system, offer a new platform for studying topological phase transitions, and pave a new way to improve the working temperature of magnetically topological devices.
Submitted 8 April, 2023;
originally announced April 2023.
-
Efficient Execution of SPARQL Queries with OPTIONAL and UNION Expressions
Authors:
Lei Zou,
Yue Pang,
M. Tamer Özsu,
Jiaqi Chen
Abstract:
The proliferation of RDF datasets has resulted in studies focusing on optimizing SPARQL query processing. Most existing work focuses on basic graph patterns (BGPs) and ignores other vital operators in SPARQL, such as UNION and OPTIONAL. SPARQL queries with these operators, which we abbreviate as SPARQL-UO, pose serious query plan generation challenges. In this paper, we propose techniques for executing SPARQL-UO queries using BGP execution as a building block, based on a novel BGP-based Evaluation (BE)-Tree representation of query plans. On top of this, we propose a series of cost-driven BE-tree transformations to generate more efficient plans by reducing the search space and intermediate result sizes, and a candidate pruning technique that further enhances efficiency at query time. Experiments confirm that our method outperforms the state-of-the-art by orders of magnitude.
Submitted 24 March, 2023;
originally announced March 2023.
-
Disentangling centrality bias and final-state effects in the production of high-$p_T$ $π^0$ using direct $γ$ in $d$$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
N. J. Abdulameer,
U. Acharya,
C. Aidala,
Y. Akiba,
M. Alfred,
K. Aoki,
N. Apadula,
C. Ayuso,
V. Babintsev,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
L. Bichon,
B. Blankenship,
D. S. Blau,
M. Boer,
J. S. Bok,
V. Borisov,
M. L. Brooks,
J. Bryslawskyj,
V. Bumazhnov,
C. Butler
, et al. (253 additional authors not shown)
Abstract:
PHENIX presents a simultaneous measurement of the production of direct $γ$ and $π^0$ in $d$$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV over a $p_T$ range of 7.5 to 18 GeV/$c$ for different event samples selected by event activity, i.e. charged-particle multiplicity detected at forward rapidity. Direct-photon yields are used to empirically estimate the contribution of hard-scattering processes in the different event samples. Using this estimate, the average nuclear-modification factor $R_{d\rm Au,EXP}^{γ^{\rm dir}}$ is $0.925{\pm}0.023({\rm stat}){\pm}0.15^{\rm (scale)}$, consistent with unity for minimum-bias (MB) $d$$+$Au events. For event classes with moderate event activity, $R_{d\rm Au,EXP}^{γ^{\rm dir}}$ is consistent with the MB value within 5\% uncertainty. These results confirm that the previously observed enhancement of high-$p_T$ $π^0$ production found in small-system collisions with low event activity is a result of a bias in interpreting event activity within the Glauber framework. In contrast, for the top 5\% of events with the highest event activity, $R_{d\rm Au,EXP}^{γ^{\rm dir}}$ is suppressed by 20\% relative to the MB value with a significance of $4.5σ$, which may be due to final-state effects.
Submitted 22 March, 2023;
originally announced March 2023.