-
Few-Shot Anomaly Detection via Category-Agnostic Registration Learning
Authors:
Chaoqin Huang,
Haoyan Guan,
Aofan Jiang,
Yanfeng Wang,
Michael Spratling,
Xinchao Wang,
Ya Zhang
Abstract:
Most existing anomaly detection methods require a dedicated model for each category. Such a paradigm, despite its promising results, is computationally expensive and inefficient, thereby failing to meet the requirements for real-world applications. Inspired by how humans detect anomalies, by comparing a query image to known normal ones, this paper proposes a novel few-shot anomaly detection (FSAD)…
▽ More
Most existing anomaly detection methods require a dedicated model for each category. Such a paradigm, despite its promising results, is computationally expensive and inefficient, thereby failing to meet the requirements for real-world applications. Inspired by how humans detect anomalies, by comparing a query image to known normal ones, this paper proposes a novel few-shot anomaly detection (FSAD) framework. Using a training set of normal images from various categories, registration, aiming to align normal images of the same categories, is leveraged as the proxy task for self-supervised category-agnostic representation learning. At test time, an image and its corresponding support set, consisting of a few normal images from the same category, are supplied, and anomalies are identified by comparing the registered features of the test image to its corresponding support image features. Such a setup enables the model to generalize to novel test categories. It is, to our best knowledge, the first FSAD method that requires no model fine-tuning for novel categories: enabling a single model to be applied to all categories. Extensive experiments demonstrate the effectiveness of the proposed method. Particularly, it improves the current state-of-the-art for FSAD by 11.3% and 8.3% on the MVTec and MPDD benchmarks, respectively. The source code is available at https://github.com/Haoyan-Guan/CAReg.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
A Novel Diamond-like Carbon based photocathode for PICOSEC Micromegas detectors
Authors:
X. Wang,
R. Aleksan,
Y. Angelis,
J. Bortfeldt,
F. Brunbauer,
M. Brunoldi,
E. Chatzianagnostou,
J. Datta,
K. Degmelt,
G. Fanourakis,
D. Fiorina,
K. J. Floethner,
M. Gallinaro,
F. Garcia,
I. Giomataris,
K. Gnanvo,
F. J. Iguaz,
D. Janssens,
A. Kallitsopoulou,
M. Kovacic,
B. Kross,
P. Legou,
M. Lisowska,
J. Liu,
I. Maniatis
, et al. (26 additional authors not shown)
Abstract:
The PICOSEC Micromegas (MM) detector is a precise timing gaseous detector based on a MM detector operating in a two-stage amplification mode and a Cherenkov radiator. Prototypes equipped with cesium iodide (CsI) photocathodes have shown promising time resolutions as precise as 24 picoseconds (ps) for Minimum Ionizing Particles. However, due to the high hygroscopicity and susceptibility to ion bomb…
▽ More
The PICOSEC Micromegas (MM) detector is a precise timing gaseous detector based on a MM detector operating in a two-stage amplification mode and a Cherenkov radiator. Prototypes equipped with cesium iodide (CsI) photocathodes have shown promising time resolutions as precise as 24 picoseconds (ps) for Minimum Ionizing Particles. However, due to the high hygroscopicity and susceptibility to ion bombardment of the CsI photocathodes, alternative photocathode materials are needed to improve the robustness of PICOSEC MM. Diamond-like Carbon (DLC) film have been introduced as a novel robust photocathode material, which have shown promising results. A batch of DLC photocathodes with different thicknesses were produced and evaluated using ultraviolet light. The quantum efficiency measurements indicate that the optimized thickness of the DLC photocathode is approximately 3 nm. Furthermore, DLC photocathodes show good resistance to ion bombardment in aging test compared to the CsI photocathode. Finally, a PICOSEC MM prototype equipped with DLC photocathodes was tested in muon beams. A time resolution of around 42 ps with a detection efficiency of 97% for 150 GeV/c muons were obtained. These results indicate the great potential of DLC as a photocathode for the PICOSEC MM detector.
△ Less
Submitted 25 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes…
▽ More
In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes of astrophysical $γ$-ray background while large amount of dark matter. By analyzing more than 700 days observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Authors:
Yi-Fan Zhang,
Qingsong Wen,
Chaoyou Fu,
Xue Wang,
Zhang Zhang,
Liang Wang,
Rong Jin
Abstract:
Seeing clearly with high resolution is a foundation of Large Multimodal Models (LMMs), which has been proven to be vital for visual perception and reasoning. Existing works usually employ a straightforward resolution upscaling method, where the image consists of global and local branches, with the latter being the sliced image patches but resized to the same resolution as the former. This means th…
▽ More
Seeing clearly with high resolution is a foundation of Large Multimodal Models (LMMs), which has been proven to be vital for visual perception and reasoning. Existing works usually employ a straightforward resolution upscaling method, where the image consists of global and local branches, with the latter being the sliced image patches but resized to the same resolution as the former. This means that higher resolution requires more local patches, resulting in exorbitant computational expenses, and meanwhile, the dominance of local image tokens may diminish the global context. In this paper, we dive into the problems and propose a new framework as well as an elaborate optimization strategy. Specifically, we extract contextual information from the global view using a mixture of adapters, based on the observation that different adapters excel at different tasks. With regard to local patches, learnable query embeddings are introduced to reduce image tokens, the most important tokens accounting for the user question will be further selected by a similarity-based selector. Our empirical results demonstrate a `less is more' pattern, where \textit{utilizing fewer but more informative local image tokens leads to improved performance}. Besides, a significant challenge lies in the training strategy, as simultaneous end-to-end training of the global mining block and local compression block does not yield optimal results. We thus advocate for an alternating training way, ensuring balanced learning between global and local aspects. Finally, we also introduce a challenging dataset with high requirements for image detail, enhancing the training of the local compression layer. The proposed method, termed LMM with Sophisticated Tasks, Local image compression, and Mixture of global Experts (SliME), achieves leading performance across various benchmarks with only 2 million training data.
△ Less
Submitted 13 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Authors:
Xuehai He,
Weixi Feng,
Kaizhi Zheng,
Yujie Lu,
Wanrong Zhu,
Jiachen Li,
Yue Fan,
Jianfeng Wang,
Linjie Li,
Zhengyuan Yang,
Kevin Lin,
William Yang Wang,
Lijuan Wang,
Xin Eric Wang
Abstract:
Multimodal Language Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi…
▽ More
Multimodal Language Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multimodal video understanding. MMWorld distinguishes itself from previous video understanding benchmarks with two unique advantages: (1) multi-discipline, covering various disciplines that often require domain expertise for comprehensive understanding; (2) multi-faceted reasoning, including explanation, counterfactual thinking, future prediction, etc. MMWorld consists of a human-annotated dataset to evaluate MLLMs with questions about the whole videos and a synthetic dataset to analyze MLLMs within a single modality of perception. Together, MMWorld encompasses 1,910 videos across seven broad disciplines and 69 subdisciplines, complete with 6,627 question-answer pairs and associated captions. The evaluation includes 2 proprietary and 10 open-source MLLMs, which struggle on MMWorld (e.g., GPT-4V performs the best with only 52.3\% accuracy), showing large room for improvement. Further ablation studies reveal other interesting findings such as models' different skill sets from humans. We hope MMWorld can serve as an essential step towards world model evaluation in videos.
△ Less
Submitted 13 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
Authors:
Yue Li,
Xinsheng Wang,
Li Zhang,
Lei Xie
Abstract:
Speaker Change Detection (SCD) is to identify boundaries among speakers in a conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task, a further investigation of self-supervised learning (SSL) features for SCD is conducted in this work. Specifically, an SCD model, named SCDNet, is proposed. With this model, various state-of-the-art SSL models, including Hubert, wav…
▽ More
Speaker Change Detection (SCD) is to identify boundaries among speakers in a conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task, a further investigation of self-supervised learning (SSL) features for SCD is conducted in this work. Specifically, an SCD model, named SCDNet, is proposed. With this model, various state-of-the-art SSL models, including Hubert, wav2vec 2.0, and WavLm are investigated. To discern the most potent layer of SSL models for SCD, a learnable weighting method is employed to analyze the effectiveness of intermediate representations. Additionally, a fine-tuning-based approach is also implemented to further compare the characteristics of SSL models in the SCD task. Furthermore, a contrastive learning method is proposed to mitigate the overfitting tendencies in the training of both the fine-tuning-based method and SCDNet. Experiments showcase the superiority of WavLm in the SCD task and also demonstrate the good design of SCDNet.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
ProTrain: Efficient LLM Training via Memory-Aware Techniques
Authors:
Hanmei Yang,
Jin Zhou,
Yao Fu,
Xiaoqun Wang,
Ramine Roane,
Hui Guan,
Tongping Liu
Abstract:
It is extremely memory-hungry to train Large Language Models (LLM). To solve this problem, existing work exploits the combination of CPU and GPU for the training process, such as ZeRO-Offload. Such a technique largely democratizes billion-scale model training, making it possible to train with few consumer graphics cards. However, based on our observation, existing frameworks often provide coarse-g…
▽ More
It is extremely memory-hungry to train Large Language Models (LLM). To solve this problem, existing work exploits the combination of CPU and GPU for the training process, such as ZeRO-Offload. Such a technique largely democratizes billion-scale model training, making it possible to train with few consumer graphics cards. However, based on our observation, existing frameworks often provide coarse-grained memory management and require experienced experts in configuration tuning, leading to suboptimal hardware utilization and performance. This paper proposes ProTrain, a novel training system that intelligently balances memory usage and performance by coordinating memory, computation, and IO. ProTrain achieves adaptive memory management through Chunk-Based Model State Management and Block-Wise Activation Management, guided by a Memory-Aware Runtime Profiler without user intervention. ProTrain does not change the training algorithm and thus does not compromise accuracy. Experiments show that ProTrain improves training throughput by 1.43$\times$ to 2.71$\times$ compared to the SOTA training systems.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Large Language Model(LLM) assisted End-to-End Network Health Management based on Multi-Scale Semanticization
Authors:
Fengxiao Tang,
Xiaonan Wang,
Xun Yuan,
Linfeng Luo,
Ming Zhao,
Nei Kato
Abstract:
Network device and system health management is the foundation of modern network operations and maintenance. Traditional health management methods, relying on expert identification or simple rule-based algorithms, struggle to cope with the dynamic heterogeneous networks (DHNs) environment. Moreover, current state-of-the-art distributed anomaly detection methods, which utilize specific machine learn…
▽ More
Network device and system health management is the foundation of modern network operations and maintenance. Traditional health management methods, relying on expert identification or simple rule-based algorithms, struggle to cope with the dynamic heterogeneous networks (DHNs) environment. Moreover, current state-of-the-art distributed anomaly detection methods, which utilize specific machine learning techniques, lack multi-scale adaptivity for heterogeneous device information, resulting in unsatisfactory diagnostic accuracy for DHNs. In this paper, we develop an LLM-assisted end-to-end intelligent network health management framework. The framework first proposes a Multi-Scale Semanticized Anomaly Detection Model (MSADM), incorporating semantic rule trees with an attention mechanism to address the multi-scale anomaly detection problem in DHNs. Secondly, a chain-of-thought-based large language model is embedded in downstream to adaptively analyze the fault detection results and produce an analysis report with detailed fault information and optimization strategies. Experimental results show that the accuracy of our proposed MSADM for heterogeneous network entity anomaly detection is as high as 91.31\%.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
S. Afanasiev,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
H. Al-Bataineh,
J. Alexander,
M. Alfred,
K. Aoki,
N. Apadula,
L. Aphecetche,
J. Asai,
H. Asano,
E. T. Atomssa,
R. Averbeck,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
G. Baksay,
L. Baksay,
A. Baldisseri
, et al. (510 additional authors not shown)
Abstract:
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is obs…
▽ More
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Light-induced fictitious magnetic fields for quantum storage in cold atomic ensembles
Authors:
Jianmin Wang,
Liang Dong,
Xingchang Wang,
Zihan Zhou,
Ying Zuo,
Georgios A. Siviloglou,
J. F. Chen
Abstract:
In this work, we have demonstrated that optically generated fictitious magnetic fields can be utilized to extend the lifetime of quantum memories in cold atomic ensembles. All the degrees of freedom of an AC Stark shift such as polarization, spatial profile, and temporal waveform can be readily controlled in a precise manner. Temporal fluctuations over several experimental cycles, and spatial inho…
▽ More
In this work, we have demonstrated that optically generated fictitious magnetic fields can be utilized to extend the lifetime of quantum memories in cold atomic ensembles. All the degrees of freedom of an AC Stark shift such as polarization, spatial profile, and temporal waveform can be readily controlled in a precise manner. Temporal fluctuations over several experimental cycles, and spatial inhomogeneities along a cold atomic gas have been compensated by an optical beam. The advantage of the use of fictitious magnetic fields for quantum storage stems from the speed and spatial precision that these fields can be synthesized. Our simple and versatile technique can find widespread application in coherent pulse and single-photon storage in any atomic species.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (636 additional authors not shown)
Abstract:
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measur…
▽ More
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
CrowdEgress: A Multi-Agent Simulation Platform for Pedestrian Crowd
Authors:
Peng Wang,
Xiaoda Wang,
Peter Luh,
Neal Olderman,
Christian Wilkie,
Timo Korhonen,
Gregor Jäger
Abstract:
This article introduces a simulation platform to study complex crowd behavior in social context. The agent-based model is extended based on the well-known social force model, and it mainly describes how agents interact with each other, and also with surrounding facilities such as walls, doors and exits. The simulation platform is compatible to FDS+Evac, and the input data in FDS+Evac could be impo…
▽ More
This article introduces a simulation platform to study complex crowd behavior in social context. The agent-based model is extended based on the well-known social force model, and it mainly describes how agents interact with each other, and also with surrounding facilities such as walls, doors and exits. The simulation platform is compatible to FDS+Evac, and the input data in FDS+Evac could be imported into our simulation platform to create single-floor compartment geometry, and a flow solver is used to generate the roadmap towards exits. Most importantly, we plan to integrate advanced social and psychological theory into our simulation platform, especially investigating human behavior in emergency evacuation,such as pre-evacuation behavior, exit-selection activities, social group and herding effect and so forth.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
Authors:
Yi Lu,
Yuankun Xie,
Ruibo Fu,
Zhengqi Wen,
Jianhua Tao,
Zhiyong Wang,
Xin Qi,
Xuefei Liu,
Yongwei Li,
Yukun Liu,
Xiaopeng Wang,
Shuchen Shi
Abstract:
With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to…
▽ More
With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to-end generation process, skipping the final step of vocoder processing. This poses a significant challenge for current audio deepfake detection (ADD) models based on vocoder artifacts. To effectively detect LLM-based deepfake audio, we focus on the core of the generation process, the conversion from neural codec to waveform. We propose Codecfake dataset, which is generated by seven representative neural codec methods. Experiment results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models on the Codecfake test set.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Inductive Global and Local Manifold Approximation and Projection
Authors:
Jungeum Kim,
Xiao Wang
Abstract:
Nonlinear dimensional reduction with the manifold assumption, often called manifold learning, has proven its usefulness in a wide range of high-dimensional data analysis. The significant impact of t-SNE and UMAP has catalyzed intense research interest, seeking further innovations toward visualizing not only the local but also the global structure information of the data. Moreover, there have been…
▽ More
Nonlinear dimensional reduction with the manifold assumption, often called manifold learning, has proven its usefulness in a wide range of high-dimensional data analysis. The significant impact of t-SNE and UMAP has catalyzed intense research interest, seeking further innovations toward visualizing not only the local but also the global structure information of the data. Moreover, there have been consistent efforts toward generalizable dimensional reduction that handles unseen data. In this paper, we first propose GLoMAP, a novel manifold learning method for dimensional reduction and high-dimensional data visualization. GLoMAP preserves locally and globally meaningful distance estimates and displays a progression from global to local formation during the course of optimization. Furthermore, we extend GLoMAP to its inductive version, iGLoMAP, which utilizes a deep neural network to map data to its lower-dimensional representation. This allows iGLoMAP to provide lower-dimensional embeddings for unseen points without needing to re-train the algorithm. iGLoMAP is also well-suited for mini-batch learning, enabling large-scale, accelerated gradient calculations. We have successfully applied both GLoMAP and iGLoMAP to the simulated and real-data settings, with competitive experiments against the state-of-the-art methods.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking
Authors:
Xiangyang Yang,
Dan Zeng,
Xucheng Wang,
You Wu,
Hengzhou Ye,
Shuiwang Li
Abstract:
Empowered by transformer-based models, visual tracking has advanced significantly. However, the slow speed of current trackers limits their applicability on devices with constrained computational resources. To address this challenge, we introduce ABTrack, an adaptive computation framework that adaptively bypassing transformer blocks for efficient visual tracking. The rationale behind ABTrack is ro…
▽ More
Empowered by transformer-based models, visual tracking has advanced significantly. However, the slow speed of current trackers limits their applicability on devices with constrained computational resources. To address this challenge, we introduce ABTrack, an adaptive computation framework that adaptively bypassing transformer blocks for efficient visual tracking. The rationale behind ABTrack is rooted in the observation that semantic features or relations do not uniformly impact the tracking task across all abstraction levels. Instead, this impact varies based on the characteristics of the target and the scene it occupies. Consequently, disregarding insignificant semantic features or relations at certain abstraction levels may not significantly affect the tracking accuracy. We propose a Bypass Decision Module (BDM) to determine if a transformer block should be bypassed, which adaptively simplifies the architecture of ViTs and thus speeds up the inference process. To counteract the time cost incurred by the BDMs and further enhance the efficiency of ViTs, we innovatively adapt a pruning technique to reduce the dimension of the latent representation of tokens in each transformer block. Extensive experiments on multiple tracking benchmarks validate the effectiveness and generality of the proposed method and show that it achieves state-of-the-art performance. Code is released at: \href{https://github.com/1HykhqV3rU/ABTrack}
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Efficient Neural Common Neighbor for Temporal Graph Link Prediction
Authors:
Xiaohui Zhang,
Yanbo Wang,
Xiyuan Wang,
Muhan Zhang
Abstract:
Temporal graphs are ubiquitous in real-world scenarios, such as social network, trade and transportation. Predicting dynamic links between nodes in a temporal graph is of vital importance. Traditional methods usually leverage the temporal neighborhood of interaction history to generate node embeddings first and then aggregate the source and target node embeddings to predict the link. However, such…
▽ More
Temporal graphs are ubiquitous in real-world scenarios, such as social network, trade and transportation. Predicting dynamic links between nodes in a temporal graph is of vital importance. Traditional methods usually leverage the temporal neighborhood of interaction history to generate node embeddings first and then aggregate the source and target node embeddings to predict the link. However, such methods focus on learning individual node representations, but overlook the pairwise representation learning nature of link prediction and fail to capture the important pairwise features of links such as common neighbors (CN). Motivated by the success of Neural Common Neighbor (NCN) for static graph link prediction, we propose TNCN, a temporal version of NCN for link prediction in temporal graphs. TNCN dynamically updates a temporal neighbor dictionary for each node, and utilizes multi-hop common neighbors between the source and target node to learn a more effective pairwise representation. We validate our model on five large-scale real-world datasets from the Temporal Graph Benchmark (TGB), and find that it achieves new state-of-the-art performance on three of them. Additionally, TNCN demonstrates excellent scalability on large datasets, outperforming popular GNN baselines by up to 6.4 times in speed. Our code is available at https: //github.com/GraphPKU/TNCN.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Event-Triggered Optimal Tracking Control for Strict-Feedback Nonlinear Systems With Non-Affine Nonlinear Faults
Authors:
Ling Wang,
Xin Wang,
Ziming Wang
Abstract:
This article studies the control ideas of the optimal backstepping technique, proposing an event-triggered optimal tracking control scheme for a class of strict-feedback nonlinear systems with non-affine and nonlinear faults. A simplified identifier-critic-actor framework is employed in the reinforcement learning algorithm to achieve optimal control. The identifier estimates the unknown dynamic fu…
▽ More
This article studies the control ideas of the optimal backstepping technique, proposing an event-triggered optimal tracking control scheme for a class of strict-feedback nonlinear systems with non-affine and nonlinear faults. A simplified identifier-critic-actor framework is employed in the reinforcement learning algorithm to achieve optimal control. The identifier estimates the unknown dynamic functions, the critic evaluates the system performance, and the actor implements control actions, enabling modeling and control of anonymous systems for achieving optimal control performance. In this paper, a simplified reinforcement learning algorithm is designed by deriving update rules from the negative gradient of a simple positive function related to the Hamilton-Jacobi-Bellman equation, and it also releases the stringent persistent excitation condition. Then, a fault-tolerant control method is developed by applying filtered signals for controller design. Additionally, to address communication resource reduction, an event-triggered mechanism is employed for designing the actual controller. Finally, the proposed scheme's feasibility is validated through theoretical analysis and simulation.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Toward Enhanced Reinforcement Learning-Based Resource Management via Digital Twin: Opportunities, Applications, and Challenges
Authors:
Nan Cheng,
Xiucheng Wang,
Zan Li,
Zhisheng Yin,
Tom Luan,
Xuemin Shen
Abstract:
This article presents a digital twin (DT)-enhanced reinforcement learning (RL) framework aimed at optimizing performance and reliability in network resource management, since the traditional RL methods face several unified challenges when applied to physical networks, including limited exploration efficiency, slow convergence, poor long-term performance, and safety concerns during the exploration…
▽ More
This article presents a digital twin (DT)-enhanced reinforcement learning (RL) framework aimed at optimizing performance and reliability in network resource management, since the traditional RL methods face several unified challenges when applied to physical networks, including limited exploration efficiency, slow convergence, poor long-term performance, and safety concerns during the exploration phase. To deal with the above challenges, a comprehensive DT-based framework is proposed to enhance the convergence speed and performance for unified RL-based resource management. The proposed framework provides safe action exploration, more accurate estimates of long-term returns, faster training convergence, higher convergence performance, and real-time adaptation to varying network conditions. Then, two case studies on ultra-reliable and low-latency communication (URLLC) services and multiple unmanned aerial vehicles (UAV) network are presented, demonstrating improvements of the proposed framework in performance, convergence speed, and training cost reduction both on traditional RL and neural network based Deep RL (DRL). Finally, the article identifies and explores some of the research challenges and open issues in this rapidly evolving field.
△ Less
Submitted 15 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Research on material identification of mobile phones falling to the ground
Authors:
Xuesong Wang
Abstract:
The failure mode of the phone falling has a lot to do with the ground material. At present, the research on ground material and mobile phone damage is generally carried out through experiments, which is extremely costly. This paper presents a method to identify the material of mobile phones falling on the ground. The method determines the material of the mobile phone falling to the ground accordin…
▽ More
The failure mode of the phone falling has a lot to do with the ground material. At present, the research on ground material and mobile phone damage is generally carried out through experiments, which is extremely costly. This paper presents a method to identify the material of mobile phones falling on the ground. The method determines the material of the mobile phone falling to the ground according to the data of the mobile phone accelerometer and can obtain the ground material of the mobile phone falling through a large number of user data. By analyzing the physical process of mobile phone falling, the accelerometer data interval which can reflect the characteristics of falling is reasonably intercepted. And the data features that can reflect the collision are extracted. Finally, based on the fully connected neural network, the method of determining the material of mobile phones falling on the ground is developed. The experimental results show that the method has a high identification rate, and the average identification rate for different ground materials is 96.75%. Instead of relying on auxiliary devices, the identification process only relies on the phone's built-in acceleration sensor. This paper fills the gap in the field of mobile phone falling ground material identification. It has a great contribution to the collection of user drop data, the analysis of the drop process, and the improvement of mobile phone design defects.
△ Less
Submitted 12 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio
Authors:
Lin Zhang,
Xin Wang,
Erica Cooper,
Mireia Diez,
Federico Landini,
Nicholas Evans,
Junichi Yamagishi
Abstract:
This paper defines Spoof Diarization as a novel task in the Partial Spoof (PS) scenario. It aims to determine what spoofed when, which includes not only locating spoof regions but also clustering them according to different spoofing methods. As a pioneering study in spoof diarization, we focus on defining the task, establishing evaluation metrics, and proposing a benchmark model, namely the Counte…
▽ More
This paper defines Spoof Diarization as a novel task in the Partial Spoof (PS) scenario. It aims to determine what spoofed when, which includes not only locating spoof regions but also clustering them according to different spoofing methods. As a pioneering study in spoof diarization, we focus on defining the task, establishing evaluation metrics, and proposing a benchmark model, namely the Countermeasure-Condition Clustering (3C) model. Utilizing this model, we first explore how to effectively train countermeasures to support spoof diarization using three labeling schemes. We then utilize spoof localization predictions to enhance the diarization performance. This first study reveals the high complexity of the task, even in restricted scenarios where only a single speaker per audio file and an oracle number of spoofing methods are considered. Our code is available at https://github.com/nii-yamagishilab/PartialSpoof.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Probing the Shock Breakout Signal of SN 2024ggi from the Transformation of Early Flash Spectroscopy
Authors:
Jujia Zhang,
Luc Dessart,
Xiaofeng Wang,
Qian Zhai,
Yi Yang,
Liping Li,
Han Lin,
Giorgio Valerin,
Yongzhi Cai,
Zhen Guo,
Lingzhi Wang,
Zeyi Zhao,
Zhenyu Wang,
Shengyu Yan
Abstract:
We present early-time, hour-to-day cadence spectroscopy of the nearby type II supernova (SN II) 2024ggi, which was discovered at a phase when the SN shock just emerged from the red-supergiant (RSG) progenitor star. Over the first few days after the first light, SN\,2024ggi exhibited prominent narrow emission lines formed through intense and persistent photoionization of the nearby circumstellar ma…
▽ More
We present early-time, hour-to-day cadence spectroscopy of the nearby type II supernova (SN II) 2024ggi, which was discovered at a phase when the SN shock just emerged from the red-supergiant (RSG) progenitor star. Over the first few days after the first light, SN\,2024ggi exhibited prominent narrow emission lines formed through intense and persistent photoionization of the nearby circumstellar material (CSM). In the first 63 hours, spectral lines of He, C, N, and O revealed a rapid rise in ionization, as a result of the progressive sweeping-up of the CSM by the shock. The duration of the IIn-like spectra indicates a dense and relatively confined CSM distribution extending up to $\sim\,4\,\times\,10^{14}$\,cm. Spectral modeling reveals a CSM mass loss rate at this region exceeding $5 \times\, 10^{-3}$\,\Msun\,yr$^{-1}$\,is required to reproduce low-ionization emissions, which dramatically exceeds that of an RSG. Analyzing H$α$ emission shift implies the velocity of the unshocked outer CSM to be between 20 and 40 \kms, matching the typical wind velocity of an RSG. The differences between the inner and outer layers of the CSM and an RSG progenitor highlight a complex mass loss history before the explosion of SN 2024ggi.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
DMS: Addressing Information Loss with More Steps for Pragmatic Adversarial Attacks
Authors:
Zhiyu Zhu,
Jiayu Zhang,
Xinyi Wang,
Zhibo Jin,
Huaming Chen
Abstract:
Despite the exceptional performance of deep neural networks (DNNs) across different domains, they are vulnerable to adversarial samples, in particular for tasks related to computer vision. Such vulnerability is further influenced by the digital container formats used in computers, where the discrete numerical values are commonly used for storing the pixel values. This paper examines how informatio…
▽ More
Despite the exceptional performance of deep neural networks (DNNs) across different domains, they are vulnerable to adversarial samples, in particular for tasks related to computer vision. Such vulnerability is further influenced by the digital container formats used in computers, where the discrete numerical values are commonly used for storing the pixel values. This paper examines how information loss in file formats impacts the effectiveness of adversarial attacks. Notably, we observe a pronounced hindrance to the adversarial attack performance due to the information loss of the non-integer pixel values. To address this issue, we explore to leverage the gradient information of the attack samples within the model to mitigate the information loss. We introduce the Do More Steps (DMS) algorithm, which hinges on two core techniques: gradient ascent-based \textit{adversarial integerization} (DMS-AI) and integrated gradients-based \textit{attribution selection} (DMS-AS). Our goal is to alleviate such lossy process to retain the attack performance when storing these adversarial samples digitally. In particular, DMS-AI integerizes the non-integer pixel values according to the gradient direction, and DMS-AS selects the non-integer pixels by comparing attribution results. We conduct thorough experiments to assess the effectiveness of our approach, including the implementations of the DMS-AI and DMS-AS on two large-scale datasets with various latest gradient-based attack methods. Our empirical findings conclusively demonstrate the superiority of our proposed DMS-AI and DMS-AS pixel integerization methods over the standardised methods, such as rounding, truncating and upper approaches, in maintaining attack integrity.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Supporting Self-Reflection at Scale with Large Language Models: Insights from Randomized Field Experiments in Classrooms
Authors:
Harsh Kumar,
Ruiwei Xiao,
Benjamin Lawson,
Ilya Musabirov,
Jiakai Shi,
Xinyuan Wang,
Huayin Luo,
Joseph Jay Williams,
Anna Rafferty,
John Stamper,
Michael Liut
Abstract:
Self-reflection on learning experiences constitutes a fundamental cognitive process, essential for the consolidation of knowledge and the enhancement of learning efficacy. However, traditional methods to facilitate reflection often face challenges in personalization, immediacy of feedback, engagement, and scalability. Integration of Large Language Models (LLMs) into the reflection process could mi…
▽ More
Self-reflection on learning experiences constitutes a fundamental cognitive process, essential for the consolidation of knowledge and the enhancement of learning efficacy. However, traditional methods to facilitate reflection often face challenges in personalization, immediacy of feedback, engagement, and scalability. Integration of Large Language Models (LLMs) into the reflection process could mitigate these limitations. In this paper, we conducted two randomized field experiments in undergraduate computer science courses to investigate the potential of LLMs to help students engage in post-lesson reflection. In the first experiment (N=145), students completed a take-home assignment with the support of an LLM assistant; half of these students were then provided access to an LLM designed to facilitate self-reflection. The results indicated that the students assigned to LLM-guided reflection reported increased self-confidence and performed better on a subsequent exam two weeks later than their peers in the control condition. In the second experiment (N=112), we evaluated the impact of LLM-guided self-reflection against other scalable reflection methods, such as questionnaire-based activities and review of key lecture slides, after assignment. Our findings suggest that the students in the questionnaire and LLM-based reflection groups performed equally well and better than those who were only exposed to lecture slides, according to their scores on a proctored exam two weeks later on the same subject matter. These results underscore the utility of LLM-guided reflection and questionnaire-based activities in improving learning outcomes. Our work highlights that focusing solely on the accuracy of LLMs can overlook their potential to enhance metacognitive skills through practices such as self-reflection. We discuss the implications of our research for the Edtech community.
△ Less
Submitted 31 May, 2024;
originally announced June 2024.
-
Image Neural Field Diffusion Models
Authors:
Yinbo Chen,
Oliver Wang,
Richard Zhang,
Eli Shechtman,
Xiaolong Wang,
Michael Gharbi
Abstract:
Diffusion models have shown an impressive ability to model complex data distributions, with several key advantages over GANs, such as stable training, better coverage of the training distribution's modes, and the ability to solve inverse problems without extra training. However, most diffusion models learn the distribution of fixed-resolution images. We propose to learn the distribution of continu…
▽ More
Diffusion models have shown an impressive ability to model complex data distributions, with several key advantages over GANs, such as stable training, better coverage of the training distribution's modes, and the ability to solve inverse problems without extra training. However, most diffusion models learn the distribution of fixed-resolution images. We propose to learn the distribution of continuous images by training diffusion models on image neural fields, which can be rendered at any resolution, and show its advantages over fixed-resolution models. To achieve this, a key challenge is to obtain a latent space that represents photorealistic image neural fields. We propose a simple and effective method, inspired by several recent techniques but with key changes to make the image neural fields photorealistic. Our method can be used to convert existing latent diffusion autoencoders into image neural field autoencoders. We show that image neural field diffusion models can be trained using mixed-resolution image datasets, outperform fixed-resolution diffusion models followed by super-resolution models, and can solve inverse problems with conditions applied at different scales efficiently.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
VersiCode: Towards Version-controllable Code Generation
Authors:
Tongtong Wu,
Weigang Wu,
Xingyu Wang,
Kang Xu,
Suyu Ma,
Bo Jiang,
Ping Yang,
Zhenchang Xing,
Yuan-Fang Li,
Gholamreza Haffari
Abstract:
Significant research has focused on improving the performance of large language model on code-related tasks due to their practical importance. Although performance is typically evaluated using public benchmark datasets, the existing datasets do not account for the concept of \emph{version}, which is crucial in professional software development. In this paper, we introduce VersiCode, the first comp…
▽ More
Significant research has focused on improving the performance of large language model on code-related tasks due to their practical importance. Although performance is typically evaluated using public benchmark datasets, the existing datasets do not account for the concept of \emph{version}, which is crucial in professional software development. In this paper, we introduce VersiCode, the first comprehensive dataset designed to assess the ability of large language models to generate verifiable code for specific library versions. VersiCode encompasses 300 libraries across more than 2,000 versions spanning 9 years. We design two dedicated evaluation tasks: version-specific code completion (VSCC) and version-aware code editing (VACE). Comprehensive experiments are conducted to benchmark the performance of LLMs, revealing the challenging nature of these tasks and VersiCode, that even state-of-the-art LLMs struggle to generate version-correct code. This dataset, together with the proposed tasks, sheds light on LLMs' capabilities and limitations in handling version-specific code generation, and opens up an important new area of research for further investigation. The resources can be found at https://github.com/wutong8023/VersiCode.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
World Models with Hints of Large Language Models for Goal Achieving
Authors:
Zeyuan Liu,
Ziyu Huan,
Xiyao Wang,
Jiafei Lyu,
Jian Tao,
Xiu Li,
Furong Huang,
Huazhe Xu
Abstract:
Reinforcement learning struggles in the face of long-horizon tasks and sparse goals due to the difficulty in manual reward specification. While existing methods address this by adding intrinsic rewards, they may fail to provide meaningful guidance in long-horizon decision-making tasks with large state and action spaces, lacking purposeful exploration. Inspired by human cognition, we propose a new…
▽ More
Reinforcement learning struggles in the face of long-horizon tasks and sparse goals due to the difficulty in manual reward specification. While existing methods address this by adding intrinsic rewards, they may fail to provide meaningful guidance in long-horizon decision-making tasks with large state and action spaces, lacking purposeful exploration. Inspired by human cognition, we propose a new multi-modal model-based RL approach named Dreaming with Large Language Models (DLLM). DLLM integrates the proposed hinting subgoals from the LLMs into the model rollouts to encourage goal discovery and reaching in challenging tasks. By assigning higher intrinsic rewards to samples that align with the hints outlined by the language model during model rollouts, DLLM guides the agent toward meaningful and efficient exploration. Extensive experiments demonstrate that the DLLM outperforms recent methods in various challenging, sparse-reward environments such as HomeGrid, Crafter, and Minecraft by 27.7\%, 21.1\%, and 9.9\%, respectively.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
Authors:
X. Wang,
Siming Fu,
Qihan Huang,
Wanggui He,
Hao Jiang
Abstract:
Recent advancements in text-to-image generation models have dramatically enhanced the generation of photorealistic images from textual prompts, leading to an increased interest in personalized text-to-image applications, particularly in multi-subject scenarios. However, these advances are hindered by two main challenges: firstly, the need to accurately maintain the details of each referenced subje…
▽ More
Recent advancements in text-to-image generation models have dramatically enhanced the generation of photorealistic images from textual prompts, leading to an increased interest in personalized text-to-image applications, particularly in multi-subject scenarios. However, these advances are hindered by two main challenges: firstly, the need to accurately maintain the details of each referenced subject in accordance with the textual descriptions; and secondly, the difficulty in achieving a cohesive representation of multiple subjects in a single image without introducing inconsistencies. To address these concerns, our research introduces the MS-Diffusion framework for layout-guided zero-shot image personalization with multi-subjects. This innovative approach integrates grounding tokens with the feature resampler to maintain detail fidelity among subjects. With the layout guidance, MS-Diffusion further improves the cross-attention to adapt to the multi-subject inputs, ensuring that each subject condition acts on specific areas. The proposed multi-subject cross-attention orchestrates harmonious inter-subject compositions while preserving the control of texts. Comprehensive quantitative and qualitative experiments affirm that this method surpasses existing models in both image and text fidelity, promoting the development of personalized text-to-image generation.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology
Authors:
Huahui Yi,
Xiaofei Wang,
Kang Li,
Chao Li
Abstract:
Multimodal learning, integrating histology images and genomics, promises to enhance precision oncology with comprehensive views at microscopic and molecular levels. However, existing methods may not sufficiently model the shared or complementary information for more effective integration. In this study, we introduce a Unified Modeling Enhanced Multimodal Learning (UMEML) framework that employs a h…
▽ More
Multimodal learning, integrating histology images and genomics, promises to enhance precision oncology with comprehensive views at microscopic and molecular levels. However, existing methods may not sufficiently model the shared or complementary information for more effective integration. In this study, we introduce a Unified Modeling Enhanced Multimodal Learning (UMEML) framework that employs a hierarchical attention structure to effectively leverage shared and complementary features of both modalities of histology and genomics. Specifically, to mitigate unimodal bias from modality imbalance, we utilize a query-based cross-attention mechanism for prototype clustering in the pathology encoder. Our prototype assignment and modularity strategy are designed to align shared features and minimizes modality gaps. An additional registration mechanism with learnable tokens is introduced to enhance cross-modal feature integration and robustness in multimodal unified modeling. Our experiments demonstrate that our method surpasses previous state-of-the-art approaches in glioma diagnosis and prognosis tasks, underscoring its superiority in precision neuro-Oncology.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Determination method of binary fractions by the integrated spectrum
Authors:
F. Zhang,
L. Li,
Z. Han,
X. Wang
Abstract:
We need to resolve the individual stars for binary fraction determinations of stellar systems. Therefore, it is not possible to obtain the binary fractions for dense or distant stellar systems. % We proposed a method to determine the binary fraction of a dense or distant stellar system. The method is to first determine the binary fraction variation for any two adjacent regions and then add up thos…
▽ More
We need to resolve the individual stars for binary fraction determinations of stellar systems. Therefore, it is not possible to obtain the binary fractions for dense or distant stellar systems. % We proposed a method to determine the binary fraction of a dense or distant stellar system. The method is to first determine the binary fraction variation for any two adjacent regions and then add up those binary fraction variations along the radial direction to obtain the binary fraction for a stellar system. Binary fraction variation is derived by using ten binary fraction-sensitive spectral absorption feature indices (SAFIs) and the binary fraction variation calibrations in terms of these SAFIs. Using this method, we first presented the binary fraction variations for twenty-one Galactic globular clusters (GCs). By comparisons, we find that they agree well with the binary fractions based on the main-sequence fiducial line method by previous studies. This verifies that the above mentioned method is feasible. Next, we presented the binary fraction variations of thirteen Galactic GCs. We gave the relationships between binary fraction and various parameters, and found that binary fraction is negatively correlated with NHB and NRR, binary fraction of some studies is not strongly correlated with NBS, and the number of GCs with large binary fraction is greater at extreme blue horizontal branch population ratio. At last, if we want to obtain more accurate binary fraction, we suggest that the spectroscopic and photometric observations are conducted at an appropriate area interval for a stellar system.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Authors:
Zigeng Chen,
Xinyin Ma,
Gongfan Fang,
Zhenxiong Tan,
Xinchao Wang
Abstract:
Diffusion models have garnered significant interest from the community for their great generative ability across various applications. However, their typical multi-step sequential-denoising nature gives rise to high cumulative latency, thereby precluding the possibilities of parallel computation. To address this, we introduce AsyncDiff, a universal and plug-and-play acceleration scheme that enable…
▽ More
Diffusion models have garnered significant interest from the community for their great generative ability across various applications. However, their typical multi-step sequential-denoising nature gives rise to high cumulative latency, thereby precluding the possibilities of parallel computation. To address this, we introduce AsyncDiff, a universal and plug-and-play acceleration scheme that enables model parallelism across multiple devices. Our approach divides the cumbersome noise prediction model into multiple components, assigning each to a different device. To break the dependency chain between these components, it transforms the conventional sequential denoising into an asynchronous process by exploiting the high similarity between hidden states in consecutive diffusion steps. Consequently, each component is facilitated to compute in parallel on separate devices. The proposed strategy significantly reduces inference latency while minimally impacting the generative quality. Specifically, for the Stable Diffusion v2.1, AsyncDiff achieves a 2.7x speedup with negligible degradation and a 4.0x speedup with only a slight reduction of 0.38 in CLIP Score, on four NVIDIA A5000 GPUs. Our experiments also demonstrate that AsyncDiff can be readily applied to video diffusion models with encouraging performances. The code is available at https://github.com/czg1225/AsyncDiff.
△ Less
Submitted 27 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Revolutionizing Wireless Networks with Self-Supervised Learning: A Pathway to Intelligent Communications
Authors:
Zhixiang Yang,
Hongyang Du,
Dusit Niyato,
Xudong Wang,
Yu Zhou,
Lei Feng,
Fanqin Zhou,
Wenjing Li,
Xuesong Qiu
Abstract:
With the rapid proliferation of mobile devices and data, next-generation wireless communication systems face stringent requirements for ultra-low latency, ultra-high reliability, and massive connectivity. Traditional AI-driven wireless network designs, while promising, often suffer from limitations such as dependency on labeled data and poor generalization. To address these challenges, we present…
▽ More
With the rapid proliferation of mobile devices and data, next-generation wireless communication systems face stringent requirements for ultra-low latency, ultra-high reliability, and massive connectivity. Traditional AI-driven wireless network designs, while promising, often suffer from limitations such as dependency on labeled data and poor generalization. To address these challenges, we present an integration of self-supervised learning (SSL) into wireless networks. SSL leverages large volumes of unlabeled data to train models, enhancing scalability, adaptability, and generalization. This paper offers a comprehensive overview of SSL, categorizing its application scenarios in wireless network optimization and presenting a case study on its impact on semantic communication. Our findings highlight the potentials of SSL to significantly improve wireless network performance without extensive labeled data, paving the way for more intelligent and efficient communication systems.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Validating LLM-Generated Programs with Metamorphic Prompt Testing
Authors:
Xiaoyin Wang,
Dakai Zhu
Abstract:
The latest paradigm shift in software development brings in the innovation and automation afforded by Large Language Models (LLMs), showcased by Generative Pre-trained Transformer (GPT), which has shown remarkable capacity to generate code autonomously, significantly reducing the manual effort required for various programming tasks. Although, the potential benefits of LLM-generated code are vast,…
▽ More
The latest paradigm shift in software development brings in the innovation and automation afforded by Large Language Models (LLMs), showcased by Generative Pre-trained Transformer (GPT), which has shown remarkable capacity to generate code autonomously, significantly reducing the manual effort required for various programming tasks. Although, the potential benefits of LLM-generated code are vast, most notably in efficiency and rapid prototyping, as LLMs become increasingly integrated into the software development lifecycle and hence the supply chain, complex and multifaceted challenges arise as the code generated from these language models carry profound questions on quality and correctness. Research is required to comprehensively explore these critical concerns surrounding LLM-generated code.
In this paper, we propose a novel solution called metamorphic prompt testing to address these challenges. Our intuitive observation is that intrinsic consistency always exists among correct code pieces but may not exist among flawed code pieces, so we can detect flaws in the code by detecting inconsistencies. Therefore, we can vary a given prompt to multiple prompts with paraphrasing, and to ask the LLM to acquire multiple versions of generated code, so that we can validate whether the semantic relations still hold in the acquired code through cross-validation. Our evaluation on HumanEval shows that metamorphic prompt testing is able to detect 75 percent of the erroneous programs generated by GPT-4, with a false positive rate of 8.6 percent.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Prototypical Reward Network for Data-Efficient RLHF
Authors:
Jinghan Zhang,
Xiting Wang,
Yiqiao Jin,
Changyu Chen,
Xinhao Zhang,
Kunpeng Liu
Abstract:
The reward model for Reinforcement Learning from Human Feedback (RLHF) has proven effective in fine-tuning Large Language Models (LLMs). Notably, collecting human feedback for RLHF can be resource-intensive and lead to scalability issues for LLMs and complex tasks. Our proposed framework Proto-RM leverages prototypical networks to enhance reward models under limited human feedback. By enabling sta…
▽ More
The reward model for Reinforcement Learning from Human Feedback (RLHF) has proven effective in fine-tuning Large Language Models (LLMs). Notably, collecting human feedback for RLHF can be resource-intensive and lead to scalability issues for LLMs and complex tasks. Our proposed framework Proto-RM leverages prototypical networks to enhance reward models under limited human feedback. By enabling stable and reliable structural learning from fewer samples, Proto-RM significantly enhances LLMs' adaptability and accuracy in interpreting human preferences. Extensive experiments on various datasets demonstrate that Proto-RM significantly improves the performance of reward models and LLMs in human feedback tasks, achieving comparable and usually better results than traditional methods, while requiring significantly less data. in data-limited scenarios. This research offers a promising direction for enhancing the efficiency of reward models and optimizing the fine-tuning of language models under restricted feedback conditions.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
RAG-based Crowdsourcing Task Decomposition via Masked Contrastive Learning with Prompts
Authors:
Jing Yang,
Xiao Wang,
Yu Zhao,
Yuhang Liu,
Fei-Yue Wang
Abstract:
Crowdsourcing is a critical technology in social manufacturing, which leverages an extensive and boundless reservoir of human resources to handle a wide array of complex tasks. The successful execution of these complex tasks relies on task decomposition (TD) and allocation, with the former being a prerequisite for the latter. Recently, pre-trained language models (PLMs)-based methods have garnered…
▽ More
Crowdsourcing is a critical technology in social manufacturing, which leverages an extensive and boundless reservoir of human resources to handle a wide array of complex tasks. The successful execution of these complex tasks relies on task decomposition (TD) and allocation, with the former being a prerequisite for the latter. Recently, pre-trained language models (PLMs)-based methods have garnered significant attention. However, they are constrained to handling straightforward common-sense tasks due to their inherent restrictions involving limited and difficult-to-update knowledge as well as the presence of hallucinations. To address these issues, we propose a retrieval-augmented generation-based crowdsourcing framework that reimagines TD as event detection from the perspective of natural language understanding. However, the existing detection methods fail to distinguish differences between event types and always depend on heuristic rules and external semantic analyzing tools. Therefore, we present a Prompt-Based Contrastive learning framework for TD (PBCT), which incorporates a prompt-based trigger detector to overcome dependence. Additionally, trigger-attentive sentinel and masked contrastive learning are introduced to provide varying attention to trigger and contextual features according to different event types. Experiment results demonstrate the competitiveness of our method in both supervised and zero-shot detection. A case study on printed circuit board manufacturing is showcased to validate its adaptability to unknown professional domains.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
Authors:
Tianwen Wei,
Bo Zhu,
Liang Zhao,
Cheng Cheng,
Biye Li,
Weiwei Lü,
Peng Cheng,
Jianhao Zhang,
Xiaoyu Zhang,
Liang Zeng,
Xiaokun Wang,
Yutuan Ma,
Rui Hu,
Shuicheng Yan,
Han Fang,
Yahui Zhou
Abstract:
In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initi…
▽ More
In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initializations. Our findings suggest that the choice between these two approaches should consider both the performance of the existing dense checkpoints and the MoE training budget. We highlight two innovative techniques: gating logit normalization, which improves expert diversification, and adaptive auxiliary loss coefficients, allowing for layer-specific adjustment of auxiliary loss coefficients. Our experimental results validate the effectiveness of these methods. Leveraging these techniques and insights, we trained our upcycled Skywork-MoE on a condensed subset of our SkyPile corpus. The evaluation results demonstrate that our model delivers strong performance across a wide range of benchmarks.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Biomarker-Guided Adaptive Enrichment Design with Threshold Detection for Clinical Trials with Time-to-Event Outcome
Authors:
Kaiyuan Hua,
Hwanhee Hong,
Xiaofei Wang
Abstract:
Biomarker-guided designs are increasingly used to evaluate personalized treatments based on patients' biomarker status in Phase II and III clinical trials. With adaptive enrichment, these designs can improve the efficiency of evaluating the treatment effect in biomarker-positive patients by increasing their proportion in the randomized trial. While time-to-event outcomes are often used as the prim…
▽ More
Biomarker-guided designs are increasingly used to evaluate personalized treatments based on patients' biomarker status in Phase II and III clinical trials. With adaptive enrichment, these designs can improve the efficiency of evaluating the treatment effect in biomarker-positive patients by increasing their proportion in the randomized trial. While time-to-event outcomes are often used as the primary endpoint to measure treatment effects for a new therapy in severe diseases like cancer and cardiovascular diseases, there is limited research on biomarker-guided adaptive enrichment trials in this context. Such trials almost always adopt hazard ratio methods for statistical measurement of treatment effects. In contrast, restricted mean survival time (RMST) has gained popularity for analyzing time-to-event outcomes because it offers more straightforward interpretations of treatment effects and does not require the proportional hazard assumption. This paper proposes a two-stage biomarker-guided adaptive RMST design with threshold detection and patient enrichment. We develop sophisticated methods for identifying the optimal biomarker threshold, treatment effect estimators in the biomarker-positive subgroup, and approaches for type I error rate, power analysis, and sample size calculation. We present a numerical example of re-designing an oncology trial. An extensive simulation study is conducted to evaluate the performance of the proposed design.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
MVGamba: Unify 3D Content Generation as State Space Sequence Modeling
Authors:
Xuanyu Yi,
Zike Wu,
Qiuhong Shen,
Qingshan Xu,
Pan Zhou,
Joo-Hwee Lim,
Shuicheng Yan,
Xinchao Wang,
Hanwang Zhang
Abstract:
Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-vi…
▽ More
Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for cross-view self-refinement while generating a long sequence of Gaussians for fine-detail modeling with linear complexity. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only $0.1\times$ of the model size.
△ Less
Submitted 20 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Federated learning in food research
Authors:
Zuzanna Fendor,
Bas H. M. van der Velden,
Xinxin Wang,
Andrea Jr. Carnoli,
Osman Mutlu,
Ali Hürriyetoğlu
Abstract:
Research in the food domain is at times limited due to data sharing obstacles, such as data ownership, privacy requirements, and regulations. While important, these obstacles can restrict data-driven methods such as machine learning. Federated learning, the approach of training models on locally kept data and only sharing the learned parameters, is a potential technique to alleviate data sharing o…
▽ More
Research in the food domain is at times limited due to data sharing obstacles, such as data ownership, privacy requirements, and regulations. While important, these obstacles can restrict data-driven methods such as machine learning. Federated learning, the approach of training models on locally kept data and only sharing the learned parameters, is a potential technique to alleviate data sharing obstacles. This systematic review investigates the use of federated learning within the food domain, structures included papers in a federated learning framework, highlights knowledge gaps, and discusses potential applications. A total of 41 papers were included in the review. The current applications include solutions to water and milk quality assessment, cybersecurity of water processing, pesticide residue risk analysis, weed detection, and fraud detection, focusing on centralized horizontal federated learning. One of the gaps found was the lack of vertical or transfer federated learning and decentralized architectures.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
A Survey on Incomplete Multi-label Learning: Recent Advances and Future Trends
Authors:
Xiang Li,
Jiexi Liu,
Xinrui Wang,
Songcan Chen
Abstract:
In reality, data often exhibit associations with multiple labels, making multi-label learning (MLL) become a prominent research topic. The last two decades have witnessed the success of MLL, which is indispensable from complete and accurate supervised information. However, obtaining such information in practice is always laborious and sometimes even impossible. To circumvent this dilemma, incomple…
▽ More
In reality, data often exhibit associations with multiple labels, making multi-label learning (MLL) become a prominent research topic. The last two decades have witnessed the success of MLL, which is indispensable from complete and accurate supervised information. However, obtaining such information in practice is always laborious and sometimes even impossible. To circumvent this dilemma, incomplete multi-label learning (InMLL) has emerged, aiming to learn from incomplete labeled data. To date, enormous InMLL works have been proposed to narrow the performance gap with complete MLL, whereas a systematic review for InMLL is still absent. In this paper, we not only attempt to fill the lacuna but also strive to pave the way for innovative research. Specifically, we retrospect the origin of InMLL, analyze the challenges of InMLL, and make a taxonomy of InMLL from the data-oriented and algorithm-oriented perspectives, respectively. Besides, we also present real applications of InMLL in various domains. More importantly, we highlight several potential future trends, including four open problems that are more in line with practice and three under-explored/unexplored techniques in addressing the challenges of InMLL, which may shed new light on developing novel research directions in the field of InMLL.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The wea…
▽ More
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial to understand the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Recurrent Context Compression: Efficiently Expanding the Context Window of LLM
Authors:
Chensen Huang,
Guibo Zhu,
Xuepeng Wang,
Yifei Luo,
Guojing Ge,
Haoran Chen,
Dong Yi,
Jinqiao Wang
Abstract:
To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a method called Recurrent Context Compression (RCC), designed to efficiently expand the context window length of LLMs within constrained storage space. We also invest…
▽ More
To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a method called Recurrent Context Compression (RCC), designed to efficiently expand the context window length of LLMs within constrained storage space. We also investigate the issue of poor model responses when both instructions and context are compressed in downstream tasks, and propose an instruction reconstruction method to mitigate this problem. We validated the effectiveness of our approach on multiple tasks, achieving a compression rate of up to 32x on text reconstruction tasks with a BLEU4 score close to 0.95, and nearly 100\% accuracy on a passkey retrieval task with a sequence length of 1M. Finally, our method demonstrated competitive performance in long-text question-answering tasks compared to non-compressed methods, while significantly saving storage resources in long-text inference tasks. Our code, models, and demo are available at https://github.com/WUHU-G/RCC_Transformer
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Efficient algorithm for the oscillatory matrix functions
Authors:
Dongping Li,
Xue Wang,
Xiuying Zhang
Abstract:
This paper introduces an efficient algorithm for computing the general oscillatory matrix functions. These computations are crucial for solving second-order semi-linear initial value problems. The method is exploited using the scaling and restoring technique based on a quadruple angle formula in conjunction with a truncated Taylor series. The choice of the scaling parameter and the degree of the T…
▽ More
This paper introduces an efficient algorithm for computing the general oscillatory matrix functions. These computations are crucial for solving second-order semi-linear initial value problems. The method is exploited using the scaling and restoring technique based on a quadruple angle formula in conjunction with a truncated Taylor series. The choice of the scaling parameter and the degree of the Taylor polynomial relies on a forward error analysis. Numerical experiments show that the new algorithm behaves in a stable fashion and performs well in both accuracy and efficiency.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Authors:
Peng Xia,
Ze Chen,
Juanxi Tian,
Yangrui Gong,
Ruibo Hou,
Yue Xu,
Zhenbang Wu,
Zhiyuan Fan,
Yiyang Zhou,
Kangyu Zhu,
Wenhao Zheng,
Zhaoyang Wang,
Xiao Wang,
Xuchao Zhang,
Chetan Bansal,
Marc Niethammer,
Junzhou Huang,
Hongtu Zhu,
Yun Li,
Jimeng Sun,
Zongyuan Ge,
Gang Li,
James Zou,
Huaxiu Yao
Abstract:
Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehen…
▽ More
Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehensively evaluate the Trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code in https://github.com/richard-peng-xia/CARES.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning
Authors:
Xin Wang,
Zhiyun Song,
Yitao Zhu,
Sheng Wang,
Lichi Zhang,
Dinggang Shen,
Qian Wang
Abstract:
In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for both image visualization and subsequent analysis tasks, which often require isotropic voxel spacing. To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated.…
▽ More
In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for both image visualization and subsequent analysis tasks, which often require isotropic voxel spacing. To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated. However, most current solutions require a substantial number of paired high-resolution and low-resolution images for supervised training, which are typically unavailable in real-world scenarios. In this work, we propose a self-supervised super-resolution framework for inter-slice super-resolution of MR images. Our framework is first featured by pre-training on video dataset, as temporal correlation of videos is found beneficial for modeling the spatial relation among MR slices. Then, we use public high-quality MR dataset to fine-tune our pre-trained model, for enhancing awareness of our model to medical data. Finally, given a target dataset at hand, we utilize self-supervised fine-tuning to further ensure our model works well with user-specific super-resolution tasks. The proposed method demonstrates superior performance compared to other self-supervised methods and also holds the potential to benefit various downstream applications.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Expressive Power of Graph Neural Networks for (Mixed-Integer) Quadratic Programs
Authors:
Ziang Chen,
Xiaohan Chen,
Jialin Liu,
Xinshang Wang,
Wotao Yin
Abstract:
Quadratic programming (QP) is the most widely applied category of problems in nonlinear programming. Many applications require real-time/fast solutions, though not necessarily with high precision. Existing methods either involve matrix decomposition or use the preconditioned conjugate gradient method. For relatively large instances, these methods cannot achieve the real-time requirement unless the…
▽ More
Quadratic programming (QP) is the most widely applied category of problems in nonlinear programming. Many applications require real-time/fast solutions, though not necessarily with high precision. Existing methods either involve matrix decomposition or use the preconditioned conjugate gradient method. For relatively large instances, these methods cannot achieve the real-time requirement unless there is an effective precondition.
Recently, graph neural networks (GNNs) opened new possibilities for QP. Some promising empirical studies of applying GNNs for QP tasks show that GNNs can capture key characteristics of an optimization instance and provide adaptive guidance accordingly to crucial configurations during the solving process, or directly provide an approximate solution. Despite notable empirical observations, theoretical foundations are still lacking. In this work, we investigate the expressive or representative power of GNNs, a crucial aspect of neural network theory, specifically in the context of QP tasks, with both continuous and mixed-integer settings. We prove the existence of message-passing GNNs that can reliably represent key properties of quadratic programs, including feasibility, optimal objective value, and optimal solution. Our theory is validated by numerical results.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
Authors:
Hao Li,
Chenghao Yang,
An Zhang,
Yang Deng,
Xiang Wang,
Tat-Seng Chua
Abstract:
Open-domain dialogue systems have seen remarkable advancements with the development of large language models (LLMs). Nonetheless, most existing dialogue systems predominantly focus on brief single-session interactions, neglecting the real-world demands for long-term companionship and personalized interactions with chatbots. Crucial to addressing this real-world need are event summary and persona m…
▽ More
Open-domain dialogue systems have seen remarkable advancements with the development of large language models (LLMs). Nonetheless, most existing dialogue systems predominantly focus on brief single-session interactions, neglecting the real-world demands for long-term companionship and personalized interactions with chatbots. Crucial to addressing this real-world need are event summary and persona management, which enable reasoning for appropriate long-term dialogue responses. Recent progress in the human-like cognitive and reasoning capabilities of LLMs suggests that LLM-based agents could significantly enhance automated perception, decision-making, and problem-solving. In response to this potential, we introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent), which incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation. For the event memory module, long and short-term memory banks are employed to separately focus on historical and ongoing sessions, while a topic-based retrieval mechanism is introduced to enhance the accuracy of memory retrieval. Furthermore, the persona module conducts dynamic persona modeling for both users and agents. The integration of retrieved memories and extracted personas is subsequently fed into the generator to induce appropriate responses. The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated across various illustrative benchmarks, models, and tasks. The code is released at https://github.com/leolee99/LD-Agent.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
TTM-RE: Memory-Augmented Document-Level Relation Extraction
Authors:
Chufan Gao,
Xuan Wang,
Jimeng Sun
Abstract:
Document-level relation extraction aims to categorize the association between any two entities within a document. We find that previous methods for document-level relation extraction are ineffective in exploiting the full potential of large amounts of training data with varied noise levels. For example, in the ReDocRED benchmark dataset, state-of-the-art methods trained on the large-scale, lower-q…
▽ More
Document-level relation extraction aims to categorize the association between any two entities within a document. We find that previous methods for document-level relation extraction are ineffective in exploiting the full potential of large amounts of training data with varied noise levels. For example, in the ReDocRED benchmark dataset, state-of-the-art methods trained on the large-scale, lower-quality, distantly supervised training data generally do not perform better than those trained solely on the smaller, high-quality, human-annotated training data. To unlock the full potential of large-scale noisy training data for document-level relation extraction, we propose TTM-RE, a novel approach that integrates a trainable memory module, known as the Token Turing Machine, with a noisy-robust loss function that accounts for the positive-unlabeled setting. Extensive experiments on ReDocRED, a benchmark dataset for document-level relation extraction, reveal that TTM-RE achieves state-of-the-art performance (with an absolute F1 score improvement of over 3%). Ablation studies further illustrate the superiority of TTM-RE in other domains (the ChemDisGene dataset in the biomedical domain) and under highly unlabeled settings.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Async Learned User Embeddings for Ads Delivery Optimization
Authors:
Mingwei Tang,
Meng Liu,
Hong Li,
Junjie Yang,
Chenglin Wei,
Boyang Li,
Dai Li,
Rengan Xu,
Yifan Xu,
Zehua Zhang,
Xiangyu Wang,
Linfeng Liu,
Yuelei Xie,
Chengye Liu,
Labib Fawaz,
Li Li,
Hongnan Wang,
Bill Zhu,
Sri Reddy
Abstract:
In recommendation systems, high-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance. The effectiveness of recommendation systems depends on the quality of user embedding. We propose to asynchronously learn high fidelity user embeddings for billions of users each day from sequence based mul…
▽ More
In recommendation systems, high-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance. The effectiveness of recommendation systems depends on the quality of user embedding. We propose to asynchronously learn high fidelity user embeddings for billions of users each day from sequence based multimodal user activities through a Transformer-like large scale feature learning module. The async learned user representations embeddings (ALURE) are further converted to user similarity graphs through graph learning and then combined with user realtime activities to retrieval highly related ads candidates for the ads delivery system. Our method shows significant gains in both offline and online experiments.
△ Less
Submitted 23 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Hybrid terahertz emitter for pulse shaping and chirality control
Authors:
Weipeng Wu,
Wilder Acuna,
Zhixiang Huang,
Xi Wang,
Lars Gundlach,
Matthew F. Doty,
Joshua M. O. Zide,
M. Benjamin Jungfleisch
Abstract:
Terahertz (THz) radiation, spanning from 0.3 to 3x10^12 Hz, fills the crucial gap between the microwave and infrared spectral range. THz technology has found applications in various fields, from imaging and sensing to telecommunication and biosensing. However, the full potential of these applications is often hindered by the need for precise control and manipulation of the frequency and polarizati…
▽ More
Terahertz (THz) radiation, spanning from 0.3 to 3x10^12 Hz, fills the crucial gap between the microwave and infrared spectral range. THz technology has found applications in various fields, from imaging and sensing to telecommunication and biosensing. However, the full potential of these applications is often hindered by the need for precise control and manipulation of the frequency and polarization state, which typically requires external THz modulators. Here, we demonstrate a hybrid THz source that overcomes this limitation. Our device consists of two THz emitters integrated into one single device, enabling pulse shaping and chirality control of the emitted radiation without additional external components. The two sources are a spintronic emitter and a semiconductor photoconductive antenna (PCA). Using a combination of dual-wavelength excitation, allowing for control of the relative time delay between the two laser excitation pulses, and tuning external parameters for each emitter (i.e., biasing voltage for the PCA and magnetic field for the spintronic THz emitter) enables precise control of the mixing of the two signals and results in frequency, polarization, and chirality control of the overall THz radiation. This on-chip hybrid emitter provides an essential platform for engineered THz radiation with wide-ranging potential applications.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$,…
▽ More
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.