-
Adam: Dense Retrieval Distillation with Adaptive Dark Examples
Authors:
Chongyang Tao,
Chang Liu,
Tao Shen,
Can Xu,
Xiubo Geng,
Binxing Jiao,
Daxin Jiang
Abstract:
To improve the performance of the dual-encoder retriever, one effective approach is knowledge distillation from the cross-encoder ranker. Existing works construct the candidate passages following the supervised learning setting where a query is paired with a positive passage and a batch of negatives. However, through empirical observation, we find that even the hard negatives from advanced methods are still too trivial for the teacher to distinguish, preventing the teacher from transferring abundant dark knowledge to the student through its soft label. To alleviate this issue, we propose ADAM, a knowledge distillation framework that can better transfer the dark knowledge held in the teacher with Adaptive Dark exAMples. Different from previous works that only rely on one positive and hard negatives as candidate passages, we create dark examples that all have moderate relevance to the query through mixing-up and masking in discrete space. Furthermore, as the quality of knowledge held in different training instances varies as measured by the teacher's confidence score, we propose a self-paced distillation strategy that adaptively concentrates on a subset of high-quality instances to conduct our dark-example-based knowledge distillation to help the student learn better. We conduct experiments on two widely-used benchmarks and verify the effectiveness of our method.
Submitted 6 June, 2024; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Power Consumption Modeling of 5G Multi-Carrier Base Stations: A Machine Learning Approach
Authors:
Nicola Piovesan,
David Lopez-Perez,
Antonio De Domenico,
Xinli Geng,
Harvey Bao
Abstract:
The fifth generation of the Radio Access Network (RAN) has brought new services, technologies, and paradigms with the corresponding societal benefits. However, the energy consumption of 5G networks is today a concern. In recent years, the design of new methods for decreasing the RAN power consumption has attracted interest from both the research community and standardization bodies, and many energy-saving solutions have been proposed. However, there is still a need to understand the power consumption behavior of state-of-the-art base station architectures, such as multi-carrier active antenna units (AAUs), as well as the impact of different network parameters. In this paper, we present a power consumption model for 5G AAUs based on artificial neural networks. We demonstrate that this model achieves good estimation performance, and it is able to capture the benefits of energy saving when dealing with the complexity of multi-carrier base station architectures. Importantly, multiple experiments are carried out to show the advantage of designing a general model able to capture the power consumption behaviors of different types of AAUs. Finally, we provide an analysis of the model scalability and the training data requirements.
Submitted 8 December, 2022;
originally announced December 2022.
-
CoP: Factual Inconsistency Detection by Controlling the Preference
Authors:
Shuaijie She,
Xiang Geng,
Shujian Huang,
Jiajun Chen
Abstract:
Abstractive summarization is the process of generating a summary given a document as input. Although significant progress has been made, the factual inconsistency between the document and the generated summary still limits its practical applications. Previous work found that the probabilities assigned by the generation model reflect its preferences for the generated summary, including the preference for factual consistency, and the preference for the language or knowledge prior as well. To separate the preference for factual consistency, we propose an unsupervised framework named CoP by controlling the preference of the generation model with the help of a prompt. More specifically, the framework performs an extra inference step in which a text prompt is introduced as an additional input. In this way, another preference is described by the generation probability of this extra inference process. The difference between the above two preferences, i.e. the difference between the probabilities, could be used as a measurement for detecting factual inconsistencies. Interestingly, we found that with a properly designed prompt, our framework could evaluate specific preferences and serve as a measurement for fine-grained categories of inconsistency, such as entity-related inconsistency, coreference-related inconsistency, etc. Moreover, our framework could also be extended to the supervised setting to learn better prompts from the labeled data. Experiments show that our framework achieves new SOTA results on three factual inconsistency detection tasks.
Submitted 30 March, 2023; v1 submitted 3 December, 2022;
originally announced December 2022.
-
Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes
Authors:
Aviral Kumar,
Rishabh Agarwal,
Xinyang Geng,
George Tucker,
Sergey Levine
Abstract:
The potential of offline reinforcement learning (RL) is that high-capacity models trained on large, heterogeneous datasets can lead to agents that generalize broadly, analogously to similar advances in vision and NLP. However, recent works argue that offline RL methods encounter unique challenges to scaling up model capacity. Drawing on the learnings from these works, we re-examine previous design choices and find that with appropriate choices (ResNets, cross-entropy based distributional backups, and feature normalization), offline Q-learning algorithms exhibit strong performance that scales with model capacity. Using multi-task Atari as a testbed for scaling and generalization, we train a single policy on 40 games with near-human performance using networks with up to 80 million parameters, finding that model performance scales favorably with capacity. In contrast to prior work, we extrapolate beyond dataset performance even when trained entirely on a large (400M transitions) but highly suboptimal dataset (51% human-level performance). Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal. Finally, we show that offline Q-learning with a diverse dataset is sufficient to learn powerful representations that facilitate rapid transfer to novel games and fast online learning on new variations of a training game, improving over existing state-of-the-art representation learning approaches.
Submitted 17 April, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
Search for boosted keV-MeV light dark matter particles from evaporating primordial black holes at the CDEX-10 experiment
Authors:
Z. H. Zhang,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
H. T. Jia,
X. Jiang,
S. Karmakar
, et al. (59 additional authors not shown)
Abstract:
We present novel constraints on boosted light dark matter particles (denoted as ``$χ$'') from evaporating primordial black holes (PBHs) using 205.4 kg$\cdot$day data from the China Jinping Underground Laboratory's CDEX-10 p-type point contact germanium detector with a 160 eVee analysis threshold. $χ$ from PBHs with masses ranging from 1$\times$10$^{15}$ g to 7$\times$10$^{16}$ g are searched in this work. In the presence of PBH abundance compatible with present bounds, our result excludes the $χ$-nucleon elastic-scattering cross section region from 3.4$\times$10$^{-32}$ cm$^{2}$ to 2.3$\times$10$^{-29}$ cm$^{2}$ for $χ$ of 1 keV to 24 MeV from PBHs with masses of 5$\times$10$^{15}$ g, as well as from 1.1$\times$10$^{-28}$ cm$^{2}$ to 7.6$\times$10$^{-28}$ cm$^{2}$ for $χ$ of 1 keV to 0.6 MeV from PBHs with masses of 7$\times$10$^{16}$ g. If the $χ$-nucleon elastic-scattering cross section can be determined in the future, the abundance of PBHs may be severely constrained by $χ$ evaporation. With the lower threshold (160 eVee) of the CDEX-10 experiment compared to the previously used experiments, this work allows for a better reach at soft spectra produced by heavier PBHs, which demonstrates the vast potential of such a technical route to pursue $χ$ from larger PBHs with a low threshold.
Submitted 7 September, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
Tensor Robust PCA with Nonconvex and Nonlocal Regularization
Authors:
Xiaoyu Geng,
Qiang Guo,
Shuaixiong Hui,
Ming Yang,
Caiming Zhang
Abstract:
Tensor robust principal component analysis (TRPCA) is a classical way for low-rank tensor recovery, which minimizes the convex surrogate of tensor rank by shrinking each tensor singular value equally. However, for real-world visual data, large singular values represent more significant information than small singular values. In this paper, we propose a nonconvex TRPCA (N-TRPCA) model based on the tensor adjustable logarithmic norm. Unlike TRPCA, our N-TRPCA can adaptively shrink small singular values more and shrink large singular values less. In addition, TRPCA assumes that the whole data tensor is of low rank. This assumption is hardly satisfied in practice for natural visual data, restricting the capability of TRPCA to recover the edges and texture details from noisy images and videos. To this end, we integrate nonlocal self-similarity into N-TRPCA, and further develop a nonconvex and nonlocal TRPCA (NN-TRPCA) model. Specifically, similar nonlocal patches are grouped as a tensor and then each group tensor is recovered by our N-TRPCA. Since the patches in one group are highly correlated, all group tensors have strong low-rank property, leading to an improvement of recovery performance. Experimental results demonstrate that the proposed NN-TRPCA outperforms existing TRPCA methods in visual data recovery. The demo code is available at https://github.com/qguo2010/NN-TRPCA.
Submitted 7 July, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models
Authors:
Hao Liu,
Xinyang Geng,
Lisa Lee,
Igor Mordatch,
Sergey Levine,
Sharan Narang,
Pieter Abbeel
Abstract:
Large language models (LLMs) trained using the next-token-prediction objective, such as GPT3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide range of tasks. In this work, we propose a simple technique that significantly boosts the performance of LLMs without adding computational cost. Our key observation is that, by performing the next token prediction task with randomly selected past tokens masked out, we can improve the quality of the learned representations for downstream language understanding tasks. We hypothesize that randomly masking past tokens prevents over-attending to recent tokens and encourages attention to tokens in the distant past. We find that our method, Forgetful Causal Masking (FCM), significantly improves both few-shot and finetuning performance of PaLM. We further consider a simple extension, T-FCM, which introduces bidirectional context to causal language models without altering the sequence order, and further improves finetuning performance.
Submitted 31 January, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Identifiability and Asymptotics in Learning Homogeneous Linear ODE Systems from Discrete Observations
Authors:
Yuanyuan Wang,
Wei Huang,
Mingming Gong,
Xi Geng,
Tongliang Liu,
Kun Zhang,
Dacheng Tao
Abstract:
Ordinary Differential Equations (ODEs) have recently gained a lot of attention in machine learning. However, the theoretical aspects, e.g., identifiability and asymptotic properties of statistical estimation, are still obscure. This paper derives a sufficient condition for the identifiability of homogeneous linear ODE systems from a sequence of equally-spaced error-free observations sampled from a single trajectory. When observations are disturbed by measurement noise, we prove that under mild conditions, the parameter estimator based on the Nonlinear Least Squares (NLS) method is consistent and asymptotically normal with $n^{-1/2}$ convergence rate. Based on the asymptotic normality property, we construct confidence sets for the unknown system parameters and propose a new method to infer the causal structure of the ODE system, i.e., inferring whether there is a causal link between system variables. Furthermore, we extend the results to degraded observations, including aggregated and time-scaled ones. To the best of our knowledge, our work is the first systematic study of the identifiability and asymptotic properties in learning linear ODE systems. We also construct simulations with various system dimensions to illustrate the established theoretical results.
Submitted 2 June, 2024; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Search for exotic interactions of solar neutrinos in the CDEX-10 experiment
Authors:
X. P. Geng,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
H. Gong,
Q. J. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
H. T. Jia,
X. Jiang,
S. Karmakar,
H. B. Li
, et al. (60 additional authors not shown)
Abstract:
We investigate exotic neutrino interactions using the 205.4 kg$\cdot$day dataset from the CDEX-10 experiment at the China Jinping Underground Laboratory. New constraints on the mass and couplings of new gauge bosons are presented. Two nonstandard neutrino interactions are considered: a $U(1)_{B-L}$ gauge-boson-induced interaction between an active neutrino and electron/nucleus, and a dark-photon-induced interaction between a sterile neutrino and electron/nucleus via kinetic mixing with a photon. This work probes an unexplored parameter space involving sterile neutrino coupling with a dark photon. New laboratory limits are derived on dark photon masses below $1~{\rm eV}/c^{2}$ at some benchmark values of $Δm_{41}^{2}$ and $g^{\prime2}{\rm{sin}}^{2}2θ_{14}$.
Submitted 2 June, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
-
$^{197}$Au($γ,\,xn;\,x\,=\,1\thicksim9$) Reaction Cross Section Measurements using Laser-Driven Ultra-Intense $γ$-Ray Source
Authors:
D. Wu,
H. Y. Lan,
J. Y. Zhang,
J. X. Liu,
H. G. Lu,
J. F. Lv,
X. Z. Wu,
H. Zhang,
J. Cai,
Q. Y. Ma,
Y. H. Xia,
Z. N. Wang,
M. Z. Wang,
Z. Y. Yang,
X. L. Xu,
Y. X. Geng,
Y. Y. Zhao,
C. Lin,
W. J. Ma,
J. Q. Yu,
H. R. Wang,
F. L. Liu,
C. Y. He,
B. Guo,
P. Zhu
, et al. (4 additional authors not shown)
Abstract:
We present a new method for the measurements of photonuclear reaction flux-weighted average cross sections and isomeric ratios using a laser-driven bremsstrahlung $γ$-ray source. An ultra-bright ultra-fast 60$\,\thicksim\,$250 MeV bremsstrahlung $γ$-ray source was established using the 200 TW laser facility in the Compact Laser Plasma Accelerator Laboratory, Peking University, which could cover the energy range from knocking out neutrons to producing pions. Stable quasi-monoenergetic electron beams were generated via laser wakefield acceleration with a charge of 300$\,\thicksim\,$600 pC per shot. The averaged $γ$-ray intensities ($\geqslant$8 MeV) were higher than 10$^{8}$ per shot and the instantaneous intensities can reach above 10$^{19}$ s$^{-1}$ with a duration time about 6.7 ps. $^{65}$Cu($γ,\,n$)$^{64}$Cu and $^{27}$Al($γ,\,x$)$^{24}$Na reactions were used as $γ$-ray flux monitors in the experiments. The flux-weighted average cross sections and isomeric ratios of $^{197}$Au($γ,\,xn;\,x\,=\,1\thicksim9$) reactions were analyzed through activation measurements. The results showed good agreement with previous works and proved this method to be accurate. The $^{197}$Au($γ,\,xn;\,x\,=\,7\thicksim\,9$) reaction cross sections were first achieved with the highest threshold energy of 71.410 MeV. Theoretical cross sections of TALYS 1.9 were calculated to compare with experiment results. This method offered a unique way of gaining insight into photonuclear reaction research, especially for short-lived isomers which extremely lack experimental data.
Submitted 23 November, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Machine Learning and Analytical Power Consumption Models for 5G Base Stations
Authors:
Nicola Piovesan,
David Lopez-Perez,
Antonio De Domenico,
Xinli Geng,
Harvey Bao,
Merouane Debbah
Abstract:
The energy consumption of the fifth generation (5G) of mobile networks is one of the major concerns of the telecom industry. However, there is currently no accurate and tractable approach to evaluate the power consumption of 5G base stations (BSs). In this article, we propose a novel model for a realistic characterisation of the power consumption of 5G multi-carrier BSs, which builds on a large data collection campaign. First, we define a machine learning architecture that allows modelling multiple 5G BS products. Then, we exploit the knowledge gathered by this framework to derive a realistic and analytically tractable power consumption model, which can help drive both theoretical analyses and feature standardisation, development and optimisation frameworks. Notably, we demonstrate that such a model has high precision and is capable of capturing the benefits of energy saving mechanisms. We believe this analytical model represents a fundamental tool for understanding 5G BS power consumption and accurately optimising the network energy efficiency.
Submitted 23 September, 2022;
originally announced September 2022.
-
Exotic Dark Matter Search with CDEX-10 Experiment at China's Jinping Underground Laboratory
Authors:
W. H. Dai,
L. P. Jia,
H. Ma,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
H. T. Jia,
X. Jiang,
S. Karmakar
, et al. (59 additional authors not shown)
Abstract:
A search for exotic dark matter (DM) in the sub-GeV mass range has been conducted using 205 kg$\cdot$day data taken from a p-type point contact germanium detector of the CDEX-10 experiment at the China Jinping Underground Laboratory. New low-mass dark matter search channels, neutral current fermionic DM absorption ($χ+A\rightarrow ν+A$) and DM-nucleus 3$\rightarrow$2 scattering ($χ+χ+A\rightarrow φ+A$), have been analyzed with an energy threshold of 160 eVee. No significant signal was found. Thus new limits on the DM-nucleon interaction cross section are set for both models in the sub-GeV DM mass region. A cross section limit for the fermionic DM absorption is set to be $\rm 2.5\times 10^{-46} cm^2$ (90\% C.L.) at a DM mass of 10 MeV/c$^2$. For the DM-nucleus 3$\rightarrow$2 scattering scenario, limits are extended to DM masses of 5 MeV/c$^2$ and 14 MeV/c$^2$ for the massless dark photon and bound DM final state, respectively.
Submitted 23 November, 2022; v1 submitted 2 September, 2022;
originally announced September 2022.
-
LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval
Authors:
Tao Shen,
Xiubo Geng,
Chongyang Tao,
Can Xu,
Xiaolong Huang,
Binxing Jiao,
Linjun Yang,
Daxin Jiang
Abstract:
In large-scale retrieval, the lexicon-weighting paradigm, learning weighted sparse representations in vocabulary space, has shown promising results with high quality and low latency. Despite it deeply exploiting the lexicon-representing capability of pre-trained language models, a crucial gap remains between language modeling and lexicon-weighting retrieval -- the former preferring certain or low-entropy words whereas the latter favoring pivot or high-entropy words -- becoming the main barrier to lexicon-weighting performance for large-scale retrieval. To bridge this gap, we propose a brand-new pre-training framework, lexicon-bottlenecked masked autoencoder (LexMAE), to learn importance-aware lexicon representations. Essentially, we present a lexicon-bottlenecked module between a normal language modeling encoder and a weakened decoder, where a continuous bag-of-words bottleneck is constructed to learn a lexicon-importance distribution in an unsupervised fashion. The pre-trained LexMAE is readily transferred to the lexicon-weighting retrieval via fine-tuning. On the ad-hoc retrieval benchmark, MS-Marco, it achieves 42.6% MRR@10 with 45.8 QPS for the passage dataset and 44.4% MRR@100 with 134.8 QPS for the document dataset, on a CPU machine. LexMAE also shows state-of-the-art zero-shot transfer capability on the BEIR benchmark with 12 datasets.
Submitted 4 June, 2023; v1 submitted 31 August, 2022;
originally announced August 2022.
-
LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval
Authors:
Kai Zhang,
Chongyang Tao,
Tao Shen,
Can Xu,
Xiubo Geng,
Binxing Jiao,
Daxin Jiang
Abstract:
Retrieval models based on dense representations in semantic space have become an indispensable branch for first-stage retrieval. These retrievers benefit from surging advances in representation learning towards compressive global sequence-level embeddings. However, they are prone to overlook local salient phrases and entity mentions in texts, which usually play pivotal roles in first-stage retrieval. To mitigate this weakness, we propose to make a dense retriever align with a well-performing lexicon-aware representation model. The alignment is achieved by weakened knowledge distillations to enlighten the retriever via two aspects -- 1) a lexicon-augmented contrastive objective to challenge the dense encoder and 2) a pair-wise rank-consistent regularization to make the dense model's behavior incline toward the other. We evaluate our model on three public benchmarks, which shows that with a comparable lexicon-aware retriever as the teacher, our proposed dense one can bring consistent and significant improvements, and even outdo its teacher. In addition, we found our improvement on the dense retriever is complementary to the standard ranker distillation, which can further lift state-of-the-art performance.
Submitted 2 March, 2023; v1 submitted 29 August, 2022;
originally announced August 2022.
-
MultiMatch: Multi-task Learning for Semi-supervised Domain Generalization
Authors:
Lei Qi,
Hongpeng Yang,
Yinghuan Shi,
Xin Geng
Abstract:
Domain generalization (DG) aims at learning a model on source domains that generalizes well to the unseen target domain. Although it has achieved great success, most existing methods require the label information for all training samples in source domains, which is time-consuming and expensive in real-world applications. In this paper, we resort to solving the semi-supervised domain generalization (SSDG) task, where only a small amount of label information is available in each source domain. To address the task, we first analyze the theory of multi-domain learning, which highlights that 1) mitigating the impact of domain gap and 2) exploiting all samples to train the model can effectively reduce the generalization error in each source domain so as to improve the quality of pseudo-labels. According to the analysis, we propose MultiMatch, i.e., extending FixMatch to the multi-task learning framework, producing high-quality pseudo-labels for SSDG. To be specific, we consider each training domain as a single task (i.e., local task) and combine all training domains together (i.e., global task) to train an extra task for the unseen test domain. In the multi-task framework, we utilize the independent BN and classifier for each task, which can effectively alleviate the interference from different domains during pseudo-labeling. Also, most parameters in the framework are shared, which can be trained by all training samples sufficiently. Moreover, to further boost the pseudo-label accuracy and the model's generalization, we fuse the predictions from the global task and local task during training and testing, respectively. A series of experiments validate the effectiveness of the proposed method, and it outperforms the existing semi-supervised methods and the SSDG method on several benchmark DG datasets.
Submitted 29 April, 2024; v1 submitted 11 August, 2022;
originally announced August 2022.
-
Language-Guided Face Animation by Recurrent StyleGAN-based Generator
Authors:
Tiankai Hang,
Huan Yang,
Bei Liu,
Jianlong Fu,
Xin Geng,
Baining Guo
Abstract:
Recent works on language-guided image manipulation have shown great power of language in providing rich semantics, especially for face images. However, the other natural information, motions, in language is less explored. In this paper, we leverage the motion information and study a novel task, language-guided face animation, that aims to animate a static face image with the help of languages. To better utilize both semantics and motions from languages, we propose a simple yet effective framework. Specifically, we propose a recurrent motion generator to extract a series of semantic and motion information from the language and feed it along with visual information to a pre-trained StyleGAN to generate high-quality frames. To optimize the proposed framework, three carefully designed loss functions are proposed including a regularization loss to keep the face identity, a path length regularization loss to ensure motion smoothness, and a contrastive loss to enable video synthesis with various language guidance in one single model. Extensive experiments with both qualitative and quantitative evaluations on diverse domains (\textit{e.g.,} human face, anime face, and dog face) demonstrate the superiority of our model in generating high-quality and realistic videos from one still image with the guidance of language. Code will be available at https://github.com/TiankaiHang/language-guided-animation.git.
△ Less
Submitted 3 July, 2024; v1 submitted 10 August, 2022;
originally announced August 2022.
-
On the Lack of Gaussian Tail for Rough Line Integrals along Fractional Brownian Paths
Authors:
Horatio Boedihardjo,
Xi Geng
Abstract:
We show that the tail probability of the rough line integral $\int_{0}^{1}φ(X_{t})dY_{t}$, where $(X,Y)$ is a 2D fractional Brownian motion with Hurst parameter $H\in(1/4,1/2)$ and $φ$ is a $C_{b}^{\infty}$-function satisfying a mild non-degeneracy condition on its derivative, cannot decay faster than a $γ$-Weibull tail with any exponent $γ>2H+1$. In particular, this produces a simple class of examples of differential equations driven by fBM, whose solutions fail to have Gaussian tail even though the underlying vector fields are assumed to be of class $C_{b}^{\infty}$. This also demonstrates that the well-known upper tail estimate proved by Cass-Litterer-Lyons in 2013 is essentially sharp.
Submitted 3 November, 2022; v1 submitted 24 June, 2022;
originally announced June 2022.
-
KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Low-Resource NLP
Authors:
Yufei Wang,
Jiayi Zheng,
Can Xu,
Xiubo Geng,
Tao Shen,
Chongyang Tao,
Daxin Jiang
Abstract:
This paper focuses on data augmentation for low-resource NLP tasks where the training set is limited. The existing solutions either leverage task-independent heuristic rules (e.g., Synonym Replacement) or fine-tune general-purpose pre-trained language models (e.g., GPT2) using the limited training instances to produce new synthetic data. Consequently, they have trivial task-specific knowledge and are limited to yielding low-quality synthetic data. To combat this issue, we propose Knowledge Mixture Data Augmentation Model (KnowDA), which is a Seq2Seq language model pre-trained on a mixture of diverse NLP tasks under a novel framework of Knowledge Mixture Training (KoMT). The goal of KoMT is to condense diverse NLP task-specific knowledge into the single KnowDA model (i.e., all-in-one) such that KnowDA could utilize this knowledge to quickly grasp the inherent synthesis law of the target task through limited training instances. Specifically, KoMT reformulates input examples from various heterogeneous NLP tasks into a unified text-to-text format, and employs denoising training objectives in different granularity to learn to reconstruct partial or complete samples. To the best of our knowledge, this is the first attempt to apply 100+ NLP multi-task training for data augmentation. Extensive experiments show that i) the synthetic data produced by KnowDA successfully improves the performance of strong pre-trained language models (i.e., BERT, ALBERT and DeBERTa) by a large margin on the low-resource NLP benchmarks FewGLUE, CoNLL'03 and WikiAnn; ii) KnowDA successfully transfers the task knowledge to NLP tasks whose types are seen and unseen in KoMT.
Submitted 27 January, 2023; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Towards Robust Ranker for Text Retrieval
Authors:
Yucheng Zhou,
Tao Shen,
Xiubo Geng,
Chongyang Tao,
Can Xu,
Guodong Long,
Binxing Jiao,
Daxin Jiang
Abstract:
A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind -- learning from moderate negatives and/or serving as an auxiliary module for a retriever. In this work, we first identify two major barriers to a robust ranker, i.e., inherent label noises caused by a well-trained retriever and non-ideal negatives sampled for a highly capable ranker. Thereby, we propose using multiple retrievers as negative generators to improve the ranker's robustness, where i) involving extensive out-of-distribution label noises renders the ranker robust against each noise distribution, and ii) diverse hard negatives from a joint distribution are relatively close to the ranker's negative distribution, leading to more challenging and thus more effective training. To evaluate our robust ranker (dubbed R$^2$anker), we conduct experiments in various settings on the popular passage retrieval benchmark, including BM25-reranking, full-ranking, retriever distillation, etc. The empirical results verify the new state-of-the-art effectiveness of our model.
Submitted 16 June, 2022;
originally announced June 2022.
-
Constraints on Sub-GeV Dark Matter--Electron Scattering from the CDEX-10 Experiment
Authors:
Z. Y. Zhang,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
M. Agartioglu,
H. P. An,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
H. T. Jia,
X. Jiang,
H. B. Li, et al. (60 additional authors not shown)
Abstract:
We present improved germanium-based constraints on sub-GeV dark matter via dark matter--electron ($χ$-$e$) scattering using the 205.4 kg$\cdot$day dataset from the CDEX-10 experiment. Using a novel calculation technique, we attain predicted $χ$-$e$ scattering spectra observable in high-purity germanium detectors. In the heavy mediator scenario, our results achieve 3 orders of magnitude of improvement for $m_χ$ larger than 80 MeV/c$^2$ compared to previous germanium-based $χ$-$e$ results. We also present the most stringent $χ$-$e$ cross-section limit to date among experiments using solid-state detectors for $m_χ$ larger than 90 MeV/c$^2$ with heavy mediators and $m_χ$ larger than 100 MeV/c$^2$ with electric dipole coupling. The result proves the feasibility and demonstrates the vast potential of a new $χ$-$e$ detection method with high-purity germanium detectors in ultralow radioactive background.
Submitted 21 November, 2022; v1 submitted 8 June, 2022;
originally announced June 2022.
-
An inverse random source problem for the Helium production-diffusion equation driven by a fractional Brownian motion
Authors:
Jing Li,
Hao Cheng,
Xiaoxiao Geng
Abstract:
In this paper, we consider the prediction of the helium concentrations as a function of a spatially variable source term perturbed by fractional Brownian motion. For the direct problem, we show that it is well-posed and has a unique mild solution under some conditions. For the inverse problem, the uniqueness and the instability are given. Meanwhile, we determine the statistical properties of the source from the expectation and covariance of the final-time data u(r,T). Finally, numerical implementations are given to verify the effectiveness of the proposed reconstruction.
Submitted 6 June, 2022;
originally announced June 2022.
-
Progressive Purification for Instance-Dependent Partial Label Learning
Authors:
Ning Xu,
Biao Liu,
Jiaqi Lv,
Congyu Qiao,
Xin Geng
Abstract:
Partial label learning (PLL) aims to train multiclass classifiers from the examples each annotated with a set of candidate labels where a fixed but unknown candidate label is correct. In the last few years, the instance-independent generation process of candidate labels has been extensively studied, on the basis of which many theoretical advances have been made in PLL. Nevertheless, the candidate labels are always instance-dependent in practice and there is no theoretical guarantee that the model trained on the instance-dependent PLL examples can converge to an ideal one. In this paper, a theoretically grounded and practically effective approach named POP, i.e. PrOgressive Purification for instance-dependent partial label learning, is proposed. Specifically, POP updates the learning model and purifies each candidate label set progressively in every epoch. Theoretically, we prove that POP enlarges the region appropriately fast where the model is reliable, and eventually approximates the Bayes optimal classifier with mild assumptions. Technically, POP is flexible with arbitrary PLL losses and could improve the performance of the previous PLL losses in the instance-dependent case. Experiments on the benchmark datasets and the real-world datasets validate the effectiveness of the proposed method.
Submitted 9 May, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
One Positive Label is Sufficient: Single-Positive Multi-Label Learning with Label Enhancement
Authors:
Ning Xu,
Congyu Qiao,
Jiaqi Lv,
Xin Geng,
Min-Ling Zhang
Abstract:
Multi-label learning (MLL) learns from the examples each associated with multiple labels simultaneously, where the high cost of annotating all relevant labels for each training example is challenging for real-world applications. To cope with the challenge, we investigate single-positive multi-label learning (SPMLL) where each example is annotated with only one relevant label, and show that one can successfully learn a theoretically grounded multi-label classifier for the problem. In this paper, a novel SPMLL method named SMILE, i.e., Single-positive MultI-label learning with Label Enhancement, is proposed. Specifically, an unbiased risk estimator is derived, which could be guaranteed to approximately converge to the optimal risk minimizer of fully supervised learning and shows that one positive label of each instance is sufficient to train the predictive model. Then, the corresponding empirical risk estimator is established via recovering the latent soft label as a label enhancement process, where the posterior density of the latent soft labels is approximate to the variational Beta density parameterized by an inference model. Experiments on benchmark datasets validate the effectiveness of the proposed method.
Submitted 11 October, 2022; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Multimodal Masked Autoencoders Learn Transferable Representations
Authors:
Xinyang Geng,
Hao Liu,
Lisa Lee,
Dale Schuurmans,
Sergey Levine,
Pieter Abbeel
Abstract:
Building scalable models to learn from diverse, multimodal data remains an open challenge. For vision-language data, the dominant approaches are based on contrastive learning objectives that train a separate encoder for each modality. While effective, contrastive learning approaches introduce sampling bias depending on the data augmentations used, which can degrade performance on downstream tasks. Moreover, these methods are limited to paired image-text data, and cannot leverage widely-available unpaired data. In this paper, we investigate whether a large multimodal model trained purely via masked token prediction, without using modality-specific encoders or contrastive learning, can learn transferable representations for downstream tasks. We propose a simple and scalable network architecture, the Multimodal Masked Autoencoder (M3AE), which learns a unified encoder for both vision and language data via masked token prediction. We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks. Surprisingly, we find that M3AE benefits from a higher text mask ratio (50-90%), in contrast to BERT whose standard masking ratio is 15%, due to the joint training of two data modalities. We also provide qualitative analysis showing that the learned representation incorporates meaningful information from both image and language. Lastly, we demonstrate the scalability of M3AE with larger model size and training time, and its flexibility to train on both paired image-text data as well as unpaired data.
Submitted 21 October, 2022; v1 submitted 27 May, 2022;
originally announced May 2022.
-
UnifieR: A Unified Retriever for Large-Scale Retrieval
Authors:
Tao Shen,
Xiubo Geng,
Chongyang Tao,
Can Xu,
Guodong Long,
Kai Zhang,
Daxin Jiang
Abstract:
Large-scale retrieval aims to recall relevant documents from a huge collection given a query. It relies on representation learning to embed documents and queries into a common semantic encoding space. According to the encoding space, recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms. These two paradigms unveil the PLMs' representation capability in different granularities, i.e., global sequence-level compression and local word-level contexts, respectively. Inspired by their complementary global-local contextualization and distinct representing views, we propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability. Experiments on passage retrieval benchmarks verify its effectiveness in both paradigms. A uni-retrieval scheme is further presented with even better retrieval quality. We lastly evaluate the model on the BEIR benchmark to verify its transferability.
Submitted 4 June, 2023; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Search for Neutrinoless Double-Beta Decay of $^{76}$Ge with a Natural Broad Energy Germanium Detector
Authors:
CDEX collaboration,
W. H. Dai,
H. Ma,
Q. Yue,
Z. She,
K. J. Kang,
Y. J. Li,
M. Agartioglu,
H. P. An,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
H. T. Jia,
X. Jiang, et al. (61 additional authors not shown)
Abstract:
A natural broad energy germanium (BEGe) detector is operated in the China Jinping Underground Laboratory (CJPL) for a feasibility study of building the next generation experiment of the neutrinoless double-beta (0$νββ$) decay of $^{76}$Ge. The setup of the prototype facility, characteristics of the BEGe detector, background reduction methods, and data analysis are described in this paper. A background index of 6.4$\times$10$^{-3}$ counts/(keV$\cdot$kg$\cdot$day) is achieved, which is 1.86 times lower than our previous result of the CDEX-1 detector. No signal is observed with an exposure of 186.4 kg$\cdot$day, thus a limit on the half life of $^{76}$Ge 0$νββ$ decay is set at T$_{1/2}^{0ν}$ $>$ 5.62$\times$10$^{22}$ yr at 90% C.L. The limit corresponds to an effective Majorana neutrino mass in the range of 4.6 $\sim$ 10.3 eV, dependent on the nuclear matrix elements.
Submitted 5 August, 2022; v1 submitted 21 May, 2022;
originally announced May 2022.
-
Unifying the Convergences in Multilingual Neural Machine Translation
Authors:
Yichong Huang,
Xiaocheng Feng,
Xinwei Geng,
Bing Qin
Abstract:
Although all-in-one-model multilingual neural machine translation (multilingual NMT) has achieved remarkable progress, the convergence inconsistency in the joint training is ignored, i.e., different language pairs reach convergence in different epochs. This leads to the trained MNMT model over-fitting low-resource language translations while under-fitting high-resource ones. In this paper, we propose a novel training strategy named LSSD (Language-Specific Self-Distillation), which can alleviate the convergence inconsistency and help MNMT models achieve the best performance on each language pair simultaneously. Specifically, LSSD picks up language-specific best checkpoints for each language pair to teach the current model on the fly. Furthermore, we systematically explore three sample-level manipulations of knowledge transferring. Experimental results on three datasets show that LSSD obtains consistent improvements across all language pairs and achieves state-of-the-art performance.
Submitted 19 October, 2022; v1 submitted 3 May, 2022;
originally announced May 2022.
-
Label Distribution Learning for Generalizable Multi-source Person Re-identification
Authors:
Lei Qi,
Jiaying Shen,
Jiaqi Liu,
Yinghuan Shi,
Xin Geng
Abstract:
Person re-identification (Re-ID) is a critical technique in the video surveillance system, which has achieved significant success in the supervised setting. However, it is difficult to directly apply the supervised model to arbitrary unseen domains due to the domain gap between the available source domains and unseen target domains. In this paper, we propose a novel label distribution learning (LDL) method to address the generalizable multi-source person Re-ID task (i.e., there are multiple available source domains, and the testing domain is unseen during training), which aims to explore the relation of different classes and mitigate the domain-shift across different domains so as to improve the discrimination of the model and learn the domain-invariant feature simultaneously. Specifically, during the training process, we produce the label distribution in an online manner to mine the relation information of different classes, which is beneficial for extracting the discriminative feature. Besides, for the label distribution of each class, we further revise it to give more and equal attention to the other domains that the class does not belong to, which can effectively reduce the domain gap across different domains and obtain the domain-invariant feature. Furthermore, we also give a theoretical analysis to demonstrate that the proposed method can effectively deal with the domain-shift issue. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed method and show that the proposed method can outperform the state-of-the-art methods. Besides, further analysis also reveals the superiority of the proposed method.
Submitted 24 August, 2022; v1 submitted 12 April, 2022;
originally announced April 2022.
-
Stylized Knowledge-Grounded Dialogue Generation via Disentangled Template Rewriting
Authors:
Qingfeng Sun,
Can Xu,
Huang Hu,
Yujing Wang,
Jian Miao,
Xiubo Geng,
Yining Chen,
Fei Xu,
Daxin Jiang
Abstract:
Current Knowledge-Grounded Dialogue Generation (KDG) models specialize in producing rational and factual responses. However, to establish long-term relationships with users, the KDG model needs the capability to generate responses in a desired style or attribute. Thus, we study a new problem: Stylized Knowledge-Grounded Dialogue Generation (SKDG). It presents two challenges: (1) How to train an SKDG model where no <context, knowledge, stylized response> triples are available. (2) How to cohere with context and preserve the knowledge when generating a stylized response. In this paper, we propose a novel disentangled template rewriting (DTR) method which generates responses via combining disentangled style templates (from monolingual stylized corpus) and content templates (from KDG corpus). The entire framework is end-to-end differentiable and learned without supervision. Extensive experiments on two benchmarks indicate that DTR achieves a significant improvement on all evaluation metrics compared with previous state-of-the-art stylized dialogue generation methods. Besides, DTR achieves comparable performance with the state-of-the-art KDG methods in the standard KDG evaluation setting.
Submitted 12 April, 2022;
originally announced April 2022.
-
Self-Supervised Graph Neural Network for Multi-Source Domain Adaptation
Authors:
Jin Yuan,
Feng Hou,
Yangzhou Du,
Zhongchao Shi,
Xin Geng,
Jianping Fan,
Yong Rui
Abstract:
Domain adaptation (DA) tries to tackle the scenarios when the test data does not fully follow the same distribution as the training data, and multi-source domain adaptation (MSDA) is very attractive for real-world applications. By learning from large-scale unlabeled samples, self-supervised learning has now become a new trend in deep learning. It is worth noting that both self-supervised learning and multi-source domain adaptation share a similar goal: they both aim to leverage unlabeled data to learn more expressive representations. Unfortunately, traditional multi-task self-supervised learning faces two challenges: (1) the pretext task may not strongly relate to the downstream task, thus it could be difficult to learn useful knowledge being shared from the pretext task to the target task; (2) when the same feature extractor is shared between the pretext task and the downstream one and only different prediction heads are used, it is ineffective to enable inter-task information exchange and knowledge sharing. To address these issues, we propose a novel Self-Supervised Graph Neural Network (SSG), where a graph neural network is used as the bridge to enable more effective inter-task information exchange and knowledge sharing. More expressive representation is learned by adopting a mask token strategy to mask some domain information. Our extensive experiments have demonstrated that our proposed SSG method has achieved state-of-the-art results over four multi-source domain adaptation datasets, which have shown the effectiveness of our proposed SSG method from different aspects.
Submitted 15 January, 2024; v1 submitted 7 April, 2022;
originally announced April 2022.
-
Decompositional Generation Process for Instance-Dependent Partial Label Learning
Authors:
Congyu Qiao,
Ning Xu,
Xin Geng
Abstract:
Partial label learning (PLL) is a typical weakly supervised learning problem, where each training example is associated with a set of candidate labels among which only one is true. Most existing PLL approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels and model the generation process of the candidate labels in a simple way. However, these approaches usually do not perform as well as expected due to the fact that the generation process of the candidate labels is always instance-dependent. Therefore, it deserves to be modeled in a refined way. In this paper, we consider instance-dependent PLL and assume that the generation process of the candidate labels could decompose into two sequential parts, where the correct label emerges first in the mind of the annotator but then the incorrect labels related to the feature are also selected with the correct label as candidate labels due to uncertainty of labeling. Motivated by this consideration, we propose a novel PLL method that performs Maximum A Posteriori (MAP) estimation based on an explicitly modeled generation process of candidate labels via decomposed probability distribution models. Extensive experiments on manually corrupted benchmark datasets and real-world datasets validate the effectiveness of the proposed method. Source code is available at https://github.com/palm-ml/idgp.
Submitted 1 February, 2023; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Achieving Social Optimum in Non-convex Cooperative Aggregative Games: A Distributed Stochastic Annealing Approach
Authors:
Yinghui Wang,
Xiaoxue Geng,
Guanpu Chen,
Wenxiao Zhao
Abstract:
This paper designs a distributed stochastic annealing algorithm for non-convex cooperative aggregative games, whose agents' cost functions not only depend on agents' own decision variables but also rely on the sum of agents' decision variables. To seek the social optimum of cooperative aggregative games, a distributed stochastic annealing algorithm is proposed, where the local cost functions are non-convex and the communication topology between agents is time varying. The weak convergence to the social optimum of the algorithm is further analyzed. A numerical example is given to illustrate the effectiveness of the proposed algorithm.
Submitted 1 April, 2022;
originally announced April 2022.
-
CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
Authors:
Xiuchao Sui,
Shaohua Li,
Xue Geng,
Yan Wu,
Xinxing Xu,
Yong Liu,
Rick Goh,
Hongyuan Zhu
Abstract:
Optical flow estimation aims to find the 2D motion field by identifying corresponding pixels between two images. Despite the tremendous progress of deep learning-based optical flow methods, it remains a challenge to accurately estimate large displacements with motion blur. This is mainly because the correlation volume, the basis of pixel matching, is computed as the dot product of the convolutional features of the two images. The locality of convolutional features makes the computed correlations susceptible to various noises. On large displacements with motion blur, noisy correlations could cause severe errors in the estimated flow. To overcome this challenge, we propose a new architecture "CRoss-Attentional Flow Transformer" (CRAFT), aiming to revitalize the correlation volume computation. In CRAFT, a Semantic Smoothing Transformer layer transforms the features of one frame, making them more global and semantically stable. In addition, the dot-product correlations are replaced with transformer Cross-Frame Attention. This layer filters out feature noises through the Query and Key projections, and computes more accurate correlations. On Sintel (Final) and KITTI (foreground) benchmarks, CRAFT has achieved new state-of-the-art performance. Moreover, to test the robustness of different models on large motions, we designed an image shifting attack that shifts input images to generate large artificial motions. Under this attack, CRAFT performs much more robustly than two representative methods, RAFT and GMA. The code of CRAFT is available at https://github.com/askerlee/craft.
Submitted 31 March, 2022;
originally announced March 2022.
-
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
Authors:
Li Chen,
Chonghao Sima,
Yang Li,
Zehan Zheng,
Jiajie Xu,
Xiangwei Geng,
Hongyang Li,
Conghui He,
Jianping Shi,
Yu Qiao,
Junchi Yan
Abstract:
Methods for 3D lane detection have been recently proposed to address the issue of inaccurate lane layouts in many autonomous driving scenarios (uphill/downhill, bump, etc.). Previous work struggled in complex cases due to their simple designs of the spatial transformation between front view and bird's eye view (BEV) and the lack of a realistic dataset. Towards these issues, we present PersFormer: an end-to-end monocular 3D lane detector with a novel Transformer-based spatial feature transformation module. Our model generates BEV features by attending to related front-view local regions with camera parameters as a reference. PersFormer adopts a unified 2D/3D anchor design and an auxiliary task to detect 2D/3D lanes simultaneously, enhancing the feature consistency and sharing the benefits of multi-task learning. Moreover, we release one of the first large-scale real-world 3D lane datasets: OpenLane, with high-quality annotation and scenario diversity. OpenLane contains 200,000 frames, over 880,000 instance-level lanes, 14 lane categories, along with scene tags and the closed-in-path object annotations to encourage the development of lane detection and more industrial-related autonomous driving methods. We show that PersFormer significantly outperforms competitive baselines in the 3D lane detection task on our new OpenLane dataset as well as Apollo 3D Lane Synthetic dataset, and is also on par with state-of-the-art algorithms in the 2D task on OpenLane. The project page is available at https://github.com/OpenPerceptionX/PersFormer_3DLane and OpenLane dataset is provided at https://github.com/OpenPerceptionX/OpenLane.
Submitted 19 July, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge
Authors:
Chao-Hong Tan,
Jia-Chen Gu,
Chongyang Tao,
Zhen-Hua Ling,
Can Xu,
Huang Hu,
Xiubo Geng,
Daxin Jiang
Abstract:
Generating natural and informative texts has been a long-standing problem in NLP. Much effort has been dedicated to incorporating pre-trained language models (PLMs) with various open-world knowledge, such as knowledge graphs or wiki pages. However, their ability to access and manipulate the task-specific knowledge is still limited on downstream tasks, as this type of knowledge is usually not well covered in PLMs and is hard to acquire. To address the problem, we propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework. Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively on the basis of PLMs. With the help of these two types of knowledge, our model can learn what and how to generate. Experiments on two text generation tasks of dialogue generation and question generation, and on two datasets show that our method achieves better performance than various baseline models.
Submitted 16 March, 2022;
originally announced March 2022.
-
HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations
Authors:
Jia-Chen Gu,
Chao-Hong Tan,
Chongyang Tao,
Zhen-Hua Ling,
Huang Hu,
Xiubo Geng,
Daxin Jiang
Abstract:
Recently, various response generation models for two-party conversations have achieved impressive improvements, but less effort has been paid to multi-party conversations (MPCs) which are more practical and complicated. Compared with a two-party conversation where a dialogue context is a sequence of utterances, building a response generation model for MPCs is more challenging, since there exist complicated context structures and the generated responses heavily rely on both interlocutors (i.e., speaker and addressee) and history utterances. To address these challenges, we present HeterMPC, a heterogeneous graph-based neural network for response generation in MPCs which models the semantics of utterances and interlocutors simultaneously with two types of nodes in a graph. Besides, we also design six types of meta relations with node-edge-type-dependent parameters to characterize the heterogeneous interactions within the graph. Through multi-hop updating, HeterMPC can adequately utilize the structural knowledge of conversations for response generation. Experimental results on the Ubuntu Internet Relay Chat (IRC) channel benchmark show that HeterMPC outperforms various baseline models for response generation in MPCs.
Submitted 16 March, 2022;
originally announced March 2022.
-
Graph Attention Transformer Network for Multi-Label Image Classification
Authors:
Jin Yuan,
Shikai Chen,
Yao Zhang,
Zhongchao Shi,
Xin Geng,
Jianping Fan,
Yong Rui
Abstract:
Multi-label classification aims to recognize multiple objects or attributes from images. However, it is challenging to learn from proper label graphs to effectively characterize such inter-label correlations or dependencies. Current methods often use the co-occurrence probability of labels based on the training set as the adjacency matrix to model this correlation, which is greatly limited by the dataset and affects the model's generalization ability. In this paper, we propose a Graph Attention Transformer Network (GATN), a general framework for multi-label image classification that can effectively mine complex inter-label relationships. First, we use the cosine similarity based on the label word embedding as the initial correlation matrix, which can represent rich semantic information. Subsequently, we design the graph attention transformer layer to transfer this adjacency matrix to adapt to the current domain. Our extensive experiments have demonstrated that our proposed methods can achieve state-of-the-art performance on three datasets.
Submitted 15 January, 2024; v1 submitted 8 March, 2022;
originally announced March 2022.
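Illustrative sketch for the GATN entry above: the abstract describes using the cosine similarity of label word embeddings as the initial correlation matrix. The snippet below computes such a matrix with NumPy. The function name and the toy two-dimensional embeddings are assumptions for illustration only; the paper's embedding source and any later adaptation of the matrix by the graph attention transformer layer are not reproduced here.

```python
import numpy as np

def cosine_correlation_matrix(label_embeddings):
    """Pairwise cosine similarity between label embeddings.

    label_embeddings: array of shape (num_labels, dim).
    Returns a symmetric (num_labels, num_labels) matrix with ones on the diagonal.
    """
    norms = np.linalg.norm(label_embeddings, axis=1, keepdims=True)
    unit = label_embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

# Toy embeddings for three hypothetical labels.
emb = np.array([[1.0, 0.0],
                [0.0, 2.0],
                [3.0, 3.0]])
adj = cosine_correlation_matrix(emb)
```

Because the matrix depends only on the embeddings, it can be built without any co-occurrence statistics from the training set, which is the property the abstract emphasizes.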
-
ClarET: Pre-training a Correlation-Aware Context-To-Event Transformer for Event-Centric Generation and Classification
Authors:
Yucheng Zhou,
Tao Shen,
Xiubo Geng,
Guodong Long,
Daxin Jiang
Abstract:
Generating new events given context with correlated ones plays a crucial role in many event-centric reasoning tasks. Existing works either limit their scope to specific scenarios or overlook event-level correlations. In this paper, we propose to pre-train a general Correlation-aware context-to-Event Transformer (ClarET) for event-centric reasoning. To achieve this, we propose three novel event-centric objectives, i.e., whole event recovering, contrastive event-correlation encoding and prompt-based event locating, which highlight event-level correlations with effective training. The proposed ClarET is applicable to a wide range of event-centric reasoning scenarios, considering its versatility of (i) event-correlation types (e.g., causal, temporal, contrast), (ii) application formulations (i.e., generation and classification), and (iii) reasoning types (e.g., abductive, counterfactual and ending reasoning). Empirical fine-tuning results, as well as zero- and few-shot learning, on 9 benchmarks (5 generation and 4 classification tasks covering 4 reasoning types with diverse event correlations), verify its effectiveness and generalization ability.
Submitted 9 March, 2022; v1 submitted 4 March, 2022;
originally announced March 2022.
-
Dual-Branched Spatio-temporal Fusion Network for Multi-horizon Tropical Cyclone Track Forecast
Authors:
Zili Liu,
Kun Hao,
Xiaoyi Geng,
Zhenwei Shi
Abstract:
Tropical cyclone (TC) is an extreme tropical weather system and its trajectory can be described by a variety of spatio-temporal data. Effective mining of these data is the key to accurate TCs track forecasting. However, existing methods face the problem that the model complexity is too high or it is difficult to efficiently extract features from multi-modal data. In this paper, we propose the Dual-Branched spatio-temporal Fusion Network (DBF-Net) -- a novel multi-horizon tropical cyclone track forecasting model which fuses the multi-modal features efficiently. DBF-Net contains a TC features branch that extracts temporal features from 1D inherent features of TCs and a pressure field branch that extracts spatio-temporal features from reanalysis 2D pressure field. Through the encoder-decoder-based architecture and efficient feature fusion, DBF-Net can fully mine the information of the two types of data, and achieve good TCs track prediction results. Extensive experiments on historical TCs track data in the Northwest Pacific show that our DBF-Net achieves significant improvement compared with existing statistical and deep learning TCs track forecast methods.
Submitted 27 February, 2022;
originally announced February 2022.
-
PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks
Authors:
Yufei Wang,
Can Xu,
Qingfeng Sun,
Huang Hu,
Chongyang Tao,
Xiubo Geng,
Daxin Jiang
Abstract:
This paper focuses on the Data Augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose the Prompt-based Data Augmentation model (PromDA), which only trains a small-scale Soft Prompt (i.e., a set of trainable vectors) in the frozen Pre-trained Language Models (PLMs). This avoids human effort in collecting unlabeled in-domain data and maintains the quality of generated synthetic data. In addition, PromDA generates synthetic data via two different views and filters out the low-quality data using NLU models. Experiments on four benchmarks show that synthetic data produced by PromDA successfully boost the performance of NLU models, which consistently outperform several competitive baseline models, including a state-of-the-art semi-supervised model using unlabeled in-domain data. The synthetic data from PromDA are also complementary with unlabeled in-domain data. The NLU models can be further improved when they are combined for training.
Submitted 17 March, 2022; v1 submitted 25 February, 2022;
originally announced February 2022.
-
Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization
Authors:
Brandon Trabucco,
Xinyang Geng,
Aviral Kumar,
Sergey Levine
Abstract:
Black-box model-based optimization (MBO) problems, where the goal is to find a design input that maximizes an unknown objective function, are ubiquitous in a wide range of domains, such as the design of proteins, DNA sequences, aircraft, and robots. Solving model-based optimization problems typically requires actively querying the unknown objective function on design proposals, which means physically building the candidate molecule, aircraft, or robot, testing it, and storing the result. This process can be expensive and time consuming, and one might instead prefer to optimize for the best design using only the data one already has. This setting -- called offline MBO -- poses substantial and different algorithmic challenges than more commonly studied online techniques. A number of recent works have demonstrated success with offline MBO for high-dimensional optimization problems using high-capacity deep neural networks. However, the lack of standardized benchmarks in this emerging field is making progress difficult to track. To address this, we present Design-Bench, a benchmark for offline MBO with a unified evaluation protocol and reference implementations of recent methods. Our benchmark includes a suite of diverse and realistic tasks derived from real-world optimization problems in biology, materials science, and robotics that present distinct challenges for offline MBO. Our benchmark and reference implementations are released at github.com/rail-berkeley/design-bench and github.com/rail-berkeley/design-baselines.
Submitted 17 February, 2022;
originally announced February 2022.
-
Improving the accuracy of hard photon emission by Sigmoid sampling of the QED-table in particle-in-cell-Monte-Carlo simulations
Authors:
Yinlong Guo,
Xuesong Geng,
Liangliang Ji,
Baifei Shen,
Ruxin Li
Abstract:
Research on laser-plasma interaction in the quantum-electrodynamic (QED) regime has been greatly advanced by particle-in-cell & Monte-Carlo simulations (PIC-MC). While these simulations are widely used, we find that noticeable numerical error arises due to inappropriate implementation of the quantum process accounting for hard photon emission and pair production in the PIC-MC codes. The error stems from the low resolution of the QED table used to sample photon energy, which is generated in the logarithmic scale and cannot resolve high energy photons. We propose a new sampling method via Sigmoid function that handles both the low energy and high energy end of the photon emission spectrum. It guarantees the accuracy of PIC-MC algorithms for hard photon radiation and other related processes in the strong-field QED regime.
Submitted 15 February, 2022;
originally announced February 2022.
-
Quasi-monochromatic bright gamma-ray generation from synchronized Compton scattering via azimuthal spatial-temporal coupling
Authors:
Xuesong Geng,
Liangliang Ji,
Baifei Shen
Abstract:
High energy photons can be generated via inverse Compton scattering (ICS) in the collision between energetic electrons and intense laser pulse. The development of laser plasma accelerators promises compact and all-optical gamma-ray sources by colliding the electrons from laser wakefield accelerators to its high-power driving pulse reflected by a plasma mirror. However, the law of optical focusing hinders realization of both high photon yield and monochromatic spectrum in this scenario. We propose an azimuthal spatial-temporal convertor that decouples the focal field strength from laser spot size using helical parabolic geometry. It decomposes the driving laser beam into a pulse train of almost identical divergence angle and focal depth, creating synchronized ICS in the optimized linear regime. The scheme resolves the dilemma between high efficiency and narrow energy spread, facilitating the generation of monochromatic gamma-ray using high power lasers beyond relativistic field strengths.
Submitted 15 February, 2022;
originally announced February 2022.
-
Non-degeneracy of Stochastic Line Integrals
Authors:
Xi Geng,
Sheng Wang
Abstract:
We derive quantitative criteria for the existence of density for stochastic line integrals and iterated line integrals along solutions of hypoelliptic differential equations driven by fractional Brownian motion. As an application, we also study the signature uniqueness problem for these rough differential equations.
Submitted 6 February, 2022;
originally announced February 2022.
-
Effective Boundary Conditions Arising from the Heat Equation with Three-dimensional Interior Inclusion
Authors:
Xingri Geng
Abstract:
We study the initial boundary value problem for a heat equation in a domain containing a thin layer. The thermal conductivity of the layer is drastically different from that of the bulk of the domain; moreover, the layer is anisotropic and "optimally aligned" in the sense that the normal direction in the layer is always an eigenvector of the thermal tensor. To reveal the effects of the layer, we regard it as a thickless surface on which "effective boundary conditions" (EBCs) are satisfied by the limit of solutions of the initial boundary value problem as the thickness of the layer shrinks to zero. These EBCs are rich in variety and type, including some nonstandard ones such as the Dirichlet-to-Neumann mapping and the fractional Laplacian.
Submitted 24 January, 2023; v1 submitted 2 February, 2022;
originally announced February 2022.
-
PCL: Peer-Contrastive Learning with Diverse Augmentations for Unsupervised Sentence Embeddings
Authors:
Qiyu Wu,
Chongyang Tao,
Tao Shen,
Can Xu,
Xiubo Geng,
Daxin Jiang
Abstract:
Learning sentence embeddings in an unsupervised manner is fundamental in natural language processing. Recent common practice is to couple pre-trained language models with unsupervised contrastive learning, whose success relies on augmenting a sentence with a semantically-close positive instance to construct contrastive pairs. Nonetheless, existing approaches usually depend on a mono-augmenting strategy, which causes learning shortcuts towards the augmenting biases and thus corrupts the quality of sentence embeddings. A straightforward solution is resorting to more diverse positives from a multi-augmenting strategy, while an open question remains about how to unsupervisedly learn from the diverse positives but with uneven augmenting qualities in the text field. As one answer, we propose a novel Peer-Contrastive Learning (PCL) with diverse augmentations. PCL constructs diverse contrastive positives and negatives at the group level for unsupervised sentence embeddings. PCL performs peer-positive contrast as well as peer-network cooperation, which offers an inherent anti-bias ability and an effective way to learn from diverse augmentations. Experiments on STS benchmarks verify the effectiveness of PCL against its competitors in unsupervised sentence embeddings.
Submitted 19 October, 2022; v1 submitted 28 January, 2022;
originally announced January 2022.
-
A Novel Mix-normalization Method for Generalizable Multi-source Person Re-identification
Authors:
Lei Qi,
Lei Wang,
Yinghuan Shi,
Xin Geng
Abstract:
Person re-identification (Re-ID) has achieved great success in the supervised scenario. However, it is difficult to directly transfer the supervised model to arbitrary unseen domains due to the model overfitting to the seen source domains. In this paper, we aim to tackle the generalizable multi-source person Re-ID task (i.e., there are multiple available source domains, and the testing domain is unseen during training) from the data augmentation perspective. We therefore put forward a novel method, termed MixNorm, which consists of domain-aware mix-normalization (DMN) and domain-aware center regularization (DCR). Different from conventional data augmentation, the proposed domain-aware mix-normalization enhances the diversity of features during training from the normalization view of the neural network, which can effectively alleviate the model overfitting to the source domains, so as to boost the generalization capability of the model in the unseen domain. To better learn the domain-invariant model, we further develop the domain-aware center regularization to better map the produced diverse features into the same space. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed method and show that the proposed method can outperform the state-of-the-art methods. Besides, further analysis also reveals the superiority of the proposed method.
Submitted 12 June, 2022; v1 submitted 24 January, 2022;
originally announced January 2022.
-
Learning Hierarchical Graph Representation for Image Manipulation Detection
Authors:
Wenyan Pan,
Zhili Zhou,
Miaogen Ling,
Xin Geng,
Q. M. Jonathan Wu
Abstract:
The objective of image manipulation detection is to identify and locate the manipulated regions in the images. Recent approaches mostly adopt the sophisticated Convolutional Neural Networks (CNNs) to capture the tampering artifacts left in the images to locate the manipulated regions. However, these approaches ignore the feature correlations, i.e., feature inconsistencies, between manipulated regions and non-manipulated regions, leading to inferior detection performance. To address this issue, we propose a hierarchical Graph Convolutional Network (HGCN-Net), which consists of two parallel branches: the backbone network branch and the hierarchical graph representation learning (HGRL) branch for image manipulation detection. Specifically, the feature maps of a given image are extracted by the backbone network branch, and then the feature correlations within the feature maps are modeled as a set of fully-connected graphs for learning the hierarchical graph representation by the HGRL branch. The learned hierarchical graph representation can sufficiently capture the feature correlations across different scales, and thus it provides high discriminability for distinguishing manipulated and non-manipulated regions. Extensive experiments on four public datasets demonstrate that the proposed HGCN-Net not only provides promising detection accuracy, but also achieves strong robustness under a variety of common image attacks in the task of image manipulation detection, compared to the state-of-the-arts.
Submitted 14 January, 2022;
originally announced January 2022.
-
Ultrahigh-energy Gamma-Ray Radiation from the Crab Pulsar Wind Nebula
Authors:
Lin Nie,
Yang Liu,
Zejun Jiang,
Xiongfei Geng
Abstract:
It has long been debated whether the high-energy gamma-ray radiation from the Crab nebula stems from leptonic or hadronic processes. In this work, we investigate the multi-band non-thermal radiation from the Crab pulsar wind nebula with the leptonic and leptonic-hadronic hybrid models, respectively. Then we use the Markov Chain Monte Carlo (MCMC) sampling technology and method of sampling trace to study the stability and reasonability of the model parameters according to the recent observed results and obtain the best-fitting values of parameters. Finally, we calculate different radiative components generated by the electrons and protons in the Crab nebula. The modeling results indicate that the pure leptonic origin model with the one-zone only can partly agree with some segments of the data from various experiments (including the $\rm PeV$ gamma-ray emission reported by the LHAASO and the other radiation ranging from the radio to very high energy (VHE) gamma-ray waveband), and the contribution of hadronic interaction is hardly constrained. However, we find that the hadronic process may also contribute, especially in the energy range exceeding the $\rm PeV$. In addition, it can be inferred that the higher energy signals from the Crab nebula could be observed in the future.
Submitted 11 January, 2022;
originally announced January 2022.
-
Constraints on sub-GeV dark matter boosted by cosmic rays from the CDEX-10 experiment at the China Jinping Underground Laboratory
Authors:
R. Xu,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
M. Agartioglu,
H. P. An,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
X. Y. Guo,
Q. J. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
H. T. Jia,
X. Jiang,
H. B. Li
, et al. (60 additional authors not shown)
Abstract:
We present new constraints on light dark matter boosted by cosmic rays (CRDM) using the 205.4 kg day data of the CDEX-10 experiment conducted at the China Jinping Underground Laboratory. The Monte Carlo simulation package CJPL_ESS was employed to evaluate the Earth shielding effect. Several key factors have been introduced and discussed in our CRDM analysis, including the contributions from heavier CR nuclei than proton and helium, the inhomogeneity of CR distribution, and the impact of the form factor in the Earth attenuation calculation. Our result excludes the dark matter--nucleon elastic scattering cross-section region from $1.7\times 10^{-30}$ to $10^{-26}~\rm cm^2$ for dark matter of 10 keV$/c^2$ to 1 GeV$/c^2$.
Submitted 16 September, 2022; v1 submitted 5 January, 2022;
originally announced January 2022.