The Z-eigenpairs of orthogonally diagonalizable symmetric tensors

Abstract: In this paper, we focus on a special class of symmetric tensors, those that are orthogonally diagonalizable, and investigate their Z-eigenpair problem. We show that the eigenpairs can be uniformly expressed using several basic eigenpairs, and that the number of eigenpairs is uniquely determined by the order and rank of the symmetric tensor. In addition, we examine the local optimality of each eigenpair by checking the second-order necessary condition.

Submitted 7 December, 2021; originally announced December 2021.

arXiv:2111.15077 [pdf, other]

Unsupervised Domain Generalization for Person Re-identification: A Domain-specific Adaptive Framework

Authors: Lei Qi, Jiaqi Liu, Lei Wang, Yinghuan Shi, Xin Geng

Abstract: Domain generalization (DG) has attracted much attention in person re-identification (ReID) recently. It aims to make a model trained on multiple source domains generalize to an unseen target domain. Although achieving promising progress, existing methods usually need the source domains to be labeled, which could be a significant burden for practical ReID tasks. In this paper, we turn to investigate unsupervised domain generalization for ReID, by assuming that no label is available for any source domain. To address this challenging setting, we propose a simple and efficient domain-specific adaptive framework, and realize it with an adaptive normalization module designed upon the batch and instance normalization techniques. In doing so, we successfully yield reliable pseudo-labels to implement training and also enhance the domain generalization capability of the model as required. In addition, we show that our framework can even be applied to improve person ReID under the settings of supervised domain generalization and unsupervised domain adaptation, demonstrating competitive performance with respect to relevant methods. An extensive experimental study on benchmark datasets is conducted to validate the proposed framework. The significance of our work is that it shows the potential of unsupervised domain generalization for person ReID and sets a strong baseline for further research on this topic.

Submitted 23 March, 2023; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: Accepted to Pattern Recognition (PR)

arXiv:2111.11243 [pdf, other]

doi 10.1103/PhysRevD.105.052005

Studies of the Earth shielding effect to direct dark matter searches at the China Jinping Underground Laboratory

Authors: Z. Z. Liu, L. T. Yang, Q. Yue, C. H. Yeh, K. J. Kang, Y. J. Li, M. Agartioglu, H. P. An, J. P. Chang, J. H. Chen, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, X. Y. Guo, Q. J. Guo, L. He, S. M. He, J. W. Hu, H. X. Huang, T. C. Huang, H. T. Jia, et al. (58 additional authors not shown)

Abstract: Dark matter direct detection experiments mostly operate at deep underground laboratories. It is necessary to consider the shielding effect of the Earth, especially for dark matter particles interacting with a large cross section. We analyzed and simulated the Earth shielding effect for dark matter at the China Jinping Underground Laboratory (CJPL) with a simulation package, CJPL Earth Shielding Simulation code (CJPL_ESS), which is applicable to other underground locations. Further constraints on the $χ$-N cross section exclusion regions are derived from studies with CDEX experiment data.

Submitted 9 March, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: 8 pages, 8 figures, 2 tables. Version updated to match PRD version

Journal ref: Phys. Rev. D 105, 052005 (2022)

arXiv:2111.11029 [pdf, other]

Auto-Encoding Score Distribution Regression for Action Quality Assessment

Authors: Boyu Zhang, Jiayuan Chen, Yinfei Xu, Hui Zhang, Xu Yang, Xin Geng

Abstract: The action quality assessment (AQA) of videos is a challenging vision task since the relation between videos and action scores is difficult to model. Thus, AQA has been widely studied in the literature. Traditionally, AQA is treated as a regression problem to learn the underlying mappings between videos and action scores. But previous methods ignored data uncertainty in AQA datasets. To address aleatoric uncertainty, we further develop a plug-and-play module, Distribution Auto-Encoder (DAE). Specifically, it encodes videos into distributions and uses the reparameterization trick in variational auto-encoders (VAE) to sample scores, which establishes a more accurate mapping between videos and scores. Meanwhile, a likelihood loss is used to learn the uncertainty parameters. We plug our DAE approach into MUSDL and CoRe. Experimental results on public datasets demonstrate that our method achieves state-of-the-art results on the AQA-7, MTL-AQA, and JIGSAWS datasets. Our code is available at https://github.com/InfoX-SEU/DAE-AQA.

Submitted 31 August, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

arXiv:2110.12911 [pdf, other]

Instance-Dependent Partial Label Learning

Authors: Ning Xu, Congyu Qiao, Xin Geng, Min-Ling Zhang

Abstract: Partial label learning (PLL) is a typical weakly supervised learning problem, where each training example is associated with a set of candidate labels among which only one is true. Most existing PLL approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels. However, this assumption is not realistic since the candidate labels are always instance-dependent. In this paper, we consider instance-dependent PLL and assume that each example is associated with a latent label distribution constituted by a real number for each label, representing the degree to which each label describes the feature. An incorrect label with a high degree is more likely to be annotated as a candidate label. Therefore, the latent label distribution is the essential labeling information in partially labeled examples and is worth leveraging for predictive model training. Motivated by this consideration, we propose a novel PLL method that recovers the label distribution as a label enhancement (LE) process and trains the predictive model iteratively in every epoch. Specifically, we assume the true posterior density of the latent label distribution takes on the variational approximate Dirichlet density parameterized by an inference model. Then the evidence lower bound is deduced for optimizing the inference model, and the label distributions generated from the variational posterior are utilized for training the predictive model. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed method. Source code is available at https://github.com/palm-ml/valen.

Submitted 25 October, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: NeurIPS 2021 Spotlight

arXiv:2110.08515 [pdf, other]

Multimodal Dialogue Response Generation

Authors: Qingfeng Sun, Yujing Wang, Can Xu, Kai Zheng, Yaming Yang, Huang Hu, Fei Xu, Jessica Zhang, Xiubo Geng, Daxin Jiang

Abstract: Responding with images has been recognized as an important capability for an intelligent conversational agent. Yet existing works only focus on exploring multimodal dialogue models that depend on retrieval-based methods, and neglect generation methods. To fill in the gaps, we first present a multimodal dialogue generation model, which takes the dialogue history as input, then generates a textual sequence or an image as response. Learning such a model often requires multimodal dialogues containing both texts and images, which are difficult to obtain. Motivated by the challenge in practice, we consider multimodal dialogue generation under a natural assumption that only limited training examples are available. In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire generation model. By this means, the major part of the model can be learned from a large number of text-only dialogues and text-image pairs respectively, then the whole parameters can be well fitted using the limited training examples. Extensive experiments demonstrate our method achieves state-of-the-art results in both automatic and human evaluation, and can generate informative text and high-resolution image responses.

Submitted 29 March, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

Comments: Accepted to ACL 2022 Main Conference

arXiv:2110.06533 [pdf, other]

EventBERT: A Pre-Trained Model for Event Correlation Reasoning

Authors: Yucheng Zhou, Xiubo Geng, Tao Shen, Guodong Long, Daxin Jiang

Abstract: Event correlation reasoning infers whether a natural language paragraph containing multiple events conforms to human common sense. For example, "Andrew was very drowsy, so he took a long nap, and now he is very alert" is sound and reasonable. In contrast, "Andrew was very drowsy, so he stayed up a long time, now he is very alert" does not comply with human common sense. Such reasoning capability is essential for many downstream tasks, such as script reasoning, abductive reasoning, narrative incoherence, story cloze test, etc. However, conducting event correlation reasoning is challenging due to a lack of large amounts of diverse event-based knowledge and difficulty in capturing correlation among multiple events. In this paper, we propose EventBERT, a pre-trained model to encapsulate eventuality knowledge from unlabeled text. Specifically, we collect a large volume of training examples by identifying natural language paragraphs that describe multiple correlated events and further extracting event spans in an unsupervised manner. We then propose three novel event- and correlation-based learning objectives to pre-train an event correlation model on our created training corpus. Empirical results show EventBERT outperforms strong baselines on four downstream tasks, and achieves SoTA results on most of them. Besides, it outperforms existing pre-trained models by a large margin, e.g., 6.5-23%, in zero-shot learning of these tasks.

Submitted 13 October, 2021; originally announced October 2021.

Comments: 12 pages, 6 figures

arXiv:2110.00159 [pdf, other]

Building an Efficient and Effective Retrieval-based Dialogue System via Mutual Learning

Authors: Chongyang Tao, Jiazhan Feng, Chang Liu, Juntao Li, Xiubo Geng, Daxin Jiang

Abstract: Establishing retrieval-based dialogue systems that can select appropriate responses from the pre-built index has gained increasing attention from researchers. For this task, the adoption of pre-trained language models (such as BERT) has led to remarkable progress in a number of benchmarks. There exist two common approaches, including cross-encoders which perform full attention over the inputs, and bi-encoders that encode the context and response separately. The former gives considerable improvements in accuracy but is often inapplicable in practice for large-scale retrieval given the cost of the full attention required for each sample at test time. The latter is efficient for billions of indexes but suffers from sub-optimal performance. In this work, we propose to combine the best of both worlds to build a retrieval system. Specifically, we employ a fast bi-encoder to replace the traditional feature-based pre-retrieval model (such as BM25) and set the response re-ranking model as a more complicated architecture (such as cross-encoder). To further improve the effectiveness of our framework, we train the pre-retrieval model and the re-ranking model at the same time via mutual learning, which enables two models to learn from each other throughout the training process. We conduct experiments on two benchmarks and evaluation results demonstrate the efficiency and effectiveness of our proposed framework.

Submitted 30 September, 2021; originally announced October 2021.

Comments: 9 pages, 4 figures

arXiv:2109.12302 [pdf, other]

Learning Neural Templates for Recommender Dialogue System

Authors: Zujie Liang, Huang Hu, Can Xu, Jian Miao, Yingying He, Yining Chen, Xiubo Geng, Fan Liang, Daxin Jiang

Abstract: Though recent end-to-end neural models have shown promising progress on Conversational Recommender System (CRS), two key challenges still remain. First, the recommended items cannot always be incorporated into the generated replies precisely and appropriately. Second, only the items mentioned in the training corpus have a chance to be recommended in the conversation. To tackle these challenges, we introduce a novel framework called NTRD for recommender dialogue system that decouples the dialogue generation from the item recommendation. NTRD has two key components, i.e., response template generator and item selector. The former adopts an encoder-decoder model to generate a response template with slot locations tied to target items, while the latter fills in slot locations with the proper items using a sufficient attention mechanism. Our approach combines the strengths of both classical slot filling approaches (that are generally controllable) and modern neural NLG approaches (that are generally more natural and accurate). Extensive experiments on the benchmark ReDial show our NTRD significantly outperforms the previous state-of-the-art methods. Besides, our approach has the unique advantage of producing novel items that do not appear in the training set of the dialogue corpus. The code is available at https://github.com/jokieleung/NTRD.

Submitted 25 September, 2021; originally announced September 2021.

Comments: EMNLP 2021 long paper, code link: https://github.com/jokieleung/NTRD

arXiv:2109.12273 [pdf, ps, other]

FedProc: Prototypical Contrastive Federated Learning on Non-IID data

Authors: Xutong Mu, Yulong Shen, Ke Cheng, Xueli Geng, Jiaxuan Fu, Tao Zhang, Zhiwei Zhang

Abstract: Federated learning allows multiple clients to collaborate to train high-performance deep learning models while keeping the training data locally. However, when the local data of all clients are not independent and identically distributed (i.e., non-IID), it is challenging to implement this form of efficient collaborative learning. Although significant efforts have been dedicated to addressing this challenge, the effect on the image classification task is still not satisfactory. In this paper, we propose FedProc: prototypical contrastive federated learning, which is a simple and effective federated learning framework. The key idea is to utilize the prototypes as global knowledge to correct the local training of each client. We design a local network architecture and a global prototypical contrastive loss to regulate the training of local models, which makes local objectives consistent with the global optima. Eventually, the converged global model obtains a good performance on non-IID data. Experimental results show that, compared to state-of-the-art federated learning methods, FedProc improves the accuracy by $1.6\%\sim7.9\%$ with acceptable computation cost.

Submitted 25 September, 2021; originally announced September 2021.

arXiv:2109.07582 [pdf, other]

Pareto-wise Ranking Classifier for Multi-objective Evolutionary Neural Architecture Search

Authors: Lianbo Ma, Nan Li, Guo Yu, Xiaoyu Geng, Min Huang, Xingwei Wang

Abstract: In the deployment of deep neural models, how to effectively and automatically find feasible deep models under diverse design objectives is fundamental. Most existing neural architecture search (NAS) methods utilize surrogates to predict the detailed performance (e.g., accuracy and model size) of a candidate architecture during the search, which however is complicated and inefficient. In contrast, we aim to learn an efficient Pareto classifier to simplify the search process of NAS by transforming the complex multi-objective NAS task into a simple Pareto-dominance classification task. To this end, we propose a classification-wise Pareto evolution approach for one-shot NAS, where an online classifier is trained to predict the dominance relationship between the candidate and constructed reference architectures, instead of using surrogates to fit the objective functions. The main contribution of this study is to change supernet adaption into a Pareto classifier. Besides, we design two adaptive schemes to select the reference set of architectures for constructing the classification boundary and to regulate the rate of positive samples over negative ones, respectively. We compare the proposed evolution approach with state-of-the-art approaches on widely-used benchmark datasets, and experimental results indicate that the proposed approach outperforms other approaches and has found a number of neural architectures with different model sizes ranging from 2M to 6M under diverse objectives and constraints.

Submitted 8 March, 2024; v1 submitted 14 September, 2021; originally announced September 2021.

arXiv:2108.13405 [pdf, other]

Stochastic Uncertainty Propagation in Power System Dynamics using Measure-valued Proximal Recursions

Authors: Abhishek Halder, Kenneth F. Caluya, Pegah Ojaghi, Xinbo Geng

Abstract: We present a proximal algorithm that performs a variational recursion on the space of joint probability measures to propagate the stochastic uncertainties in power system dynamics over high dimensional state space. The proposed algorithm takes advantage of the exact nonlinearity structures in the trajectory-level dynamics of the networked power systems, and is nonparametric. Lifting the dynamics to the space of probability measures allows us to design a scalable algorithm that obviates gridding the underlying high dimensional state space which is computationally prohibitive. The proximal recursion implements a generalized infinite dimensional gradient flow, and evolves probability-weighted scattered point clouds. We clarify the theoretical nuances and algorithmic details specific to the power system nonlinearities, and provide illustrative numerical examples.

Submitted 24 August, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

arXiv:2107.06882 [pdf, other]

Conservative Objective Models for Effective Offline Model-Based Optimization

Authors: Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine

Abstract: Computational design problems arise in a number of settings, from synthetic biology to computer architectures. In this paper, we aim to solve data-driven model-based optimization (MBO) problems, where the goal is to find a design input that maximizes an unknown objective function provided access to only a static dataset of prior experiments. Such data-driven optimization procedures are the only practical methods in many real-world domains where active data collection is expensive (e.g., when optimizing over proteins) or dangerous (e.g., when optimizing over aircraft designs). Typical methods for MBO that optimize the design against a learned model suffer from distributional shift: it is easy to find a design that "fools" the model into predicting a high value. To overcome this, we propose conservative objective models (COMs), a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs, and uses it for optimization. Structurally, COMs resemble adversarial training methods used to overcome adversarial examples. COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems, including optimizing protein sequences, robot morphologies, neural network weights, and superconducting materials.

Submitted 14 July, 2021; originally announced July 2021.

Comments: ICML 2021. First two authors contributed equally. Code at: https://github.com/brandontrabucco/design-baselines/blob/c65a53fe1e6567b740f0adf60c5db9921c1f2330/design_baselines/coms_cleaned/__init__.py

arXiv:2107.01189

NTIRE 2021 Multi-modal Aerial View Object Classification Challenge

Authors: Jerrick Liu, Nathan Inkawhich, Oliver Nina, Radu Timofte, Sahil Jain, Bob Lee, Yuru Duan, Wei Wei, Lei Zhang, Songzheng Xu, Yuxuan Sun, Jiaqi Tang, Xueli Geng, Mengru Ma, Gongzhe Li, Xueli Geng, Huanqia Cai, Chengxue Cai, Sol Cummings, Casian Miron, Alexandru Pasarica, Cheng-Yen Yang, Hung-Min Hsu, Jiarui Cai, Jie Mei, et al. (9 additional authors not shown)

Abstract: In this paper, we introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR. This challenge is composed of two different tracks using EO and SAR imagery. Both EO and SAR sensors possess different advantages and drawbacks. The purpose of this competition is to analyze how to use both sets of sensory information in complementary ways. We discuss the top methods submitted for this competition and evaluate their results on our blind test set. Our challenge results show significant improvement of more than 15% accuracy from our current baselines for each track of the competition.

Submitted 6 April, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: The paper needs to be withdrawn since it did not properly go through the public release process. We will soon release a new version to replace this one

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, 588-595

arXiv:2106.15802 [pdf, other]

CityNet: A Comprehensive Multi-Modal Urban Dataset for Advanced Research in Urban Computing

Authors: Zhengfei Zheng, Xu Geng, Hai Yang

Abstract: Data-driven approaches have emerged as a popular tool for addressing challenges in urban computing. However, current research efforts have primarily focused on limited data sources, which fail to capture the complexity of urban data arising from multiple entities and their interconnections. Therefore, a comprehensive and multifaceted dataset is required to enable more extensive studies in urban computing. In this paper, we present CityNet, a multi-modal urban dataset that incorporates various data, including taxi trajectory, traffic speed, point of interest (POI), road network, wind, rain, temperature, and more, from seven cities. We categorize this comprehensive data into three streams: mobility data, geographical data, and meteorological data. We begin by detailing the generation process and basic properties of CityNet. Additionally, we conduct extensive data mining and machine learning experiments, including spatio-temporal predictions, transfer learning, and reinforcement learning, to facilitate the use of CityNet. Our experimental results provide benchmarks for various tasks and methods, and also reveal internal correlations among cities and tasks within CityNet that can be leveraged to improve spatiotemporal forecasting performance. Based on our benchmarking results and the correlations uncovered, we believe that CityNet can significantly contribute to the field of urban computing by enabling research on advanced topics.

Submitted 10 April, 2024; v1 submitted 30 June, 2021; originally announced June 2021.

arXiv:2106.06788 [pdf, other]

Learngene: From Open-World to Your Learning Task

Authors: Qiufeng Wang, Xin Geng, Shuxia Lin, Shiyu Xia, Lei Qi, Ning Xu

Abstract: Although deep learning has made significant progress on fixed large-scale datasets, it typically encounters challenges regarding improperly detecting unknown/unseen classes in the open-world scenario, over-parameterization, and overfitting to small samples. Since biological systems can overcome the above difficulties very well, individuals inherit an innate gene from collective creatures that have evolved over hundreds of millions of years and then learn new skills through few examples. Inspired by this, we propose a practical collective-individual paradigm where an evolution (expandable) network is trained on sequential tasks and then recognizes unknown classes in the real world. Moreover, the learngene, i.e., the gene for learning initialization rules of the target model, is proposed to inherit the meta-knowledge from the collective model and reconstruct a lightweight individual model on the target task. Particularly, a novel criterion is proposed to discover learngene in the collective model, according to the gradient information. Finally, the individual model is trained only with few samples on the target learning tasks. We demonstrate the effectiveness of our approach in an extensive empirical study and theoretical analysis.

Submitted 17 June, 2022; v1 submitted 12 June, 2021; originally announced June 2021.

Comments: To appear in AAAI-22

arXiv:2106.06152 [pdf, other]

On the Robustness of Average Losses for Partial-Label Learning

Authors: Jiaqi Lv, Biao Liu, Lei Feng, Ning Xu, Miao Xu, Bo An, Gang Niu, Xin Geng, Masashi Sugiyama

Abstract: Partial-label learning (PLL) utilizes instances with PLs, where a PL includes several candidate labels but only one is the true label (TL). In PLL, identification-based strategy (IBS) purifies each PL on the fly to select the (most likely) TL for training; average-based strategy (ABS) treats all candidate labels equally for training and lets trained models predict the TL. Although PLL research has focused on IBS for better performance, ABS is also worthy of study since modern IBS behaves like ABS in the beginning of training to prepare for PL purification and TL selection. In this paper, we analyze why ABS was unsatisfactory and propose how to improve it. Theoretically, we formalize five problem settings of PLL and prove that average PL losses (APLLs) with bounded multi-class losses are always robust, while APLLs with unbounded losses may be non-robust, which is the first robustness analysis for PLL. Experimentally, we have two promising findings: ABS using bounded losses can match/exceed state-of-the-art performance of IBS using unbounded losses; after using robust APLLs to warm start, IBS can further improve upon itself. Our work draws attention to ABS research, which can in turn boost IBS and push forward the whole PLL.

Submitted 24 November, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

arXiv:2106.01541 [pdf, other]

MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding

Authors: Jia-Chen Gu, Chongyang Tao, Zhen-Hua Ling, Can Xu, Xiubo Geng, Daxin Jiang

Abstract: Recently, various neural models for multi-party conversation (MPC) have achieved impressive improvements on a variety of tasks such as addressee recognition, speaker identification and response prediction. However, these existing methods on MPC usually represent interlocutors and utterances individually and ignore the inherent complicated structure in MPC which may provide crucial interlocutor and utterance semantics and would enhance the conversation understanding process. To this end, we present MPC-BERT, a pre-trained model for MPC understanding that considers learning who says what to whom in a unified model with several elaborated self-supervised tasks. Particularly, these tasks can be generally categorized into (1) interlocutor structure modeling including reply-to utterance recognition, identical speaker searching and pointer consistency distinction, and (2) utterance semantics modeling including masked shared utterance restoration and shared node detection. We evaluate MPC-BERT on three downstream tasks including addressee recognition, speaker identification and response selection. Experimental results show that MPC-BERT outperforms previous methods by large margins and achieves new state-of-the-art performance on all three downstream tasks at two benchmarks.

Submitted 2 June, 2021; originally announced June 2021.

Comments: Accepted by ACL 2021

arXiv:2105.13073 [pdf, other]

Maria: A Visual Experience Powered Conversational Agent

Authors: Zujie Liang, Huang Hu, Can Xu, Chongyang Tao, Xiubo Geng, Yining Chen, Fan Liang, Daxin Jiang

Abstract: Arguably, the visual perception of conversational agents to the physical world is a key way for them to exhibit human-like intelligence. Image-grounded conversation is thus proposed to address this challenge. Existing works focus on exploring multimodal dialog models that ground the conversation on a given image. In this paper, we take a step further to study image-grounded conversation under a fully open-ended setting where no paired dialog and image are assumed available. Specifically, we present Maria, a neural conversation agent powered by visual world experiences which are retrieved from a large-scale image index. Maria consists of three flexible components, i.e., text-to-image retriever, visual concept detector and visual-knowledge-grounded response generator. The retriever aims to retrieve a correlated image to the dialog from an image index, while the visual concept detector extracts rich visual knowledge from the image. Then, the response generator is grounded on the extracted visual knowledge and dialog context to generate the target response. Extensive experiments demonstrate that Maria outperforms previous state-of-the-art methods on automatic metrics and human evaluation, and can generate informative responses that have some visual commonsense of the physical world.

Submitted 23 June, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

Comments: Accepted by ACL 2021 main conference

arXiv:2105.11631 [pdf, other]

doi 10.1364/OPTICA.447557

Optical trapping of nanoparticles in superfluid helium

Authors: Yosuke Minowa, Xi Geng, Keisuke Kokado, Kentaro Sato, Tatsuya Kameyama, Tsukasa Torimoto, Masaaki Ashida

Abstract: Optical tweezers, the three-dimensional confinement of a nanoparticle by a strongly focused beam of light, have been widely employed in investigating biomaterial nanomechanics, nanoscopic fluid properties, and ultrasensitive detections in various environments such as inside living cells, at gigapascal pressure, and under high vacuum. However, the cryogenic operation of solid-state-particle optical tweezers is poorly understood. In this study, we demonstrate the optical trapping of metallic and dielectric nanoparticles in superfluid helium below 2 K, which is two orders of magnitude lower than in the previous experiments. We prepare the nanoparticles via in-situ laser ablation. The nanoparticles are stably trapped with a single laser beam tightly focused in the superfluid helium. Our method provides a new approach for studying nanoscopic quantum hydrodynamic effects and interactions between quantum fluids and classical objects.

Submitted 24 May, 2021; originally announced May 2021.

Journal ref: Optica 9, 139-144 (2022)

arXiv:2105.07149 [pdf, other]

DirectQE: Direct Pretraining for Machine Translation Quality Estimation

Authors: Qu Cui, Shujian Huang, Jiahuan Li, Xiang Geng, Zaixiang Zheng, Guoping Huang, Jiajun Chen

Abstract: Machine Translation Quality Estimation (QE) is a task of predicting the quality of machine translations without relying on any reference. Recently, the predictor-estimator framework trains the predictor as a feature extractor, which leverages the extra parallel corpora without QE labels, achieving promising QE performance. However, we argue that there are gaps between the predictor and the estimator in both data quality and training objectives, which preclude QE models from benefiting from a large number of parallel corpora more directly. We propose a novel framework called DirectQE that provides a direct pretraining for QE tasks. In DirectQE, a generator is trained to produce pseudo data that is closer to the real QE data, and a detector is pretrained on these data with novel objectives that are akin to the QE task. Experiments on widely used benchmarks show that DirectQE outperforms existing methods, without using any pretraining models such as BERT. We also give extensive analyses showing how fixing the two gaps contributes to our improvements.

Submitted 15 May, 2021; originally announced May 2021.

arXiv:2104.13597 [pdf, other]

doi 10.1088/1748-0221/16/09/T09005

SAGE : A Monte Carlo Simulation Framework for Experiments with Germanium Detectors

Authors: Ze She, Hao Ma, Weihe Zeng, Wenhan Dai, Xinping Geng, Ofoq Normahmedov, Jingzhe Yang, Zhi Zeng, Qian Yue, Jianping Cheng, Junli Li

Abstract: A Geant4-based simulation framework for rare event searching experiments with germanium detectors named SAGE is presented with details. It is designed for simulating, assessing background distribution, and investigating the response of the germanium detectors. The SAGE framework incorporates its experiment-specific geometries and custom attributes, including the event generators, physics lists and output format, to satisfy various simulation objectives. Its docker image has been prepared for virtualizing and distributing the SAGE framework. Deployment of a Geant4-based simulation will be convenient under this docker image. The implemented geometries include p-type point contact and broad energy germanium detectors with environmental surroundings, and these hierarchical geometries can be easily extended. Users select these custom attributes via the JSON configuration files. The aforementioned attributes satisfy the simulation demands and make SAGE a generic and powerful simulation framework for the CDEX experiment.

Submitted 27 September, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: 12 pages, 5 figures

arXiv:2104.02570 [pdf, other]

Learning from Noisy Labels via Dynamic Loss Thresholding

Authors: Hao Yang, Youzhi Jin, Ziyin Li, Deng-Bao Wang, Lei Miao, Xin Geng, Min-Ling Zhang

Abstract: Numerous studies have shown that deep neural networks (DNNs) can eventually fit everything, even given data with noisy labels, which results in poor generalization performance. However, recent studies suggest that DNNs tend to gradually memorize the data, moving from correct data to mislabeled data. Inspired by this finding, we propose a novel method named Dynamic Loss Thresholding (DLT). During the training process, DLT records the loss value of each sample and calculates dynamic loss thresholds. Specifically, DLT compares the loss value of each sample with the current loss threshold. Samples with smaller losses can be considered as clean samples with higher probability and vice versa. Then, DLT discards the potentially corrupted labels and further leverages supervised learning techniques. Experiments on CIFAR-10/100 and Clothing1M demonstrate substantial improvements over recent state-of-the-art methods. In addition, we investigate two real-world problems for the first time. Firstly, we propose a novel approach to estimate the noise rates of datasets based on the loss difference between the early and late training stages of DNNs. Secondly, we explore the effect of hard samples (which are difficult to distinguish) on the process of learning from noisy labels.

Submitted 1 April, 2021; originally announced April 2021.

arXiv:2103.16424 [pdf, other]

Two-stage Robust Energy Storage Planning with Probabilistic Guarantees: A Data-driven Approach

Authors: Chao Yan, Xinbo Geng, Zhaohong Bie, Le Xie

Abstract: This paper addresses a central challenge of jointly considering shorter-term (e.g. hourly) and longer-term (e.g. yearly) uncertainties in power system planning with increasing penetration of renewable and storage resources. In conventional planning decision making, shorter-term (e.g., hourly) variations are not explicitly accounted for. However, given the deepening penetration of variable resources, it is becoming imperative to consider such shorter-term variation in the longer-term planning exercise. By leveraging the abundant amount of operational observation data, we propose a scenario-based robust planning framework that provides rigorous guarantees on the future operation risk of planning decisions considering a broad range of operational conditions, such as renewable generation fluctuations and load variations. By connecting two-stage robust optimization with the scenario approach theory, we show that with a carefully chosen number of scenarios, the operational risk level of the robust solution can be adaptive to the risk preference set by planners. The theoretical guarantees hold true for any distributions, and the proposed approach is scalable towards real-world power grids. Furthermore, the column-and-constraint generation algorithm is used to solve the two-stage robust planning problem and tighten theoretical guarantees. We substantiate this framework through a planning problem of energy storage in a power grid with deep renewable penetration. Case studies are performed on large-scale test systems (modified IEEE 118-bus system) to illustrate the theoretical bounds as well as the scalability of the proposed algorithm.

Submitted 10 September, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

arXiv:2102.09026 [pdf, other]

Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm

Authors: Bin Gu, Guodong Liu, Yanfu Zhang, Xiang Geng, Heng Huang

Abstract: Modern machine learning algorithms usually involve tuning multiple (from one to thousands) hyperparameters which play a pivotal role in terms of model generalizability. Black-box optimization and gradient-based algorithms are two dominant approaches to hyperparameter optimization while they have totally distinct advantages. How to design a new hyperparameter optimization technique inheriting all benefits from both approaches is still an open problem. To address this challenging problem, in this paper, we propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG). Specifically, we first exactly formulate hyperparameter optimization as an A-based constrained optimization problem, where A is a black-box optimization algorithm (such as deep neural network). Then, we use the average zeroth-order hyper-gradients to update hyperparameters. We provide the feasibility analysis of using HOZOG to achieve hyperparameter optimization. Finally, the experimental results on three representative hyperparameter (the size is from 1 to 1250) optimization tasks demonstrate the benefits of HOZOG in terms of simplicity, scalability, flexibility, effectiveness and efficiency compared with the state-of-the-art hyperparameter optimization methods.

Submitted 17 February, 2021; originally announced February 2021.

arXiv:2101.02836 [pdf]

doi 10.1049/cit2.12135

Deep Learning Framework for Multi-Round Service Bundle Recommendation in Iterative Mashup Development

Authors: Yutao Ma, Xiao Geng, Jian Wang, Keqing He, Dionysis Athanasopoulos

Abstract: Recent years have witnessed the rapid development of service-oriented computing technologies. The boom of Web services increases software developers' selection burden in developing new service-based systems such as mashups. Timely recommending appropriate component services for developers to build new mashups has become a fundamental problem in service-oriented software engineering. Existing service recommendation approaches are mainly designed for mashup development in the single-round scenario. It is hard for them to effectively update recommendation results according to developers' requirements and behaviours (e.g. instant service selection). To address this issue, the authors propose a service bundle recommendation framework based on deep learning, DLISR, which aims to capture the interactions among the target mashup to build, selected (component) services, and the following service to recommend. Moreover, an attention mechanism is employed in DLISR to weigh selected services when recommending a candidate service. The authors also design two separate models for learning interactions from the perspectives of content and invocation history, respectively, and a hybrid model called HISR. Experiments on a real-world dataset indicate that HISR can outperform several state-of-the-art service recommendation methods to develop new mashups iteratively.

Submitted 6 September, 2022; v1 submitted 7 January, 2021; originally announced January 2021.

Comments: 15 pages, 6 figures, and 3 tables

ACM Class: D.2.10

Journal ref: CAAI Transactions on Intelligence Technology, 2022

arXiv:2012.14624 [pdf, other]

Deferrable Load Scheduling under Demand Charge: A Block Model-Predictive Control Approach

Authors: Lei Yang, Xinbo Geng, Xiaohong Guan, Lang Tong

Abstract: Optimal scheduling of deferrable electrical loads can reshape the aggregated load profile to achieve higher operational efficiency and reliability. This paper studies deferrable load scheduling under demand charge that imposes a penalty on the peak consumption over a billing period. Such a terminal cost poses challenges in real-time dispatch when demand forecasts are inaccurate. A block model-predictive control approach is proposed by breaking demand charge into a sequence of stage costs. The problem of charging electric vehicles is used to illustrate the efficacy of the proposed approach. Numerical examples show that the block model-predictive control outperforms benchmark methods in various settings.

Submitted 11 January, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

Comments: 10 pages, 4 plots

arXiv:2012.07769 [pdf, other]

Variable-Shot Adaptation for Online Meta-Learning

Authors: Tianhe Yu, Xinyang Geng, Chelsea Finn, Sergey Levine

Abstract: Few-shot meta-learning methods consider the problem of learning new tasks from a small, fixed number of examples, by meta-learning across static data from a set of previous tasks. However, in many real world settings, it is more natural to view the problem as one of minimizing the total amount of supervision --- both the number of examples needed to learn a new task and the amount of data needed for meta-learning. Such a formulation can be studied in a sequential learning setting, where tasks are presented in sequence. When studying meta-learning in this online setting, a critical question arises: can meta-learning improve over the sample complexity and regret of standard empirical risk minimization methods, when considering both meta-training and adaptation together? The answer is particularly non-obvious for meta-learning algorithms with complex bi-level optimizations that may demand large amounts of meta-training data. To answer this question, we extend previous meta-learning algorithms to handle the variable-shot settings that naturally arise in sequential learning: from many-shot learning at the start, to zero-shot learning towards the end. On sequential learning problems, we find that meta-learning solves the full task set with fewer overall labels and achieves greater cumulative performance, compared to standard supervised methods. These results suggest that meta-learning is an important ingredient for building learning systems that continuously learn and improve over a sequence of problems.

Submitted 14 December, 2020; originally announced December 2020.

Comments: First two authors contribute equally

arXiv:2012.03502 [pdf, other]

Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization

Authors: Xiachong Feng, Xiaocheng Feng, Bing Qin, Xinwei Geng

Abstract: Meeting summarization is a challenging task due to its dynamic interaction nature among multiple speakers and lack of sufficient training data. Existing methods view the meeting as a linear sequence of utterances while ignoring the diverse relations between each utterance. Besides, the limited labeled data further hinders the ability of data-hungry neural models. In this paper, we try to mitigate the above challenges by introducing dialogue-discourse relations. First, we present a Dialogue Discourse-Aware Meeting Summarizer (DDAMS) to explicitly model the interaction between utterances in a meeting by modeling different discourse relations. The core module is a relational graph encoder, where the utterances and discourse relations are modeled in a graph interaction manner. Moreover, we devise a Dialogue Discourse-Aware Data Augmentation (DDADA) strategy to construct a pseudo-summarization corpus from existing input meetings, which is 20 times larger than the original dataset and can be used to pretrain DDAMS. Experimental results on AMI and ICSI meeting datasets show that our full system can achieve SOTA performance. Our codes will be available at: https://github.com/xcfcode/DDAMS.

Submitted 19 May, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: IJCAI 2021

arXiv:2011.05193 [pdf, other]

Probabilistic Hosting Capacity Analysis via Bayesian Optimization

Authors: Xinbo Geng, Lang Tong, Anirban Bhattacharya, Bani Mallick, Le Xie

Abstract: This paper studies the probabilistic hosting capacity analysis (PHCA) problem in distribution networks considering uncertainties from distributed energy resources (DERs) and residential loads. PHCA aims to compute the hosting capacity, which is defined as the maximal level of DERs that can be securely integrated into a distribution network while satisfying operational constraints with high probability. We formulate PHCA as a chance-constrained optimization problem, and model the uncertainties from DERs and loads using historical data. Due to non-convexities and a substantial number of historical scenarios being used, PHCA is often formulated as a large-scale nonlinear optimization problem, and is thus computationally intractable to solve. To address the core computational challenges, we propose a fast and extensible framework to solve PHCA based on Bayesian Optimization (BayesOpt). Compared with state-of-the-art algorithms such as interior point and active set, numerical results show that the proposed BayesOpt approach is able to find better solutions (25% higher hosting capacity) with 70% savings in computation time on average.

Submitted 10 November, 2020; originally announced November 2020.

arXiv:2011.04852 [pdf, ps, other]

doi 10.1016/j.spl.2021.109264

When Is the Conway-Maxwell-Poisson Distribution Infinitely Divisible?

Authors: Xi Geng, Aihua Xia

Abstract: An essential character for a distribution to play a central role in the limit theory is infinite divisibility. In this note, we prove that the Conway-Maxwell-Poisson (CMP) distribution is infinitely divisible iff it is the Poisson or geometric distribution. This explains that, despite its applications in a wide range of fields, there is no theoretical foundation for the CMP distribution to be a natural candidate for the law of small numbers.

Submitted 9 November, 2020; originally announced November 2020.

Comments: 11 pages

MSC Class: Primary 60F05; Secondary 60E05; 60E07

Journal ref: Statistics & Probability Letters, Elsevier, vol. 181, 2022
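The two special cases named in the abstract above can be checked numerically. The sketch below assumes the standard CMP parameterization, $P(X=k) \propto λ^k/(k!)^ν$, which the listing itself does not spell out; the function names and the truncation length `terms` are illustrative choices, not taken from the paper. With $ν=1$ the mass function should match the Poisson law, and with $ν=0$ and $λ<1$ the geometric law.

```python
import math


def cmp_pmf(k, lam, nu, terms=200):
    """P(X = k) for CMP(lam, nu), with mass proportional to lam**k / (k!)**nu.

    The normalizing constant is approximated by truncating the series at
    `terms` summands (an illustrative choice; k should be well below it).
    """
    # Log-weights of the unnormalized mass function, to avoid overflow.
    log_w = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    m = max(log_w)
    # Log of the normalizing constant via the log-sum-exp trick.
    log_z = m + math.log(sum(math.exp(w - m) for w in log_w))
    return math.exp(k * math.log(lam) - nu * math.lgamma(k + 1) - log_z)


def poisson_pmf(k, lam):
    """Poisson(lam) mass at k."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def geometric_pmf(k, lam):
    """Geometric mass on {0, 1, ...} with success probability 1 - lam."""
    return (1.0 - lam) * lam**k


# nu = 1 recovers the Poisson law; nu = 0 with lam < 1 recovers the geometric law.
poisson_gap = abs(cmp_pmf(3, 2.5, 1.0) - poisson_pmf(3, 2.5))
geometric_gap = abs(cmp_pmf(4, 0.5, 0.0) - geometric_pmf(4, 0.5))
```

Both gaps should be at the level of floating-point rounding. For other values of $ν$ the same function evaluates the mass function, but it says nothing about infinite divisibility, which is what the note settles.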

arXiv:2010.01272 [pdf, other]

Towards Interpretable Reasoning over Paragraph Effects in Situation

Authors: Mucheng Ren, Xiubo Geng, Tao Qin, Heyan Huang, Daxin Jiang

Abstract: We focus on the task of reasoning over paragraph effects in situation, which requires a model to understand the cause and effect described in a background paragraph, and apply the knowledge to a novel situation. Existing works ignore the complicated reasoning process and solve it with a one-step "black box" model. Inspired by human cognitive processes, in this paper we propose a sequential approach for this task which explicitly models each step of the reasoning process with neural network modules. In particular, five reasoning modules are designed and learned in an end-to-end manner, which leads to a more interpretable model. Experimental results on the ROPES dataset demonstrate the effectiveness and explainability of our proposed approach.

Submitted 3 October, 2020; originally announced October 2020.

Comments: 14 pages. Accepted as EMNLP2020 Long paper

arXiv:2009.13199 [pdf, other]

doi 10.1145/3442381.3450126

Knowledge-Aware Procedural Text Understanding with Multi-Stage Training

Authors: Zhihan Zhang, Xiubo Geng, Tao Qin, Yunfang Wu, Daxin Jiang

Abstract: Procedural text describes dynamic state changes during a step-by-step natural process (e.g., photosynthesis). In this work, we focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process. Although recent approaches have achieved substantial progress, their results are far behind human performance. Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved, which require the incorporation of external knowledge bases. Previous works on external knowledge injection usually rely on noisy web mining tools and heuristic rules with limited applicable scenarios. In this paper, we propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge in this task. Specifically, we retrieve informative knowledge triples from ConceptNet and perform knowledge-aware reasoning while tracking the entities. Besides, we employ a multi-stage training schema which fine-tunes the BERT model over unlabeled data collected from Wikipedia before further fine-tuning it on the final model. Experimental results on two procedural text datasets, ProPara and Recipes, verify the effectiveness of the proposed methods, in which our model achieves state-of-the-art performance in comparison to various baselines.

Submitted 13 February, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: Published as full paper in Proceedings of the Web Conference 2021 (WWW'21)

arXiv:2009.13084 [pdf, ps, other]

Lipschitz-stability of Controlled Rough Paths and Rough Differential Equations

Authors: Horatio Boedihardjo, Xi Geng

Abstract: We provide an account for the existence and uniqueness of solutions to rough differential equations under the framework of controlled rough paths. The case when the driving path is $β$-Hölder continuous, for $β>1/3$, is widely available in the literature. In its extension to the case when $β\leqslant1/3,$ a main challenge and missing ingredient is to show that controlled rough paths are closed under composition with Lipschitz transformations. Establishing such a property precisely, which has a strong algebraic nature, is a main purpose of the present article.

Submitted 28 September, 2020; originally announced September 2020.

Comments: 33 pages

arXiv:2009.13082 [pdf, other]

SL_2(R)-developments and Signature Asymptotics for Planar Paths with Bounded Variation

Authors: Horatio Boedihardjo, Xi Geng

Abstract: The signature transform, defined by the formal tensor series of global iterated path integrals, is a homomorphism between the path space and the tensor algebra that has been studied in geometry, control theory, number theory as well as stochastic analysis. An elegant isometry conjecture states that the length of a bounded variation path $γ$ can be recovered from the asymptotics of its normalised signature: $\text{Length}(γ)=\lim_{n\rightarrow\infty}\big\Vert n!\int_{0<t_{1}<\cdots<t_{n}<T}dγ_{t_{1}}\otimes\cdots\otimes dγ_{t_{n}}\big\Vert^{\frac{1}{n}}$. This property depends on a key topological non-degeneracy notion known as tree-reducedness (namely, with no tree-like pieces). Existing arguments have relied crucially on $γ$ having a continuous derivative under the unit speed parametrisation. In this article, we prove the above isometry conjecture for planar paths by assuming only local bounds on the angle of $γ'$ (which ensures the absence of tree-like pieces). Our technique is based on lifting the path onto the special linear group ${\rm SL}_{2}(\mathbb{R})$ and analysing the behaviour of the associated angle dynamics at a microscopic level.

Submitted 7 November, 2022; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: 45 pages, 3 figures

arXiv:2009.08607 [pdf, ps, other]

Compact Learning for Multi-Label Classification

Authors: Jiaqi Lv, Tianran Wu, Chenglun Peng, Yunpeng Liu, Ning Xu, Xin Geng

Abstract: Multi-label classification (MLC) studies the problem where each instance is associated with multiple relevant labels, which leads to the exponential growth of the output space. MLC has motivated a popular framework named label compression (LC) for capturing label dependency with dimension reduction. Nevertheless, most existing LC methods fail to consider the influence of the feature space or are misguided by the original problematic features, which may result in performance degeneration. In this paper, we present a compact learning (CL) framework to embed the features and labels simultaneously and with mutual guidance. The proposal is a versatile concept, hence the embedding method is arbitrary and independent of the subsequent learning process. Following its spirit, a simple yet effective implementation called compact multi-label learning (CMLL) is proposed to learn a compact low-dimensional representation for both spaces. CMLL maximizes the dependence between the embedded spaces of the labels and features, and minimizes the loss of label space recovery concurrently. Theoretically, we provide a general analysis for different embedding methods. Practically, we conduct extensive experiments to validate the effectiveness of the proposed method.

Submitted 17 September, 2020; originally announced September 2020.

arXiv:2008.01229 [pdf, ps, other]

Precise Local Estimates for Differential Equations driven by Fractional Brownian Motion: Hypoelliptic Case

Authors: Xi Geng, Cheng Ouyang, Samy Tindel

Abstract: This article is concerned with stochastic differential equations driven by a $d$-dimensional fractional Brownian motion with Hurst parameter $H>1/4$, understood in the rough paths sense. Whenever the coefficients of the equation satisfy a uniform hypoellipticity condition, we establish a sharp local estimate on the associated control distance function and a sharp local lower estimate on the density of the solution. Our methodology relies heavily on the rough paths structure of the equation.

Submitted 3 August, 2020; originally announced August 2020.

Comments: This preprint is the result of splitting our original submission arXiv:1907.00171, which was slightly too long. The current preprint contains the hypoelliptic part of our analysis. Part of the presentation (and arguments) in the current preprint is different from the original submission

MSC Class: 60H10; 60H07; 60G15

arXiv:2007.16178 [pdf, ps, other]

Precise Local Estimates for Differential Equations driven by Fractional Brownian Motion: Elliptic Case

Authors: Xi Geng, Cheng Ouyang, Samy Tindel

Abstract: This article is concerned with stochastic differential equations driven by a $d$-dimensional fractional Brownian motion with Hurst parameter $H>1/4$, understood in the rough paths sense. Whenever the coefficients of the equation satisfy a uniform ellipticity condition, we establish a sharp local estimate on the associated control distance function and a sharp local lower estimate on the density of the solution.

Submitted 31 July, 2020; originally announced July 2020.

Comments: This preprint is the result of splitting our original submission arXiv:1907.00171, which was slightly too long. The current preprint contains the elliptic part of our analysis

MSC Class: 60H10; 60H07; 60G15

arXiv:2007.15555 [pdf, other]

doi 10.1007/s11433-020-1666-8

First experimental constraints on WIMP couplings in the effective field theory framework from CDEX

Authors: Y. Wang, Z. Zeng, Q. Yue, L. T. Yang, K. J. Kang, Y. J. Li, M. Agartioglu, H. P. An, J. P. Chang, J. H. Chen, Y. H. Chen, J. P. Cheng, C. Y. Chiang, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, X. Y. Guo, H. J. He, L. He, S. M. He, J. W. Hu, T. C. Huang , et al. (63 additional authors not shown)

Abstract: We present weakly interacting massive particle (WIMP) search results obtained using two effective field theory approaches from the China Dark Matter Experiment (CDEX), based on the data from both the CDEX-1B and CDEX-10 stages. In the nonrelativistic effective field theory approach, both time-integrated and annual modulation analyses were used to set new limits for the coupling of WIMP-nucleon effective operators at 90% confidence level (C.L.) and improve over the current bounds in the low $m_χ$ region. In the chiral effective field theory approach, data from CDEX-10 were used to set an upper limit on WIMP-pion coupling at 90% C.L. For the first time, we extended the limit to the $m_χ < 6$ GeV/$c^2$ region.

Submitted 26 April, 2021; v1 submitted 30 July, 2020; originally announced July 2020.

Comments: version accepted by Science China-PMA, 8 pages, 8 figures

Journal ref: Sci. China-Phys. Mech. Astron. 64, 281011 (2021)

arXiv:2007.08929 [pdf, other]

Provably Consistent Partial-Label Learning

Authors: Lei Feng, Jiaqi Lv, Bo Han, Miao Xu, Gang Niu, Xin Geng, Bo An, Masashi Sugiyama

Abstract: Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels. Even though many practical PLL methods have been proposed in the last two decades, a theoretical understanding of the consistency of those methods is lacking: none of the PLL methods hitherto possesses a generation process of candidate label sets, and so it is still unclear why such a method works on a specific dataset and when it may fail given a different dataset. In this paper, we propose the first generation model of candidate label sets, and develop two novel PLL methods that are guaranteed to be provably consistent, i.e., one is risk-consistent and the other is classifier-consistent. Our methods are advantageous, since they are compatible with any deep network or stochastic optimizer. Furthermore, thanks to the generation model, we would be able to answer the two questions above by testing if the generation model matches given candidate label sets. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed generation model and two PLL methods.

Submitted 23 October, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

Comments: NeurIPS 2020 camera-ready version

arXiv:2007.01771 [pdf, other]

Learning Expectation of Label Distribution for Facial Age and Attractiveness Estimation

Authors: Bin-Bin Gao, Xin-Xin Liu, Hong-Yu Zhou, Jianxin Wu, Xin Geng

Abstract: Facial attribute (e.g., age and attractiveness) estimation performance has been greatly improved by using convolutional neural networks. However, existing methods have an inconsistency between the training objectives and the evaluation metric, so they may be suboptimal. In addition, these methods always adopt image classification or face recognition models with a large number of parameters, which carry expensive computation cost and storage overhead. In this paper, we first analyze the essential relationship between two state-of-the-art methods (Ranking-CNN and DLDL) and show that the Ranking method is in fact learning label distribution implicitly. This result thus unifies, for the first time, two existing popular state-of-the-art methods into the DLDL framework. Second, in order to alleviate the inconsistency and reduce resource consumption, we design a lightweight network architecture and propose a unified framework which can jointly learn facial attribute distribution and regress attribute value. The effectiveness of our approach has been demonstrated on both facial age and attractiveness estimation tasks. Our method achieves new state-of-the-art results using a single model with 36$\times$ fewer parameters and 3$\times$ faster inference speed on facial age/attractiveness estimation. Moreover, our method can achieve results comparable to the state of the art even though the number of parameters is further reduced to 0.9M (3.8MB disk storage).

Submitted 31 December, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

Comments: submitted to Pattern Recognition

arXiv:2006.07178 [pdf, other]

Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling

Authors: Russell Mendonca, Xinyang Geng, Chelsea Finn, Sergey Levine

Abstract: Reinforcement learning algorithms can acquire policies for complex tasks autonomously. However, the number of samples required to learn a diverse set of skills can be prohibitively large. While meta-reinforcement learning methods have enabled agents to leverage prior experience to adapt quickly to new tasks, their performance depends crucially on how close the new task is to the previously experienced tasks. Current approaches are either not able to extrapolate well, or can do so at the expense of requiring extremely large amounts of data for on-policy meta-training. In this work, we present model identification and experience relabeling (MIER), a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time. Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data, more easily than policies and value functions. These dynamics models can then be used to continue training policies and value functions for out-of-distribution tasks without using meta-reinforcement learning at all, by generating synthetic experience for the new task.

Submitted 15 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

arXiv:2005.00979 [pdf, other]

How Does Selective Mechanism Improve Self-Attention Networks?

Authors: Xinwei Geng, Longyue Wang, Xing Wang, Bing Qin, Ting Liu, Zhaopeng Tu

Abstract: Self-attention networks (SANs) with a selective mechanism have produced substantial improvements in various NLP tasks by concentrating on a subset of input words. However, the underlying reasons for their strong performance have not been well explained. In this paper, we bridge the gap by assessing the strengths of selective SANs (SSANs), which are implemented with a flexible and universal Gumbel-Softmax. Experimental results on several representative NLP tasks, including natural language inference, semantic role labelling, and machine translation, show that SSANs consistently outperform the standard SANs. Through well-designed probing experiments, we empirically validate that the improvement of SSANs can be attributed in part to mitigating two commonly-cited weaknesses of SANs: word order encoding and structure modeling. Specifically, the selective mechanism improves SANs by paying more attention to content words that contribute to the meaning of the sentence. The code and data are released at https://github.com/xwgeng/SSAN.

Submitted 3 May, 2020; originally announced May 2020.

Comments: ACL 2020

arXiv:2004.14164 [pdf, other]

doi 10.1145/3340531.3411858

MICK: A Meta-Learning Framework for Few-shot Relation Classification with Small Training Data

Authors: Xiaoqing Geng, Xiwen Chen, Kenny Q. Zhu, Libin Shen, Yinggong Zhao

Abstract: Few-shot relation classification seeks to classify incoming query instances after meeting only a few support instances. This ability is gained by training with a large amount of in-domain annotated data. In this paper, we tackle an even harder problem by further limiting the amount of data available at training time. We propose a few-shot learning framework for relation classification, which is particularly powerful when the training data is very small. In this framework, models not only strive to classify query instances, but also seek underlying knowledge about the support instances to obtain better instance representations. The framework also includes a method for aggregating cross-domain knowledge into models by open-source task enrichment. Additionally, we construct a brand new dataset: the TinyRel-CM dataset, a few-shot relation classification dataset in the health domain with purposely small training data and challenging relation classes. Experimental results demonstrate that our framework brings performance gains for most underlying classification models, outperforms the state-of-the-art results given small training data, and achieves competitive results with sufficiently large training data.

Submitted 14 December, 2020; v1 submitted 26 April, 2020; originally announced April 2020.

Journal ref: CIKM 2020: The 29th ACM International Conference on Information and Knowledge Management

arXiv:2004.08861 [pdf, other]

Role-Wise Data Augmentation for Knowledge Distillation

Authors: Jie Fu, Xue Geng, Zhijian Duan, Bohan Zhuang, Xingdi Yuan, Adam Trischler, Jie Lin, Chris Pal, Hao Dong

Abstract: Knowledge Distillation (KD) is a common method for transferring the "knowledge" learned by one machine learning model (the teacher) into another model (the student), where typically, the teacher has a greater capacity (e.g., more parameters or higher bit-widths). To our knowledge, existing methods overlook the fact that although the student absorbs extra knowledge from the teacher, both models share the same input data, and this data is the only medium by which the teacher's knowledge can be demonstrated. Due to the difference in model capacities, the student may not benefit fully from the same data points on which the teacher is trained. On the other hand, a human teacher may demonstrate a piece of knowledge with individualized examples adapted to a particular student, for instance, in terms of her cultural background and interests. Inspired by this behavior, we design data augmentation agents with distinct roles to facilitate knowledge distillation. Our data augmentation agents generate distinct training data for the teacher and student, respectively. We find empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student. We compare our approach with existing KD methods on training popular neural architectures and demonstrate that role-wise data augmentation improves the effectiveness of KD over strong prior approaches. The code for reproducing our results can be found at https://github.com/bigaidream-projects/role-kd

Submitted 19 April, 2020; originally announced April 2020.

arXiv:2004.02717 [pdf, other]

Joint Routing and Scheduling for Large-Scale Deterministic IP Networks

Authors: Jonatan Krolikowski, Sebastien Martin, Paolo Medagliani, Jeremie Leguay, Shuang Chen, Xiaodong Chang, Xuesong Geng

Abstract: With the advent of 5G and the evolution of Internet protocols, industrial applications are moving from vertical solutions to general purpose IP-based infrastructures that need to meet deterministic Quality of Service (QoS) requirements. The IETF DetNet working group aims at providing an answer to this need with support for (i) deterministic worst-case latency and jitter, and (ii) zero packet loss for time-sensitive traffic. In this paper we focus on the joint routing and scheduling problem in large scale deterministic networks using Cycle Specified Queuing and Forwarding (CSQF), an extension of Cyclic Queuing and Forwarding (CQF) with multiple transmission queues and support of segment routing. In this context, we present two centralized algorithms to maximize traffic acceptance for network planning and online flow admission. We propose an effective solution based on column generation and dynamic programming. Thanks to the reinforcement of the model with valid inequalities, we improve the upper bound and the solution. We demonstrate on realistic instances that we reach an optimality gap smaller than 10% in a few seconds. Finally, we also derive an ultra-fast adaptive greedy algorithm to solve the problem at the cost of a small extra gap.

Submitted 28 October, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: To appear in Elsevier Computer Communications

arXiv:2004.02616 [pdf]

A spin-filter for polarized electron acceleration in plasma wakefields

Authors: Yitong Wu, Liangliang Ji, Xuesong Geng, Johannes Thomas, Markus Büscher, Alexander Pukhov, Anna Hützen, Lingang Zhang, Baifei Shen, Ruxin Li

Abstract: We propose a filter method to generate electron beams of high polarization from bubble and blow-out wakefield accelerators. The mechanism is based on the idea of identifying all electron-beam subsets with low polarization and filtering them out with an X-shaped slit placed right behind the plasma accelerator. To find these subsets we investigate the dependence between the initial azimuthal angle and the spin of single electrons during the trapping process. This dependence shows that transverse electron spins preserve their orientation during injection if they are initially aligned parallel or anti-parallel to the local magnetic field. We derive a precise correlation of the local beam polarization as a function of the coordinate and the electron phase angle. Three-dimensional particle-in-cell simulations, incorporating classical spin dynamics, show that the beam polarization can be increased from 35% to about 80% after spin filtering. The injected flux is strongly restricted to preserve the beam polarization, e.g. <1kA in Ref.[27]. This limitation is removed by employing the proposed filter mechanism. The robustness of the method is discussed with respect to drive beam fluctuations, jitter, the thickness of the filter, and the initial temperature. This idea marks an efficient and simple strategy to generate energetic polarized electron beams based on wakefield acceleration.

Submitted 6 April, 2020; originally announced April 2020.

arXiv:2002.12591 [pdf, other]

DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding

Authors: Yuyu Zhang, Ping Nie, Xiubo Geng, Arun Ramamurthy, Le Song, Daxin Jiang

Abstract: Recent studies on open-domain question answering have achieved prominent performance improvement using pre-trained language models such as BERT. State-of-the-art approaches typically follow the "retrieve and read" pipeline and employ a BERT-based reranker to filter retrieved documents before feeding them into the reader module. The BERT retriever takes as input the concatenation of the question and each retrieved document. Despite the success of these approaches in terms of QA accuracy, due to the concatenation, they can barely handle a high throughput of incoming questions, each with a large collection of retrieved documents. To address the efficiency problem, we propose DC-BERT, a decoupled contextual encoding framework that has dual BERT models: an online BERT which encodes the question only once, and an offline BERT which pre-encodes all the documents and caches their encodings. On SQuAD Open and Natural Questions Open datasets, DC-BERT achieves 10x speedup on document retrieval, while retaining most (about 98%) of the QA performance compared to state-of-the-art approaches for open-domain question answering.

Submitted 28 February, 2020; originally announced February 2020.

arXiv:2002.11089 [pdf, other]

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Authors: Benjamin Eysenbach, Xinyang Geng, Sergey Levine, Ruslan Salakhutdinov

Abstract: Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically ask: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? In this paper, we show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary classes of tasks. Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings, including goal-reaching, domains with discrete sets of rewards, and those with linear reward functions.

Submitted 25 February, 2020; originally announced February 2020.

arXiv:2002.08053 [pdf, other]

Progressive Identification of True Labels for Partial-Label Learning

Authors: Jiaqi Lv, Miao Xu, Lei Feng, Gang Niu, Xin Geng, Masashi Sugiyama

Abstract: Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label. Most existing methods elaborately designed learning objectives as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data. The goal of this paper is to propose a novel framework of PLL with flexibility on the model and optimization algorithm. More specifically, we propose a novel estimator of the classification risk, theoretically analyze the classifier-consistency, and establish an estimation error bound. Then we propose a progressive identification algorithm for approximately minimizing the proposed risk estimator, where the update of the model and identification of true labels are conducted in a seamless manner. The resulting algorithm is model-independent and loss-independent, and compatible with stochastic optimization. Thorough experiments demonstrate it sets the new state of the art.

Submitted 5 September, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: In Proceedings of the 37th International Conference on Machine Learning (ICML 2020)

Showing 151–200 of 282 results for author: Geng, X