-
VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization
Authors:
Tengchao Lv,
Lei Cui,
Momcilo Vasilijevic,
Furu Wei
Abstract:
Video transcript summarization is a fundamental task for video understanding. Conventional approaches for transcript summarization are usually built upon the summarization data for written language such as news articles, while the domain discrepancy may degrade the model performance on spoken text. In this paper, we present VT-SSum, a benchmark dataset with spoken language for video transcript segmentation and summarization, which includes 125K transcript-summary pairs from 9,616 videos. VT-SSum takes advantage of the videos from VideoLectures.NET by leveraging the slides content as the weak supervision to generate the extractive summary for video transcripts. Experiments with a state-of-the-art deep learning approach show that the model trained with VT-SSum brings a significant improvement on the AMI spoken text summarization benchmark. VT-SSum is publicly available at https://github.com/Dod-o/VT-SSum to support the future research of video transcript segmentation and summarization tasks.
Submitted 15 July, 2021; v1 submitted 10 June, 2021;
originally announced June 2021.
-
Template-Based Named Entity Recognition Using BART
Authors:
Leyang Cui,
Yu Wu,
Jian Liu,
Sen Yang,
Yue Zhang
Abstract:
There is recent interest in investigating few-shot NER, where the low-resource target domain has different label sets compared with a resource-rich source domain. Existing methods use a similarity-based metric. However, they cannot make full use of knowledge transfer in NER model parameters. To address the issue, we propose a template-based method for NER, treating NER as a language model ranking problem in a sequence-to-sequence framework, where original sentences and statement templates filled by candidate named entity spans are regarded as the source sequence and the target sequence, respectively. For inference, the model is required to classify each candidate span based on the corresponding template scores. Our experiments demonstrate that the proposed method achieves a 92.55% F1 score on CoNLL03 (rich-resource task) and significantly outperforms fine-tuned BERT by 10.88%, 15.34%, and 11.73% F1 on MIT Movie, MIT Restaurant, and ATIS (low-resource tasks), respectively.
Submitted 3 June, 2021;
originally announced June 2021.
-
Uni-Encoder: A Fast and Accurate Response Selection Paradigm for Generation-Based Dialogue Systems
Authors:
Chiyu Song,
Hongliang He,
Haofei Yu,
Pengfei Fang,
Leyang Cui,
Zhenzhong Lan
Abstract:
Sample-and-rank is a key decoding strategy for modern generation-based dialogue systems. It helps achieve diverse and high-quality responses by selecting an answer from a small pool of generated candidates. The current state-of-the-art ranking methods mainly use an encoding paradigm called Cross-Encoder, which separately encodes each context-candidate pair and ranks the candidates according to their fitness scores. However, Cross-Encoder repeatedly encodes the same lengthy context for each candidate, resulting in high computational costs. Poly-Encoder addresses the above problems by reducing the interaction between context and candidates, but at the price of a performance drop. In this work, we develop a new paradigm called Uni-Encoder that keeps the full attention over each pair as in Cross-Encoder while only encoding the context once, as in Poly-Encoder. Uni-Encoder encodes all the candidates with the context in one forward pass. We use the same positional embedding for all candidates to ensure they are treated equally and design a new attention mechanism to avoid confusion. Our Uni-Encoder can simulate other ranking paradigms using different attention and response concatenation methods. Extensive experiments show that our proposed paradigm achieves new state-of-the-art results on four benchmark datasets with high computational efficiency. For instance, it improves R10@1 by 2.9% with an approximately 4X faster inference speed on the Ubuntu V2 dataset.
Submitted 15 May, 2023; v1 submitted 2 June, 2021;
originally announced June 2021.
-
Few-Shot Partial-Label Learning
Authors:
Yunfeng Zhao,
Guoxian Yu,
Lei Liu,
Zhongmin Yan,
Lizhen Cui,
Carlotta Domeniconi
Abstract:
Partial-label learning (PLL) generally focuses on inducing a noise-tolerant multi-class classifier by training on overly-annotated samples, each of which is annotated with a set of labels, only one of which is valid. A basic premise of existing PLL solutions is that there are sufficient partial-label (PL) samples for training. However, it is more common than not to have just a few PL samples at hand when dealing with new tasks. Furthermore, existing few-shot learning algorithms assume precise labels of the support set; as such, irrelevant labels may seriously mislead the meta-learner and thus lead to compromised performance. How to enable PLL under a few-shot learning setting is an important problem, but not yet well studied. In this paper, we introduce an approach called FsPLL (Few-shot PLL). FsPLL first performs adaptive distance metric learning by an embedding network and rectifying prototypes on the tasks previously encountered. Next, it calculates the prototype of each class of a new task in the embedding network. An unseen example can then be classified via its distance to each prototype. Experimental results on widely-used few-shot datasets (Omniglot and miniImageNet) demonstrate that our FsPLL achieves performance superior to the state-of-the-art methods across different settings, and it needs fewer samples for quickly adapting to new tasks.
Submitted 2 June, 2021;
originally announced June 2021.
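The final classification step mentioned in the FsPLL abstract above (assign an unseen example to the class whose prototype is nearest) can be sketched as follows. This is only an illustration under stated assumptions: the embedding network and the prototype rectification are omitted, prototypes are taken to be plain means of already-embedded, precisely labeled vectors, and the function names `class_prototypes` and `classify` are hypothetical rather than taken from the paper.

```python
import math


def class_prototypes(embeddings, labels):
    """Return {label: mean embedding}. Assumes vectors are already embedded."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(query, prototypes):
    """Pick the label whose prototype has the smallest Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy support set: two classes in a 2-D embedding space (made-up numbers).
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
support_labels = ["a", "a", "b", "b"]
protos = class_prototypes(support, support_labels)
prediction = classify([1.0, 1.0], protos)
```

In the partial-label setting each support example carries a candidate label set rather than one label, so the plain mean used here would have to be replaced by whatever weighting the method actually learns.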
-
NoiLIn: Improving Adversarial Training and Correcting Stereotype of Noisy Labels
Authors:
Jingfeng Zhang,
Xilie Xu,
Bo Han,
Tongliang Liu,
Gang Niu,
Lizhen Cui,
Masashi Sugiyama
Abstract:
Adversarial training (AT) formulated as the minimax optimization problem can effectively enhance the model's robustness against adversarial attacks. Existing AT methods mainly focus on manipulating the inner maximization for generating quality adversarial variants or manipulating the outer minimization for designing effective learning objectives. However, empirical results of AT always exhibit the robustness at odds with accuracy and the existence of the cross-over mixture problem, which motivates us to study some label randomness for benefiting AT. First, we thoroughly investigate noisy labels (NLs) injection into AT's inner maximization and outer minimization, respectively, and obtain observations on when NL injection benefits AT. Second, based on the observations, we propose a simple but effective method -- NoiLIn that randomly injects NLs into training data at each training epoch and dynamically increases the NL injection rate once robust overfitting occurs. Empirically, NoiLIn can significantly mitigate AT's undesirable issue of robust overfitting and even further improve the generalization of the state-of-the-art AT methods. Philosophically, NoiLIn sheds light on a new perspective of learning with NLs: NLs should not always be deemed detrimental, and even in the absence of NLs in the training set, we may consider injecting them deliberately. Code is available at https://github.com/zjfheart/NoiLIn.
Submitted 4 August, 2022; v1 submitted 30 May, 2021;
originally announced May 2021.
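The per-epoch label injection described in the NoiLIn abstract above can be illustrated with a small sketch. The assumptions are mine, not the paper's: noise is symmetric (a flipped label is drawn uniformly from the other classes), the rate grows by a fixed step up to a cap, and the names `inject_noisy_labels` and `next_rate` together with the `step` and `max_rate` values are hypothetical. Detecting robust overfitting is left to the caller.

```python
import random


def inject_noisy_labels(labels, rate, num_classes, rng):
    """Return a copy of labels where each entry is flipped with probability rate.

    Assumption: a flipped label is drawn uniformly from the other classes.
    """
    noisy = list(labels)
    for i, y in enumerate(noisy):
        if rng.random() < rate:
            noisy[i] = rng.choice([c for c in range(num_classes) if c != y])
    return noisy


def next_rate(rate, overfitting, step=0.05, max_rate=0.5):
    """Hypothetical schedule: raise the rate only when overfitting is flagged."""
    return min(max_rate, rate + step) if overfitting else rate


rng = random.Random(0)
clean_labels = [0, 1, 2, 1, 0]
rate = 0.1
for epoch in range(3):
    epoch_labels = inject_noisy_labels(clean_labels, rate, num_classes=3, rng=rng)
    # ... one epoch of adversarial training on epoch_labels would go here ...
    rate = next_rate(rate, overfitting=(epoch >= 1))  # placeholder signal
```

The clean labels are never overwritten; noise is redrawn from them at every epoch, which matches the "at each training epoch" wording of the abstract.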
-
AutoSampling: Search for Effective Data Sampling Schedules
Authors:
Ming Sun,
Haoxuan Dou,
Baopu Li,
Lei Cui,
Junjie Yan,
Wanli Ouyang
Abstract:
Data sampling plays a pivotal role in training deep learning models. However, an effective sampling schedule is difficult to learn due to the inherently high dimension of parameters in learning the sampling schedule. In this paper, we propose an AutoSampling method to automatically learn sampling schedules for model training, which consists of the multi-exploitation step aiming for optimal local sampling schedules and the exploration step for the ideal sampling distribution. More specifically, we achieve sampling schedule search with a shortened exploitation cycle to provide enough supervision. In addition, we periodically estimate the sampling distribution from the learned sampling schedules and perturb it to search in the distribution space. The combination of the two searches allows us to learn a robust sampling schedule. We apply our AutoSampling method to a variety of image classification tasks, illustrating the effectiveness of the proposed method.
Submitted 28 May, 2021;
originally announced May 2021.
-
Where are we in embedding spaces? A Comprehensive Analysis on Network Embedding Approaches for Recommender Systems
Authors:
Sixiao Zhang,
Hongxu Chen,
Xiao Ming,
Lizhen Cui,
Hongzhi Yin,
Guandong Xu
Abstract:
Hyperbolic space and hyperbolic embeddings are becoming a popular research field for recommender systems. However, it is not clear under what circumstances the hyperbolic space should be considered. To fill this gap, this paper provides theoretical analysis and empirical results on when and where to use hyperbolic space and hyperbolic embeddings in recommender systems. Specifically, we answer the questions of which types of models and datasets are better suited to hyperbolic space, as well as which latent size to choose. We evaluate our answers by comparing the performance of Euclidean space and hyperbolic space on different latent space models in both the general item recommendation domain and the social recommendation domain, with 6 widely used datasets and different latent sizes. Additionally, we propose a new metric learning based recommendation method called SCML and its hyperbolic version HSCML. We evaluate our conclusions regarding hyperbolic space on SCML and show the state-of-the-art performance of hyperbolic space by comparing HSCML with other baseline methods.
Submitted 18 May, 2021;
originally announced May 2021.
-
Slashing Communication Traffic in Federated Learning by Transmitting Clustered Model Updates
Authors:
Laizhong Cui,
Xiaoxin Su,
Yipeng Zhou,
Yi Pan
Abstract:
Federated Learning (FL) is an emerging decentralized learning framework through which multiple clients can collaboratively train a learning model. However, a major obstacle that impedes the wide deployment of FL lies in massive communication traffic. To train high dimensional machine learning models (such as CNN models), heavy communication traffic can be incurred by exchanging model updates via the Internet between clients and the parameter server (PS), implying that network resources can be easily exhausted. Compressing model updates is an effective way to reduce the traffic amount. However, a flexible unbiased compression algorithm applicable to both uplink and downlink compression in FL is still absent from existing works. In this work, we devise the Model Update Compression by Soft Clustering (MUCSC) algorithm to compress model updates transmitted between clients and the PS. In MUCSC, it is only necessary to transmit cluster centroids and the cluster ID of each model update. Moreover, we prove that: 1) The compressed model updates are unbiased estimates of their original values so that the convergence rate by transmitting compressed model updates is unchanged; 2) MUCSC can guarantee that the influence of the compression error on the model accuracy is minimized. Then, we further propose the boosted MUCSC (B-MUCSC) algorithm, a biased compression algorithm that can achieve an extremely high compression rate by grouping insignificant model updates into a super cluster. B-MUCSC is suitable for scenarios with very scarce network resources. Ultimately, we conduct extensive experiments with the CIFAR-10 and FEMNIST datasets to demonstrate that our algorithms can not only substantially reduce the volume of communication traffic in FL, but also improve the training efficiency in practical networks.
Submitted 10 May, 2021;
originally announced May 2021.
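To make the idea in the abstract above concrete (send only cluster centroids plus one cluster ID per update, while keeping the decoded value unbiased), here is a minimal sketch. The abstract does not spell out the soft-assignment rule, so this sketch swaps in a standard technique, stochastic rounding between the two neighbouring centroids; it should not be read as the MUCSC algorithm itself. The function names and the fixed centroid list are hypothetical, and centroids are assumed to be sorted in ascending order.

```python
import random
from bisect import bisect_right


def compress(updates, centroids, rng):
    """Replace each update by the ID of one of its two neighbouring centroids.

    The upper neighbour is chosen with probability (u - lo) / (hi - lo), so the
    expected decoded value equals u whenever u lies inside the centroid range.
    Values outside the range are clamped to the nearest end centroid.
    """
    ids = []
    last = len(centroids) - 1
    for u in updates:
        if u <= centroids[0]:
            ids.append(0)
        elif u >= centroids[last]:
            ids.append(last)
        else:
            hi = bisect_right(centroids, u)  # index of first centroid > u
            lo = hi - 1
            p_hi = (u - centroids[lo]) / (centroids[hi] - centroids[lo])
            ids.append(hi if rng.random() < p_hi else lo)
    return ids


def decompress(ids, centroids):
    """Recover approximate updates from cluster IDs and the centroid table."""
    return [centroids[i] for i in ids]


rng = random.Random(0)
centroids = [-1.0, 0.0, 1.0]          # made-up centroid table
updates = [0.25, -0.6, 0.9, 0.0]      # made-up model updates
ids = compress(updates, centroids, rng)
approx = decompress(ids, centroids)
```

Only `centroids` and `ids` would need to be transmitted; each ID costs about log2(number of centroids) bits instead of a full floating-point value.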
-
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
Authors:
Yiheng Xu,
Tengchao Lv,
Lei Cui,
Guoxin Wang,
Yijuan Lu,
Dinei Florencio,
Cha Zhang,
Furu Wei
Abstract:
Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually-rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities. In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding. To accurately evaluate LayoutXLM, we also introduce a multilingual form understanding benchmark dataset named XFUND, which includes form understanding samples in 7 languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese), and key-value pairs are manually labeled for each language. Experiment results show that the LayoutXLM model has significantly outperformed the existing SOTA cross-lingual pre-trained models on the XFUND dataset. The pre-trained LayoutXLM model and the XFUND dataset are publicly available at https://aka.ms/layoutxlm.
Submitted 9 September, 2021; v1 submitted 18 April, 2021;
originally announced April 2021.
-
Zero-Shot Instance Segmentation
Authors:
Ye Zheng,
Jiahong Wu,
Yongqiang Qin,
Faen Zhang,
Li Cui
Abstract:
Deep learning has significantly improved the precision of instance segmentation with abundant labeled data. However, in many areas like medical and manufacturing, collecting sufficient data is extremely hard and labeling this data requires high professional skills. We follow this motivation and propose a new task set named zero-shot instance segmentation (ZSI). In the training phase of ZSI, the model is trained with seen data, while in the testing phase, it is used to segment all seen and unseen instances. We first formulate the ZSI task and propose a method to tackle the challenge, which consists of Zero-shot Detector, Semantic Mask Head, Background Aware RPN and Synchronized Background Strategy. We present a new benchmark for zero-shot instance segmentation based on the MS-COCO dataset. The extensive empirical results in this benchmark show that our method not only surpasses the state-of-the-art results in zero-shot object detection task but also achieves promising performance on ZSI. Our approach will serve as a solid baseline and facilitate future research in zero-shot instance segmentation.
Submitted 31 May, 2021; v1 submitted 13 April, 2021;
originally announced April 2021.
-
East Asian VLBI Network Observations of Active Galactic Nuclei Jets: Imaging with KaVA+Tianma+Nanshan
Authors:
Yuzhu Cui,
Kazuhiro Hada,
Motoki Kino,
Bong Won Sohn,
Jongho Park,
Hyun Wook Ro,
Satoko Sawada-Satoh,
Wu Jiang,
Lang Cui,
Mareki Honma,
Zhi Qiang Shen,
Fumie Tazaki,
Tao An,
Ilje Cho,
Guang Yao Zhao,
Xiao Peng Cheng,
Kotaro Niinuma,
Kiyoaki Wajima,
Ying Kang Zhang,
Noriyuki Kawaguchi,
Juan Carlos Algaba,
Shoko Koyama,
Tomoya Hirota,
Yoshinori Yonekura,
Nobuyuki Sakai
, et al. (52 additional authors not shown)
Abstract:
The East Asian very-long-baseline interferometry (VLBI) Network (EAVN) is a rapidly evolving international VLBI array that is currently promoted under joint efforts among China, Japan, and Korea. EAVN aims at forming a joint VLBI Network by combining a large number of radio telescopes distributed over East Asian regions. After the combination of the Korean VLBI Network (KVN) and the VLBI Exploration of Radio Astrometry (VERA) into KaVA, further expansion with the joint array in East Asia is actively promoted. Here we report the first imaging results (at 22 and 43 GHz) of bright radio sources obtained with KaVA connected to Tianma 65-m and Nanshan 26-m Radio Telescopes in China. To test the EAVN imaging performance for different sources, we observed four active galactic nuclei (AGN) having different brightness and morphology. As a result, we confirmed that Tianma 65-m Radio Telescope (TMRT) significantly enhances the overall array sensitivity, a factor of 4 improvement in baseline sensitivity and 2 in image dynamic range compared to the case of KaVA only. The addition of Nanshan 26-m Radio Telescope (NSRT) further doubled the east-west angular resolution. With the resulting high-dynamic-range, high-resolution images with EAVN (KaVA+TMRT+NSRT), various fine-scale structures in our targets, such as the counter-jet in M87, a kink-like morphology of the 3C273 jet and the weak emission in other sources, are successfully detected. This demonstrates the powerful capability of EAVN to study AGN jets and to achieve other science goals in general. Ongoing expansion of EAVN will further enhance the angular resolution, detection sensitivity and frequency coverage of the network.
Submitted 14 April, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
On cleanness of von Neumann algebras
Authors:
Lu Cui,
Linzhe Huang,
Wenming Wu,
Wei Yuan,
Hanbin Zhang
Abstract:
A unital ring is called clean (resp. strongly clean) if every element can be written as the sum of an invertible element and an idempotent (resp. an invertible element and an idempotent that commutes). T.Y. Lam proposed a question: which von Neumann algebras are clean as rings? In this paper, we characterize strongly clean von Neumann algebras and prove that all finite von Neumann algebras and all separable infinite factors are clean.
Submitted 12 January, 2022; v1 submitted 9 April, 2021;
originally announced April 2021.
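The definitions quoted in the abstract above can be written out as follows, together with a small worked example that holds in any unital ring. The symbols R, a, u, e are notation chosen here for illustration, not taken from the paper.

```latex
% R is a unital ring, R^{\times} its group of invertible elements.
\[
R \text{ is clean} \iff \forall a \in R \;\; \exists\, u \in R^{\times},\ e = e^{2} \in R : \quad a = u + e .
\]
\[
R \text{ is strongly clean} \iff \text{such } u, e \text{ can always be chosen with } ue = eu .
\]
% Worked example: every idempotent e has a strongly clean decomposition.
\[
e = (2e - 1) + (1 - e), \qquad (2e - 1)^{2} = 4e^{2} - 4e + 1 = 1, \qquad (1 - e)^{2} = 1 - 2e + e^{2} = 1 - e .
\]
```

Here 2e - 1 is its own inverse and 1 - e is idempotent, and both are polynomials in e, so they commute. Invertible elements are covered by the trivial decomposition u = u + 0.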
-
Noise-resistant Deep Metric Learning with Ranking-based Instance Selection
Authors:
Chang Liu,
Han Yu,
Boyang Li,
Zhiqi Shen,
Zhanning Gao,
Peiran Ren,
Xuansong Xie,
Lizhen Cui,
Chunyan Miao
Abstract:
The existence of noisy labels in real-world data negatively impacts the performance of deep learning models. Although much research effort has been devoted to improving robustness to noisy labels in classification tasks, the problem of noisy labels in deep metric learning (DML) remains open. In this paper, we propose a noise-resistant training technique for DML, which we name Probabilistic Ranking-based Instance Selection with Memory (PRISM). PRISM identifies noisy data in a minibatch using average similarity against image features extracted by several previous versions of the neural network. These features are stored in and retrieved from a memory bank. To alleviate the high computational cost brought by the memory bank, we introduce an acceleration method that replaces individual data points with the class centers. In extensive comparisons with 12 existing approaches under both synthetic and real-world label noise, PRISM demonstrates superior performance of up to 6.06% in Precision@1.
Submitted 12 April, 2021; v1 submitted 29 March, 2021;
originally announced March 2021.
-
AR Mapping: Accurate and Efficient Mapping for Augmented Reality
Authors:
Rui Huang,
Chuan Fang,
Kejie Qiu,
Le Cui,
Zilong Dong,
Siyu Zhu,
Ping Tan
Abstract:
Augmented reality (AR) has gained increasing attention from both research and industry communities. By overlaying digital information and content onto the physical world, AR enables users to experience the world in a more informative and efficient manner. As a major building block for AR systems, localization aims at determining the device's pose from a pre-built "map" consisting of visual and depth information in a known environment. While the localization problem has been widely studied in the literature, the "map" for AR systems is rarely discussed. In this paper, we introduce the AR Map for a specific scene to be composed of 1) color images with 6-DOF poses; 2) dense depth maps for each image; and 3) a complete point cloud map. We then propose an efficient end-to-end solution for generating and evaluating AR Maps. Firstly, for efficient data capture, a backpack scanning device is presented with a unified calibration pipeline. Secondly, we propose an AR mapping pipeline which takes the input from the scanning device and produces accurate AR Maps. Finally, we present an approach to evaluating the accuracy of AR Maps with the help of the highly accurate reconstruction result from a high-end laser scanner. To the best of our knowledge, this is the first end-to-end solution for efficient and accurate mapping for AR applications.
Submitted 27 March, 2021;
originally announced March 2021.
-
Compact 3D Map-Based Monocular Localization Using Semantic Edge Alignment
Authors:
Kejie Qiu,
Shenzhou Chen,
Jiahui Zhang,
Rui Huang,
Le Cui,
Siyu Zhu,
Ping Tan
Abstract:
Accurate localization is fundamental to a variety of applications, such as navigation, robotics, autonomous driving, and Augmented Reality (AR). Unlike incremental localization, global localization has no drift caused by error accumulation, which is desired in many application scenarios. In addition to GPS used in the open air, 3D maps are also widely used as alternative global localization references. In this paper, we propose a compact 3D map-based global localization system using a low-cost monocular camera and an IMU (Inertial Measurement Unit). The proposed compact map consists of two types of simplified elements with multiple semantic labels, which adapts well to various man-made environments such as urban environments. Also, semantic edge features are used for the key image-map registration, which is robust against occlusion and long-term appearance changes in the environments. To further improve the localization performance, the key semantic edge alignment is formulated as an optimization problem based on initial poses predicted by an independent VIO (Visual-Inertial Odometry) module. The localization system is realized with a modular design and runs in real time. We evaluate the localization accuracy through real-world experimental results compared with ground truth; long-term localization performance is also demonstrated.
Submitted 27 March, 2021;
originally announced March 2021.
-
Towards Personalized Federated Learning
Authors:
Alysa Ziying Tan,
Han Yu,
Lizhen Cui,
Qiang Yang
Abstract:
In parallel with the rapid adoption of Artificial Intelligence (AI) empowered by advances in AI research, there has been growing awareness of and concern about data privacy. Recent significant developments in the data regulation landscape have prompted a seismic shift in interest towards privacy-preserving AI. This has contributed to the popularity of Federated Learning (FL), the leading paradigm for the training of machine learning models on data silos in a privacy-preserving manner. In this survey, we explore the domain of Personalized FL (PFL) to address the fundamental challenges of FL on heterogeneous data, a universal characteristic inherent in all real-world datasets. We analyze the key motivations for PFL and present a unique taxonomy of PFL techniques categorized according to the key challenges and personalization strategies in PFL. We highlight their key ideas, challenges and opportunities and envision promising future trajectories of research towards new PFL architectural design, realistic PFL benchmarking, and trustworthy PFL approaches.
Submitted 17 March, 2022; v1 submitted 28 February, 2021;
originally announced March 2021.
-
Graph Embedding for Recommendation against Attribute Inference Attacks
Authors:
Shijie Zhang,
Hongzhi Yin,
Tong Chen,
Zi Huang,
Lizhen Cui,
Xiangliang Zhang
Abstract:
In recent years, recommender systems have played a pivotal role in helping users identify the most suitable items that satisfy personal preferences. As user-item interactions can be naturally modelled as graph-structured data, variants of graph convolutional networks (GCNs) have become a well-established building block in the latest recommenders. Due to the wide utilization of sensitive user profile data, existing recommendation paradigms are likely to expose users to the threat of privacy breach, and GCN-based recommenders are no exception. Apart from the leakage of raw user data, the fragility of current recommenders under inference attacks offers malicious attackers a backdoor to estimate users' private attributes via their behavioral footprints and the recommendation results. However, little attention has been paid to developing recommender systems that can defend against such attribute inference attacks, and existing works achieve attack resistance by either sacrificing considerable recommendation accuracy or only covering specific attack models or protected information. In our paper, we propose GERAI, a novel differentially private graph convolutional network to address such limitations. Specifically, in GERAI, we bind the information perturbation mechanism in differential privacy with the recommendation capability of graph convolutional networks. Furthermore, based on local differential privacy and functional mechanism, we innovatively devise a dual-stage encryption paradigm to simultaneously enforce privacy guarantee on users' sensitive features and the model optimization process. Extensive experiments show the superiority of GERAI in terms of its resistance to attribute inference attacks and recommendation effectiveness.
Submitted 29 January, 2021;
originally announced January 2021.
-
FENet: A Frequency Extraction Network for Obstructive Sleep Apnea Detection
Authors:
Guanhua Ye,
Hongzhi Yin,
Tong Chen,
Hongxu Chen,
Lizhen Cui,
Xiangliang Zhang
Abstract:
Obstructive Sleep Apnea (OSA) is a highly prevalent but inconspicuous disease that seriously jeopardizes the health of human beings. Polysomnography (PSG), the gold standard of detecting OSA, requires multiple specialized sensors for signal collection, hence patients have to physically visit hospitals and bear the costly treatment for a single detection. Recently, many single-sensor alternatives have been proposed to improve the cost efficiency and convenience. Among these methods, solutions based on RR-interval (i.e., the interval between two consecutive pulses) signals reach a satisfactory balance among comfort, portability and detection accuracy. In this paper, we advance RR-interval based OSA detection by considering its real-world practicality from energy perspectives. As photoplethysmogram (PPG) pulse sensors are commonly equipped on smart wrist-worn wearable devices (e.g., smart watches and wristbands), the energy efficiency of the detection model is crucial to fully support an overnight observation on patients. This creates challenges as the PPG sensors are unable to keep collecting continuous signals due to the limited battery capacity on smart wrist-worn devices. Therefore, we propose a novel Frequency Extraction Network (FENet), which can extract features from different frequency bands of the input RR-interval signals and generate continuous detection results with downsampled, discontinuous RR-interval signals. With the help of the one-to-multiple structure, FENet requires only one-third of the operation time of the PPG sensor, thus sharply cutting down the energy consumption and enabling overnight diagnosis. Experimental results on real OSA datasets reveal the state-of-the-art performance of FENet.
Submitted 8 January, 2021;
originally announced January 2021.
-
Analysing the radio flux density profile of the M31 galaxy: a possible dark matter interpretation
Authors:
Man Ho Chan,
Chu Fai Yeung,
Lang Cui,
Chun Sing Leung
Abstract:
Some recent studies have examined the gamma-ray flux profile of our Galaxy to determine the signal of dark matter annihilation. However, the results are controversial and no confirmation is obtained. In this article, we study the radio flux density profile of the M31 galaxy and show that it could manifest a possible signal of dark matter annihilation. By comparing the likelihoods between the archival observed radio flux density profile data and the predicted radio flux density profile contributed by dark matter and stellar emission, we can constrain the relevant dark matter parameters. Specifically, for the thermal annihilation cross section via the $b\bar{b}$ channel, the best-fit value of dark matter mass is $\sim 30$ GeV, which is consistent with the results of many recent studies. We expect that this method would become another useful way to constrain dark matter, which is complementary to the traditional radio analyses and the other indirect detections.
Submitted 2 January, 2021;
originally announced January 2021.
-
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Authors:
Yang Xu,
Yiheng Xu,
Tengchao Lv,
Lei Cui,
Furu Wei,
Guoxin Wang,
Yijuan Lu,
Dinei Florencio,
Cha Zhang,
Wanxiang Che,
Min Zhang,
Lidong Zhou
Abstract:
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents. We propose the LayoutLMv2 architecture with new pre-training tasks to model the interaction among text, layout, and image in a single multi-modal framework. Specifically, with a two-stream multi-modal Transformer encoder, LayoutLMv2 uses not only the existing masked visual-language modeling task but also the new text-image alignment and text-image matching tasks, which help it better capture the cross-modality interaction in the pre-training stage. Meanwhile, it also integrates a spatial-aware self-attention mechanism into the Transformer architecture so that the model can fully understand the relative positional relationship among different text blocks. Experiment results show that LayoutLMv2 outperforms LayoutLM by a large margin and achieves new state-of-the-art results on a wide variety of downstream visually-rich document understanding tasks, including FUNSD (0.7895 $\to$ 0.8420), CORD (0.9493 $\to$ 0.9601), SROIE (0.9524 $\to$ 0.9781), Kleister-NDA (0.8340 $\to$ 0.8520), RVL-CDIP (0.9443 $\to$ 0.9564), and DocVQA (0.7295 $\to$ 0.8672). We made our model and code publicly available at https://aka.ms/layoutlmv2.
Submitted 9 January, 2022; v1 submitted 29 December, 2020;
originally announced December 2020.
-
Self-Supervised Hypergraph Convolutional Networks for Session-based Recommendation
Authors:
Xin Xia,
Hongzhi Yin,
Junliang Yu,
Qinyong Wang,
Lizhen Cui,
Xiangliang Zhang
Abstract:
Session-based recommendation (SBR) focuses on next-item prediction at a certain time point. As user profiles are generally not available in this scenario, capturing the user intent lying in the item transitions plays a pivotal role. Recent graph neural network (GNN) based SBR methods regard the item transitions as pairwise relations, which neglects the complex high-order information among items. A hypergraph provides a natural way to capture beyond-pairwise relations, while its potential for SBR has remained unexplored. In this paper, we fill this gap by modeling session-based data as a hypergraph and then propose a hypergraph convolutional network to improve SBR. Moreover, to enhance hypergraph modeling, we devise another graph convolutional network which is based on the line graph of the hypergraph and then integrate self-supervised learning into the training of the networks by maximizing mutual information between the session representations learned via the two networks, serving as an auxiliary task to improve the recommendation task. Since the two types of networks are both based on the hypergraph and can be seen as two channels for hypergraph modeling, we name our model DHCN (Dual Channel Hypergraph Convolutional Networks). Extensive experiments on three benchmark datasets demonstrate the superiority of our model over the SOTA methods, and the results validate the effectiveness of hypergraph modeling and the self-supervised task. The implementation of our model is available at https://github.com/xiaxin1998/DHCN
Submitted 28 February, 2022; v1 submitted 12 December, 2020;
originally announced December 2020.
-
Natural Language Inference in Context -- Investigating Contextual Reasoning over Long Texts
Authors:
Hanmeng Liu,
Leyang Cui,
Jian Liu,
Yue Zhang
Abstract:
Natural language inference (NLI) is a fundamental NLP task, investigating the entailment relationship between two texts. Popular NLI datasets present the task at the sentence level. While adequate for testing semantic representations, they fall short for testing contextual reasoning over long texts, which is a natural part of the human inference process. We introduce ConTRoL, a new dataset for ConTextual Reasoning over Long texts. Consisting of 8,325 expert-designed "context-hypothesis" pairs with gold labels, ConTRoL is a passage-level NLI dataset with a focus on complex contextual reasoning types such as logical reasoning. It is derived from competitive selection and recruitment tests (verbal reasoning tests) for police recruitment, with expert-level quality. Compared with previous NLI benchmarks, the materials in ConTRoL are much more challenging, involving a range of reasoning types. Empirical results show that state-of-the-art language models perform far worse than educated humans. Our dataset can also serve as a test set for downstream tasks such as checking the factual correctness of summaries.
Submitted 9 November, 2020;
originally announced November 2020.
-
Numerical analysis of the strain distribution in skin domes formed upon the application of hypobaric pressure
Authors:
Daniel Sebastia-Saez,
Faiza Benaouda,
Charlie Lim,
Guoping Lian,
Stuart Jones,
Tao Chen,
Liang Cui
Abstract:
Suction cups are widely used in applications such as the measurement of mechanical properties of skin in vivo, drug delivery devices, and acupuncture treatment. Understanding the mechanical response of skin under hypobaric pressure is of great importance for users of suction cups. The aim of this work is to assess the capability of linear elasticity (Young's modulus) or hyperelasticity in predicting hypobaric pressure induced 3D stretching of the skin. Using experiments and computational Finite Element Method modelling, this work demonstrated that although it was possible to predict the suction dome apex height using both linear elasticity and hyperelasticity for the typical range of hypobaric pressure in medical applications (up to -10 psi), linear elasticity theory showed limitations when predicting the strain distribution across the suction dome. The reason is that the stretch ratio reaches values exceeding the initial linear elastic stage of the stress-strain characteristic curve for skin. As a result, the linear elasticity theory overpredicts the stretch along the rim of domes where there is stress concentration. In addition, the modelling showed that the skin was compressed consistently along the thickness direction, leading to reduced thickness. Using hyperelasticity modelling to predict the 3D strain distribution paves the way to accurately design safe commercial products that interface with skin.
Submitted 26 October, 2020;
originally announced October 2020.
-
Commonsense knowledge adversarial dataset that challenges ELECTRA
Authors:
Gongqi Lin,
Yuan Miao,
Xiaoyong Yang,
Wenwu Ou,
Lizhen Cui,
Wei Guo,
Chunyan Miao
Abstract:
Commonsense knowledge is critical in human reading comprehension. While machine comprehension has made significant progress in recent years, the ability to handle commonsense knowledge remains limited. Synonyms are one of the most widely used forms of commonsense knowledge. Constructing adversarial datasets is an important approach to finding weak points of machine comprehension models and supporting the design of solutions. To investigate machine comprehension models' ability to handle commonsense knowledge, we created a Question and Answer Dataset with common knowledge of Synonyms (QADS). QADS consists of questions generated from SQuAD 2.0 by applying commonsense knowledge of synonyms. The synonyms are extracted from WordNet. Words often have multiple meanings and synonyms. We used an enhanced Lesk algorithm to perform word sense disambiguation to identify synonyms for the context. ELECTRA achieved the state-of-the-art result on the SQuAD 2.0 dataset in 2019. With scale, ELECTRA can achieve performance similar to BERT's. However, QADS shows that ELECTRA has little ability to handle commonsense knowledge of synonyms. In our experiment, ELECTRA-small achieves 70% accuracy on SQuAD 2.0, but only 20% on QADS. ELECTRA-large did not perform much better: its accuracy on SQuAD 2.0 is 88% but drops significantly to 26% on QADS. In our earlier experiments, BERT also failed badly on QADS, but not as badly as ELECTRA. The result shows that even top-performing NLP models have little ability to handle commonsense knowledge, which is essential in reading comprehension.
Submitted 25 October, 2020;
originally announced October 2020.
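The word sense disambiguation step mentioned in the abstract above can be illustrated with the classic simplified Lesk idea: choose the sense whose dictionary gloss shares the most words with the surrounding context. The following is a minimal sketch only. It is not the enhanced Lesk variant the authors used, and the sense labels, glosses, and stopword list are hypothetical stand-ins rather than WordNet data.

```python
# Simplified Lesk sketch. The sense inventory below is a hypothetical toy
# example, not WordNet, and this is not the paper's "enhanced" algorithm.
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "at", "to", "in", "on", "and", "or", "that", "is"}
)

TOY_SENSES = {
    "bank.financial": "an institution that accepts deposits of money and makes loans",
    "bank.river": "sloping land beside a river or lake",
}


def tokenize(text):
    """Lowercase, strip simple punctuation, and drop stopwords."""
    words = {w.lower().strip(".,;:!?") for w in text.split()}
    return words - STOPWORDS


def simplified_lesk(context, senses):
    """Return the sense label whose gloss overlaps most with the context."""
    context_words = tokenize(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context_words & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```

Calling `simplified_lesk("She went to the bank to deposit money", TOY_SENSES)` selects the financial sense because the gloss shares the word "money" with the context. A real pipeline would draw glosses from WordNet and would typically add stemming and extended glosses.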
-
Resolving the inner jet of PKS 1749+096 with super-resolution VLBA images at 7 mm
Authors:
Lang Cui,
Ru-Sen Lu,
Wei Yu,
Jun Liu,
Víctor Patiño-Álvarez,
Qi Yuan
Abstract:
High resolution imaging of inner jets in Active Galactic Nuclei (AGNs) with VLBI at millimeter wavelengths provides deep insight into the launching and collimation mechanisms of relativistic jets. The BL Lac object, PKS 1749+096, shows a core-dominated jet pointing toward the northeast on parsec-scales revealed by various VLBI observations. In order to investigate the jet kinematics, in particular, the orientation of the inner jet on the smallest accessible scales and the basic physical conditions of the core, in this work we adopted a super-resolution technique, the Bi-Spectrum Maximum Entropy Method (BSMEM), to reanalyze VLBI images based on the Very Long Baseline Array (VLBA) observations of PKS 1749+096 within the VLBA-BU-BLAZAR 7 mm monitoring program. These observations include a total of 105 epochs covering the period from 2009 to 2019. We found that the stacked image of the inner jet is limb-brightened with an apparent opening angle of 50°.0 +/- 8°.0 and 42°.0 +/- 6°.0 at the distance of 0.2 and 0.3 mas (0.9 and 1.4 pc) from the core, corresponding to an intrinsic jet opening angle of 5°.2 +/- 1°.0 and 4°.3 +/- 0°.7, respectively. In addition, our images show a clear jet position angle swing in PKS 1749+096 within the last ten years. We discuss the possible implications of jet limb brightening and the connection of the position angle with jet peak flux density and gamma-ray brightness.
Submitted 26 October, 2020; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Does Chinese BERT Encode Word Structure?
Authors:
Yile Wang,
Leyang Cui,
Yue Zhang
Abstract:
Contextualized representations give significantly improved results for a wide range of NLP tasks. Much work has been dedicated to analyzing the features captured by representative models such as BERT. Existing work finds that syntactic, semantic and word sense knowledge are encoded in BERT. However, little work has investigated word features for character-based languages such as Chinese. We investigate Chinese BERT using both attention weight distribution statistics and probing tasks, finding that (1) word information is captured by BERT; (2) word-level features are mostly in the middle representation layers; (3) downstream tasks make different use of word features in BERT, with POS tagging and chunking relying the most on word features, and natural language inference relying the least on such features.
Submitted 15 October, 2020;
originally announced October 2020.
-
Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks
Authors:
Yue Wang,
Zhuo Xu,
Lu Bai,
Yao Wan,
Lixin Cui,
Qian Zhao,
Edwin R. Hancock,
Philip S. Yu
Abstract:
Joint-event-extraction, which extracts structural information (i.e., entities or triggers of events) from unstructured real-world corpora, has attracted more and more research attention in natural language processing. Most existing works do not fully address the sparse co-occurrence relationships between entities and triggers, which loses this important information and thus deteriorates the extraction performance. To mitigate this issue, we first define the joint-event-extraction as a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities. Then, to incorporate the missing information in the aforementioned co-occurrence relationships, we propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of either triggers or entities based on the type distribution of each other. Moreover, since the connected entities and triggers naturally form a heterogeneous information network (HIN), we leverage the latent pattern along meta-paths for a given corpus to further improve the performance of our proposed method. To verify the effectiveness of our proposed method, we conduct extensive experiments on four real-world datasets as well as compare our method with state-of-the-art methods. Empirical results and analysis show that our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
Submitted 13 October, 2020; v1 submitted 13 October, 2020;
originally announced October 2020.
-
What Have We Achieved on Text Summarization?
Authors:
Dandan Huang,
Leyang Cui,
Sen Yang,
Guangsheng Bao,
Kun Wang,
Jun Xie,
Yue Zhang
Abstract:
Deep learning has led to significant improvement in text summarization with various methods investigated and improved ROUGE scores reported over the years. However, gaps still exist between summaries produced by automatic summarizers and human professionals. Aiming to gain more understanding of summarization systems with respect to their strengths and limits on a fine-grained syntactic and semantic level, we consult the Multidimensional Quality Metric (MQM) and manually quantify 8 major sources of errors on 10 representative summarization models. Primarily, we find that 1) under similar settings, extractive summarizers are in general better than their abstractive counterparts thanks to strength in faithfulness and factual consistency; 2) milestone techniques such as copy, coverage and hybrid extractive/abstractive methods do bring specific improvements but also demonstrate limitations; 3) pre-training techniques, and in particular sequence-to-sequence pre-training, are highly effective for improving text summarization, with BART giving the best results.
Submitted 9 October, 2020;
originally announced October 2020.
-
Background Learnable Cascade for Zero-Shot Object Detection
Authors:
Ye Zheng,
Ruoran Huang,
Chuanqi Han,
Xi Huang,
Li Cui
Abstract:
Zero-shot detection (ZSD) is crucial to large-scale object detection with the aim of simultaneously localizing and recognizing unseen objects. There remain several challenges for ZSD, including reducing the ambiguity between background and unseen objects as well as improving the alignment between visual and semantic concepts. In this work, we propose a novel framework named Background Learnable Cascade (BLC) to improve ZSD performance. The major contributions of BLC are as follows: (i) we propose a multi-stage cascade structure named Cascade Semantic R-CNN to progressively refine the alignment between visual and semantic concepts for ZSD; (ii) we develop the semantic information flow structure and directly add it between each stage in Cascade Semantic R-CNN to further improve the semantic feature learning; (iii) we propose the background learnable region proposal network (BLRPN) to learn an appropriate word vector for the background class and use this learned vector in Cascade Semantic R-CNN; this design makes the background "learnable" and reduces the confusion between background and unseen classes. Our extensive experiments show that BLC obtains significant performance improvements on MS-COCO over state-of-the-art methods.
Submitted 9 October, 2020;
originally announced October 2020.
-
A simple and efficient kinetic model for wealth distribution with saving propensity effect: based on lattice gas automaton
Authors:
Lijie Cui,
Chuandong Lin
Abstract:
The dynamics of wealth distribution plays a critical role in the economic market, hence an understanding of its nonequilibrium statistical mechanics is of great importance to human society. For this aim, a simple and efficient one-dimensional (1D) lattice gas automaton (LGA) is presented for wealth distribution of agents with or without saving propensity. The LGA comprises two stages, i.e., random propagation and economic transaction. During the former phase, an agent either remains motionless or travels to one of its neighboring empty sites with a certain probability. In the subsequent procedure, an economic transaction takes place between a pair of neighboring agents randomly. It requires at least 4 neighbors to present correct simulation results. The LGA reduces to the simplest model with only random economic transaction if all agents are neighbors and no empty sites exist. The 1D-LGA has a higher computational efficiency than the 2D-LGA and the famous Chakraborti-Chakrabarti economic model. Finally, the LGA is validated with two benchmarks, i.e., the wealth distributions of individual agents and dual-earner families. With the increasing saving fraction, both the Gini coefficient and Kolkata index (for individual agents or two-earner families) reduce, while the deviation degree (defined to measure the difference between the probability distributions with and without saving propensities) increases. It is demonstrated that the wealth distribution is changed significantly by the saving propensity which alleviates wealth inequality.
Submitted 11 September, 2020; v1 submitted 12 August, 2020;
originally announced August 2020.
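The saving-propensity effect described in the abstract above can be illustrated with a well-known kinetic exchange rule of the Chakraborti-Chakrabarti type, in which each trading partner keeps a fraction of its wealth and the remainder is pooled and split at random. The following is a minimal sketch under stated assumptions: it uses a fully mixed population rather than the authors' lattice gas automaton with propagation and neighbor constraints, and the function names, agent count, and trade count are illustrative choices, not values from the paper.

```python
import random


def simulate_exchange(n_agents=1000, n_trades=200_000, saving=0.0, seed=0):
    """Random pairwise wealth exchange with a uniform saving fraction.

    Assumption: any two agents may trade (no lattice, no empty sites),
    so this is the fully mixed limit, not the 1D lattice gas automaton.
    """
    rng = random.Random(seed)
    wealth = [1.0] * n_agents  # everyone starts with one unit of wealth
    for _ in range(n_trades):
        i, j = rng.sample(range(n_agents), 2)
        # Each agent keeps `saving` of its wealth; the rest is pooled.
        pool = (1.0 - saving) * (wealth[i] + wealth[j])
        eps = rng.random()  # random share of the pool going to agent i
        wealth[i] = saving * wealth[i] + eps * pool
        wealth[j] = saving * wealth[j] + (1.0 - eps) * pool
    return wealth


def gini(values):
    """Gini coefficient of a list of non-negative values."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    weighted = sum(rank * v for rank, v in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n
```

Each trade conserves the pair's combined wealth, so total wealth stays fixed. Comparing `gini(simulate_exchange(saving=0.0))` with `gini(simulate_exchange(saving=0.8))` shows a lower Gini coefficient at the higher saving fraction, in line with the trend the abstract reports.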
-
Engineering the spectral profile of photon pairs by using multi-stage nonlinear interferometers
Authors:
Mingyi Ma,
Liang Cui,
Xiaoying Li
Abstract:
Using the quantum interference of photon pairs in N-stage nonlinear interferometers (NLI), the contour of the joint spectral function can be modified into an island pattern. We perform two series of experiments. In one, all the nonlinear fibers in the pulse-pumped NLI are identical; in the other, the lengths of the N pieces of nonlinear fiber are different. We not only demonstrate how the pattern of the spectral function changes with the stage number N, but also characterize how the relative intensity of the island peaks varies with N. The results, which agree well with the theoretical predictions in Ref. [1], reveal that the NLI with N pieces of nonlinear fiber following a binomial distribution can provide a better active filtering function. Our investigation shows that the active filtering effect of multi-stage NLI is a useful tool for efficiently engineering the factorable two-photon state, a desirable resource for quantum information processing.
Submitted 14 October, 2020; v1 submitted 10 August, 2020;
originally announced August 2020.
-
On Commonsense Cues in BERT for Solving Commonsense Tasks
Authors:
Leyang Cui,
Sijie Cheng,
Yu Wu,
Yue Zhang
Abstract:
BERT has been used for solving commonsense tasks such as CommonsenseQA. While prior research has found that BERT does contain commonsense information to some extent, there has been work showing that pre-trained models can rely on spurious associations (e.g., data bias) rather than key cues in solving sentiment classification and other problems. We quantitatively investigate the presence of structural commonsense cues in BERT when solving commonsense tasks, and the importance of such cues for the model prediction. Using two different measures, we find that BERT does use relevant knowledge for solving the task, and the presence of commonsense knowledge is positively correlated to the model accuracy.
Submitted 15 June, 2021; v1 submitted 10 August, 2020;
originally announced August 2020.
-
LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning
Authors:
Jian Liu,
Leyang Cui,
Hanmeng Liu,
Dandan Huang,
Yile Wang,
Yue Zhang
Abstract:
Machine reading is a fundamental task for testing the capability of natural language understanding, which is closely related to human cognition in many aspects. With the rise of deep learning techniques, algorithmic models rival human performance on simple QA, and thus increasingly challenging machine reading datasets have been proposed. Though various challenges such as evidence integration and commonsense knowledge have been integrated, one of the fundamental capabilities in human reading, namely logical reasoning, has not been fully investigated. We build a comprehensive dataset, named LogiQA, which is sourced from expert-written questions for testing human logical reasoning. It consists of 8,678 QA instances, covering multiple types of deductive reasoning. Results show that state-of-the-art neural models perform far worse than the human ceiling. Our dataset can also serve as a benchmark for reinvestigating logical AI under the deep learning NLP setting. The dataset is freely available at https://github.com/lgw863/LogiQA-dataset
Submitted 16 July, 2020;
originally announced July 2020.
-
DocBank: A Benchmark Dataset for Document Layout Analysis
Authors:
Minghao Li,
Yiheng Xu,
Lei Cui,
Shaohan Huang,
Furu Wei,
Zhoujun Li,
Ming Zhou
Abstract:
Document layout analysis usually relies on computer vision models to understand documents while ignoring textual information that is vital to capture. Meanwhile, high quality labeled datasets with both visual and textual information are still insufficient. In this paper, we present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis. DocBank is constructed in a simple yet effective way with weak supervision from the LaTeX documents available on arXiv.com. With DocBank, models from different modalities can be compared fairly, and multi-modal approaches can be further investigated to boost the performance of document layout analysis. We build several strong baselines and manually split train/dev/test sets for evaluation. Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents. The DocBank dataset is publicly available at https://github.com/doc-analysis/DocBank.
△ Less
Submitted 11 November, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
CoAID: COVID-19 Healthcare Misinformation Dataset
Authors:
Limeng Cui,
Dongwon Lee
Abstract:
As the COVID-19 virus quickly spreads around the world, unfortunately, misinformation related to COVID-19 also gets created and spreads like wildfire. Such misinformation has caused confusion among people, disruptions in society, and even deadly consequences in health problems. Being able to understand, detect, and mitigate such COVID-19 misinformation therefore has not only deep intellectual value but also huge societal impact. To help researchers combat COVID-19 health misinformation, we present CoAID (Covid-19 heAlthcare mIsinformation Dataset), with diverse COVID-19 healthcare misinformation, including fake news on websites and social platforms, along with users' social engagement about such news. CoAID includes 4,251 news items, 296,000 related user engagements, 926 social platform posts about COVID-19, and ground truth labels. The dataset is available at: https://github.com/cuilimeng/CoAID.
Submitted 3 November, 2020; v1 submitted 22 May, 2020;
originally announced June 2020.
-
GCN-Based User Representation Learning for Unifying Robust Recommendation and Fraudster Detection
Authors:
Shijie Zhang,
Hongzhi Yin,
Tong Chen,
Quoc Viet Nguyen Hung,
Zi Huang,
Lizhen Cui
Abstract:
In recent years, recommender systems have become an indispensable function in all e-commerce platforms. The review rating data for a recommender system typically comes from open platforms, which may attract a group of malicious users to deliberately insert fake feedback in an attempt to bias the recommender system to their favour. The presence of such attacks may violate modeling assumptions that high-quality data is always available and these data truly reflect users' interests and preferences. Therefore, it is of great practical significance to construct a robust recommender system that is able to generate stable recommendations even in the presence of shilling attacks. In this paper, we propose GraphRfi, a GCN-based user representation learning framework to perform robust recommendation and fraudster detection in a unified way. In its end-to-end learning process, the probability of a user being identified as a fraudster in the fraudster detection component automatically determines the contribution of this user's rating data in the recommendation component, while the prediction error output by the recommendation component acts as an important feature in the fraudster detection component. Thus, these two components can mutually enhance each other. Extensive experiments have been conducted and the experimental results show the superiority of our GraphRfi in the two tasks: robust rating prediction and fraudster detection. Furthermore, the proposed GraphRfi is validated to be more robust to the various types of shilling attacks than state-of-the-art recommender systems.
Submitted 20 May, 2020;
originally announced May 2020.
-
A practical Response Adaptive Block Randomization (RABR) design with analytic type I error protection
Authors:
Tianyu Zhan,
Lu Cui,
Ziqian Geng,
Lanju Zhang,
Yihua Gu,
Ivan S. F. Chan
Abstract:
Response adaptive randomization (RAR) is appealing from methodological, ethical, and pragmatic perspectives in the sense that subjects are more likely to be randomized to better performing treatment groups based on accumulating data. However, applications of RAR in confirmatory drug clinical trials with multiple active arms are limited largely due to its complexity, and lack of control of randomization ratios to different treatment groups. To address the aforementioned issues, we propose a Response Adaptive Block Randomization (RABR) design allowing arbitrarily pre-specified randomization ratios for the control and high-performing groups to meet clinical trial objectives. We show the validity of the conventional unweighted test in RABR with a controlled type I error rate based on the weighted combination test for sample size adaptive design invoking no large sample approximation. The advantages of the proposed RABR in terms of robustly reaching target final sample size to meet regulatory requirements and increasing statistical power as compared with the popular Doubly Adaptive Biased Coin Design (DBCD) are demonstrated by statistical simulations and a practical clinical trial design example.
Submitted 1 August, 2022; v1 submitted 15 April, 2020;
originally announced April 2020.
-
MuTual: A Dataset for Multi-Turn Dialogue Reasoning
Authors:
Leyang Cui,
Yu Wu,
Shujie Liu,
Yue Zhang,
Ming Zhou
Abstract:
Non-task oriented dialogue systems have achieved great success in recent years due to largely accessible conversation data and the development of deep learning techniques. Given a context, current systems are able to yield a relevant and fluent response, but sometimes make logical mistakes because of weak reasoning capabilities. To facilitate conversation reasoning research, we introduce MuTual, a novel dataset for Multi-Turn dialogue Reasoning, consisting of 8,860 manually annotated dialogues based on Chinese student English listening comprehension exams. Compared to previous benchmarks for non-task oriented dialogue systems, MuTual is much more challenging since it requires a model that can handle various reasoning problems. Empirical results show that state-of-the-art methods reach only 71%, which is far behind the human performance of 94%, indicating that there is ample room for improving reasoning ability. MuTual is available at https://github.com/Nealcly/MuTual.
Submitted 9 April, 2020;
originally announced April 2020.
-
Enhancing Social Recommendation with Adversarial Graph Convolutional Networks
Authors:
Junliang Yu,
Hongzhi Yin,
Jundong Li,
Min Gao,
Zi Huang,
Lizhen Cui
Abstract:
Social recommender systems are expected to improve recommendation quality by incorporating social information when there is little user-item interaction data. However, recent reports from industry show that social recommender systems consistently fail in practice. According to the negative findings, the failure is attributed to: (1) A majority of users only have a very limited number of neighbors in social networks and can hardly benefit from social relations; (2) Social relations are noisy but they are indiscriminately used; (3) Social relations are assumed to be universally applicable to multiple scenarios while they are actually multi-faceted and show heterogeneous strengths in different scenarios. Most existing social recommendation models only consider the homophily in social networks and neglect these drawbacks. In this paper we propose a deep adversarial framework based on graph convolutional networks (GCN) to address these problems. Concretely, for (1) and (2), a GCN-based autoencoder is developed to augment the relation data by encoding high-order and complex connectivity patterns, and meanwhile is optimized subject to the constraint of reconstructing the social profile to guarantee the validity of the identified neighborhood. After obtaining enough purified social relations for each user, a GCN-based attentive social recommendation module is designed to address (3) by capturing the heterogeneous strengths of social relations. Finally, we adopt adversarial training to unify all the components by playing a Minimax game and ensure a coordinated effort to enhance recommendation performance. Extensive experiments on multiple open datasets demonstrate the superiority of our framework and the ablation study confirms the importance and effectiveness of each component.
Submitted 23 October, 2020; v1 submitted 5 April, 2020;
originally announced April 2020.
-
A parsec-scale radio jet launched by the central intermediate-mass black hole in the dwarf galaxy SDSS J090613.77+561015.2?
Authors:
Jun Yang,
Leonid I. Gurvits,
Zsolt Paragi,
Sandor Frey,
John E. Conway,
Xiang Liu,
Lang Cui
Abstract:
The population of intermediate-mass black holes (IMBHs) in nearby dwarf galaxies plays an important "ground truth" role in exploring black hole formation and growth in the early Universe. In the dwarf elliptical galaxy SDSS J090613.77+561015.2 (z=0.0465), an accreting IMBH has been revealed by optical and X-ray observations. Aiming to search for possible radio core and jet associated with the IMBH, we carried out very long baseline interferometry (VLBI) observations with the European VLBI Network (EVN) at 1.66 GHz. Our imaging results show that there are two 1-mJy components with a separation of about 52 mas (projected distance 47 pc) and the more compact component is located within the 1-sigma error circle of the optical centroid from available Gaia astrometry. Based on their positions, elongated structures and relatively high brightness temperatures, as well as the absence of star-forming activity in the host galaxy, we argue that the radio morphology originates from the jet activity powered by the central IMBH. The existence of the large-scale jet implies that violent jet activity might occur in the early epochs of black hole growth and thus help to regulate the co-evolution of black holes and galaxies.
Submitted 25 March, 2020;
originally announced March 2020.
-
Attacks Which Do Not Kill Training Make Adversarial Learning Stronger
Authors:
Jingfeng Zhang,
Xilie Xu,
Bo Han,
Gang Niu,
Lizhen Cui,
Masashi Sugiyama,
Mohan Kankanhalli
Abstract:
Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic so that it sometimes hurts the natural generalization. In this paper, we raise a fundamental question: do we have to trade off natural generalization for adversarial robustness? We argue that adversarial training is to employ confident adversarial data for updating the current model. We propose a novel approach of friendly adversarial training (FAT): rather than employing the most adversarial data maximizing the loss, we search for the least adversarial (i.e., friendly adversarial) data minimizing the loss, among the adversarial data that are confidently misclassified. Our novel formulation is easy to implement by just stopping the most adversarial data searching algorithms such as PGD (projected gradient descent) early, which we call early-stopped PGD. Theoretically, FAT is justified by an upper bound of the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively: adversarial robustness can indeed be achieved without compromising the natural generalization.
Submitted 5 September, 2020; v1 submitted 25 February, 2020;
originally announced February 2020.
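The early-stopping idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy linear classifier with a logistic loss, an L-infinity perturbation budget, and a simplified stopping rule (stop at the first misclassification rather than at a confidence threshold). The names `pgd_linear`, `margin`, `eps`, `step`, and `early_stop` are hypothetical and chosen only for illustration.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def margin(w, b, x, y):
    """Signed margin y * (w.x + b); a negative value means x is misclassified."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def pgd_linear(w, b, x, y, eps=0.5, step=0.1, max_steps=10, early_stop=True):
    """Sign-gradient PGD against a linear classifier (illustrative assumption).

    With early_stop=False this runs all max_steps iterations, searching for the
    most adversarial point in the eps-ball. With early_stop=True the search
    halts as soon as the current point is misclassified, which yields a weaker
    ("friendly") adversarial example.
    """
    x_adv = list(x)
    for _ in range(max_steps):
        if early_stop and margin(w, b, x_adv, y) < 0:
            break  # already misclassified: stop before the loss grows further
        # For the logistic loss, the gradient w.r.t. x has the sign of -y * w,
        # so a loss-ascent step moves each coordinate by -step * y * sign(w_i).
        x_adv = [xa - step * y * sign(wi) for xa, wi in zip(x_adv, w)]
        # Project back into the L-infinity ball of radius eps around x.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Toy data (assumed values): one correctly classified point with label +1.
w, b = [1.0, 1.0], 0.0
x, y = [0.25, 0.25], 1

friendly = pgd_linear(w, b, x, y, early_stop=True)
strongest = pgd_linear(w, b, x, y, early_stop=False)

print("friendly margin:", margin(w, b, friendly, y))
print("strongest margin:", margin(w, b, strongest, y))
```

Both points end up misclassified, but the early-stopped one sits just past the decision boundary while the full search pushes to the edge of the perturbation budget.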
-
A Hierarchical Transitive-Aligned Graph Kernel for Un-attributed Graphs
Authors:
Lu Bai,
Lixin Cui,
Edwin R. Hancock
Abstract:
In this paper, we develop a new graph kernel, namely the Hierarchical Transitive-Aligned kernel, by transitively aligning the vertices between graphs through a family of hierarchical prototype graphs. Compared to most existing state-of-the-art graph kernels, the proposed kernel has three theoretical advantages. First, it incorporates the locational correspondence information between graphs into the kernel computation, and thus overcomes the shortcoming of ignoring structural correspondences arising in most R-convolution kernels. Second, it guarantees the transitivity between the correspondence information that is not available for most existing matching kernels. Third, it incorporates the information of all graphs under comparison into the kernel computation process, and thus encapsulates richer characteristics. By transductively training the C-SVM classifier, experimental evaluations demonstrate the effectiveness of the new transitive-aligned kernel. The proposed kernel can outperform state-of-the-art graph kernels on standard graph-based datasets in terms of classification accuracy.
Submitted 8 February, 2020;
originally announced February 2020.
-
Multimodal Matching Transformer for Live Commenting
Authors:
Chaoqun Duan,
Lei Cui,
Shuming Ma,
Furu Wei,
Conghui Zhu,
Tiejun Zhao
Abstract:
Automatic live commenting aims to provide real-time comments on videos for viewers. It encourages user engagement on online video sites, and is also a good benchmark for video-to-text generation. Recent work on this task adopts encoder-decoder models to generate comments. However, these methods do not model the interaction between videos and comments explicitly, so they tend to generate popular comments that are often irrelevant to the videos. In this work, we aim to improve the relevance between live comments and videos by modeling the cross-modal interactions among different modalities. To this end, we propose a multimodal matching transformer to capture the relationships among comments, vision, and audio. The proposed model is based on the transformer framework and can iteratively learn the attention-aware representations for each modality. We evaluate the model on a publicly available live commenting dataset. Experiments show that the multimodal matching transformer model outperforms the state-of-the-art methods.
Submitted 7 February, 2020;
originally announced February 2020.
-
Generation of pure-state single photons with high heralding efficiency by using a three-stage nonlinear interferometer
Authors:
Jiamin Li,
Jie Su,
Liang Cui,
Tianqi Xie,
Z. Y. Ou,
Xiaoying Li
Abstract:
We experimentally study a fiber-based three-stage nonlinear interferometer and demonstrate its application in generating heralded single photons with high efficiency and purity by spectral engineering. We obtain a heralding efficiency of 90% at a brightness of 0.039 photons/pulse. The purity of the source is checked by two-photon Hong-Ou-Mandel interference with a visibility of 95% ± 6% (after correcting for Raman scattering and multi-pair events). Our investigation indicates that the heralded source of single photons produced by the three-stage nonlinear interferometer has the advantages of high purity, high heralding efficiency, high brightness, and flexibility in wavelength and bandwidth selection.
Submitted 1 February, 2020;
originally announced February 2020.
-
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Authors:
Yiheng Xu,
Minghao Li,
Lei Cui,
Shaohan Huang,
Furu Wei,
Ming Zhou
Abstract:
Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://aka.ms/layoutlm.
Submitted 16 June, 2020; v1 submitted 31 December, 2019;
originally announced December 2019.
-
Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification
Authors:
Renchun You,
Zhiyao Guo,
Lei Cui,
Xiang Long,
Yingze Bao,
Shilei Wen
Abstract:
Multi-label image and video classification are fundamental yet challenging tasks in computer vision. The main challenges lie in capturing spatial or temporal dependencies between labels and discovering the locations of discriminative features for each class. In order to overcome these challenges, we propose to use cross-modality attention with semantic graph embedding for multi-label classification. Based on the constructed label graph, we propose an adjacency-based similarity graph embedding method to learn semantic label embeddings, which explicitly exploit label relationships. Then our novel cross-modality attention maps are generated with the guidance of learned label embeddings. Experiments on two multi-label image classification datasets (MS-COCO and NUS-WIDE) show our method outperforms other existing state-of-the-art methods. In addition, we validate our method on a large multi-label video classification dataset (YouTube-8M Segments) and the evaluation results demonstrate the generalization capability of our method.
Submitted 27 March, 2020; v1 submitted 17 December, 2019;
originally announced December 2019.
-
Electrically Driven Hot-Carrier Generation and Above-threshold Light Emission in Plasmonic Tunnel Junctions
Authors:
Longji Cui,
Yunxuan Zhu,
Mahdiyeh Abbasi,
Arash Ahmadivand,
Burak Gerislioglu,
Peter Nordlander,
Douglas Natelson
Abstract:
Above-threshold light emission from plasmonic tunnel junctions, when emitted photons have energies significantly higher than the energy scale of the incident electrons, has attracted much recent interest in nano-optics, while the underlying physical mechanism remains elusive. We examine above-threshold light emission in electromigrated tunnel junctions. Our measurements over a large ensemble of devices demonstrate a giant material dependence of photon yield (emitted photons per incident electron), as large as four orders of magnitude. This dramatic effect cannot be explained only by the radiative field enhancement effect due to the localized plasmons in the tunneling gap. Emission is well described by a Boltzmann spectrum with an effective temperature exceeding 2000 K, coupled to a plasmon-modified photonic density of states. The effective temperature is approximately linear in the applied bias, consistent with a suggested theoretical model in which hot carriers are generated by non-radiative decay of electrically excited localized plasmons. Electrically driven hot-carrier generation and the associated non-traditional light emission could open new possibilities for active photochemistry, optoelectronics and quantum optics.
Submitted 27 May, 2020; v1 submitted 11 December, 2019;
originally announced December 2019.
-
CONAN: Complementary Pattern Augmentation for Rare Disease Detection
Authors:
Limeng Cui,
Siddharth Biswal,
Lucas M. Glass,
Greg Lever,
Jimeng Sun,
Cao Xiao
Abstract:
Rare diseases affect hundreds of millions of people worldwide but are hard to detect since they have extremely low prevalence rates (varying from 1/1,000 to 1/200,000 patients) and are massively underdiagnosed. How do we reliably detect rare diseases with such low prevalence rates? How can we further leverage patients with possibly uncertain diagnoses to improve detection? In this paper, we propose a Complementary pattern Augmentation (CONAN) framework for rare disease detection. CONAN combines ideas from both adversarial training and max-margin classification. It first learns self-attentive and hierarchical embedding for patient pattern characterization. Then, we develop a complementary generative adversarial network (GAN) model to generate candidate positive and negative samples from the uncertain patients by encouraging a max-margin between classes. In addition, CONAN has a disease detector that serves as the discriminator during the adversarial training for identifying rare diseases. We evaluated CONAN on two disease detection tasks. For low prevalence inflammatory bowel disease (IBD) detection, CONAN achieved 0.96 precision-recall area under the curve (PR-AUC) and a 50.1% relative improvement over the best baseline. For rare disease idiopathic pulmonary fibrosis (IPF) detection, CONAN achieved 0.22 PR-AUC with a 41.3% relative improvement over the best baseline.
Submitted 26 November, 2019;
originally announced November 2019.
-
Collaborative Attention Network for Person Re-identification
Authors:
Wenpeng Li,
Yongli Sun,
Jinjun Wang,
Han Xu,
Xiangru Yang,
Long Cui
Abstract:
Jointly utilizing global and local features to improve model accuracy is becoming a popular approach for the person re-identification (ReID) problem, because previous works using global features alone have very limited capacity for extracting discriminative local patterns in the obtained feature representation. Existing works that attempt to collect local patterns either explicitly slice the global feature into several local pieces in a handcrafted way, or apply the attention mechanism to implicitly infer the importance of different local regions. In this paper, we show that by explicitly learning the importance of small local parts and part combinations, we can further improve the final feature representation for ReID. Specifically, we first separate the global feature into multiple local slices at different scales with a proposed multi-branch structure. Then we introduce the Collaborative Attention Network (CAN) to automatically learn the combination of features from adjacent slices. In this way, the combination keeps the intrinsic relation between adjacent features across local regions and scales, without losing information by partitioning the global features. Experimental results on several widely-used public datasets including Market-1501, DukeMTMC-ReID and CUHK03 show that the proposed method outperforms many existing state-of-the-art methods.
Submitted 8 September, 2020; v1 submitted 29 November, 2019;
originally announced November 2019.
-
Evaluating Commonsense in Pre-trained Language Models
Authors:
Xuhui Zhou,
Yue Zhang,
Leyang Cui,
Dandan Huang
Abstract:
Contextualized representations trained over large raw text data have given remarkable improvements for NLP tasks including question answering and reading comprehension. There have been works showing that syntactic, semantic and word sense knowledge are contained in such representations, which explains why they benefit such tasks. However, relatively little work has been done investigating commonsense knowledge contained in contextualized representations, which is crucial for human question answering and reading comprehension. We study the commonsense ability of GPT, BERT, XLNet, and RoBERTa by testing them on seven challenging benchmarks, finding that language modeling and its variants are effective objectives for promoting models' commonsense ability while bi-directional context and a larger training set are bonuses. We additionally find that current models do poorly on tasks that require more inference steps. Finally, we test the robustness of models by making dual test cases, which are correlated so that the correct prediction of one sample should lead to correct prediction of the other. Interestingly, the models show confusion on these test cases, which suggests that they learn commonsense at the surface rather than the deep level. We publicly release a test set, named CATs, for future research.
Submitted 11 February, 2021; v1 submitted 26 November, 2019;
originally announced November 2019.