
Showing 201–250 of 374 results for author: Cui, L

  1. arXiv:2106.05606  [pdf, other]

    cs.CL

    VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization

    Authors: Tengchao Lv, Lei Cui, Momcilo Vasilijevic, Furu Wei

    Abstract: Video transcript summarization is a fundamental task for video understanding. Conventional approaches for transcript summarization are usually built upon the summarization data for written language such as news articles, while the domain discrepancy may degrade the model performance on spoken text. In this paper, we present VT-SSum, a benchmark dataset with spoken language for video transcript seg…

    Submitted 15 July, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Work in progress

  2. arXiv:2106.01760  [pdf, other]

    cs.CL

    Template-Based Named Entity Recognition Using BART

    Authors: Leyang Cui, Yu Wu, Jian Liu, Sen Yang, Yue Zhang

    Abstract: There is a recent interest in investigating few-shot NER, where the low-resource target domain has different label sets compared with a resource-rich source domain. Existing methods use a similarity-based metric. However, they cannot make full use of knowledge transfer in NER model parameters. To address the issue, we propose a template-based method for NER, treating NER as a language model rankin…

    Submitted 3 June, 2021; originally announced June 2021.

  3. arXiv:2106.01263  [pdf, other]

    cs.CL cs.AI

    Uni-Encoder: A Fast and Accurate Response Selection Paradigm for Generation-Based Dialogue Systems

    Authors: Chiyu Song, Hongliang He, Haofei Yu, Pengfei Fang, Leyang Cui, Zhenzhong Lan

    Abstract: Sample-and-rank is a key decoding strategy for modern generation-based dialogue systems. It helps achieve diverse and high-quality responses by selecting an answer from a small pool of generated candidates. The current state-of-the-art ranking methods mainly use an encoding paradigm called Cross-Encoder, which separately encodes each context-candidate pair and ranks the candidates according to the…

    Submitted 15 May, 2023; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: Accepted to the Findings of ACL 2023

  4. arXiv:2106.00984  [pdf, other]

    cs.CL cs.AI cs.LG

    Few-Shot Partial-Label Learning

    Authors: Yunfeng Zhao, Guoxian Yu, Lei Liu, Zhongmin Yan, Lizhen Cui, Carlotta Domeniconi

    Abstract: Partial-label learning (PLL) generally focuses on inducing a noise-tolerant multi-class classifier by training on overly-annotated samples, each of which is annotated with a set of labels, but only one is the valid label. A basic promise of existing PLL solutions is that there are sufficient partial-label (PL) samples for training. However, it is more common than not to have just few PL samples at…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: Accepted by International Joint Conference on Artificial Intelligence (IJCAI2021)

  5. arXiv:2105.14676  [pdf, other]

    cs.LG

    NoiLIn: Improving Adversarial Training and Correcting Stereotype of Noisy Labels

    Authors: Jingfeng Zhang, Xilie Xu, Bo Han, Tongliang Liu, Gang Niu, Lizhen Cui, Masashi Sugiyama

    Abstract: Adversarial training (AT) formulated as the minimax optimization problem can effectively enhance the model's robustness against adversarial attacks. The existing AT methods mainly focused on manipulating the inner maximization for generating quality adversarial variants or manipulating the outer minimization for designing effective learning objectives. However, empirical results of AT always exhib…

    Submitted 4 August, 2022; v1 submitted 30 May, 2021; originally announced May 2021.

    Comments: Accepted at Transactions on Machine Learning Research (TMLR) in June 2022

    Journal ref: Transactions on Machine Learning Research, 2022

  6. arXiv:2105.13695  [pdf, other]

    cs.CV

    AutoSampling: Search for Effective Data Sampling Schedules

    Authors: Ming Sun, Haoxuan Dou, Baopu Li, Lei Cui, Junjie Yan, Wanli Ouyang

    Abstract: Data sampling acts as a pivotal role in training deep learning models. However, an effective sampling schedule is difficult to learn due to the inherently high dimension of parameters in learning the sampling schedule. In this paper, we propose an AutoSampling method to automatically learn sampling schedules for model training, which consists of the multi-exploitation step aiming for optimal local…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: Automl for sampling firstly without any assumption

    Journal ref: ICML 2021

  7. Where are we in embedding spaces? A Comprehensive Analysis on Network Embedding Approaches for Recommender Systems

    Authors: Sixiao Zhang, Hongxu Chen, Xiao Ming, Lizhen Cui, Hongzhi Yin, Guandong Xu

    Abstract: Hyperbolic space and hyperbolic embeddings are becoming a popular research field for recommender systems. However, it is not clear under what circumstances the hyperbolic space should be considered. To fill this gap, this paper provides theoretical analysis and empirical results on when and where to use hyperbolic space and hyperbolic embeddings in recommender systems. Specifically, we answer the…

    Submitted 18 May, 2021; originally announced May 2021.

  8. arXiv:2105.04153  [pdf, ps, other]

    cs.LG

    Slashing Communication Traffic in Federated Learning by Transmitting Clustered Model Updates

    Authors: Laizhong Cui, Xiaoxin Su, Yipeng Zhou, Yi Pan

    Abstract: Federated Learning (FL) is an emerging decentralized learning framework through which multiple clients can collaboratively train a learning model. However, a major obstacle that impedes the wide deployment of FL lies in massive communication traffic. To train high dimensional machine learning models (such as CNN models), heavy communication traffic can be incurred by exchanging model updates via t…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: To appear in IEEE Journal on Selected Areas in Communications

  9. arXiv:2104.08836  [pdf, other]

    cs.CL

    LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

    Authors: Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

    Abstract: Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually-rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities. In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich doc…

    Submitted 9 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: Work in progress

  10. arXiv:2104.06601  [pdf, other]

    cs.CV

    Zero-Shot Instance Segmentation

    Authors: Ye Zheng, Jiahong Wu, Yongqiang Qin, Faen Zhang, Li Cui

    Abstract: Deep learning has significantly improved the precision of instance segmentation with abundant labeled data. However, in many areas like medical and manufacturing, collecting sufficient data is extremely hard and labeling this data requires high professional skills. We follow this motivation and propose a new task set named zero-shot instance segmentation (ZSI). In the training phase of ZSI, the mo…

    Submitted 31 May, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: 8 pages, 6 figures

    Journal ref: CVPR2021

  11. arXiv:2104.05525  [pdf, other]

    astro-ph.GA astro-ph.IM

    East Asian VLBI Network Observations of Active Galactic Nuclei Jets: Imaging with KaVA+Tianma+Nanshan

    Authors: Yuzhu Cui, Kazuhiro Hada, Motoki Kino, Bong Won Sohn, Jongho Park, Hyun Wook Ro, Satoko Sawada-Satoh, Wu Jiang, Lang Cui, Mareki Honma, Zhi Qiang Shen, Fumie Tazaki, Tao An, Ilje Cho, Guang Yao Zhao, Xiao Peng Cheng, Kotaro Niinuma, Kiyoaki Wajima, Ying Kang Zhang, Noriyuki Kawaguchi, Juan Carlos Algaba, Shoko Koyama, Tomoya Hirota, Yoshinori Yonekura, Nobuyuki Sakai, et al. (52 additional authors not shown)

    Abstract: The East Asian very-long-baseline interferometry (VLBI) Network (EAVN) is a rapidly evolving international VLBI array that is currently promoted under joint efforts among China, Japan, and Korea. EAVN aims at forming a joint VLBI Network by combining a large number of radio telescopes distributed over East Asian regions. After the combination of the Korean VLBI Network (KVN) and the VLBI Explorati…

    Submitted 14 April, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: 19 pages, 9 figures, accepted by Research in Astronomy and Astrophysics (RAA)

  12. arXiv:2104.04350  [pdf, ps, other]

    math.OA math.RA

    On cleanness of von Neumann algebras

    Authors: Lu Cui, Linzhe Huang, Wenming Wu, Wei Yuan, Hanbin Zhang

    Abstract: A unital ring is called clean (resp. strongly clean) if every element can be written as the sum of an invertible element and an idempotent (resp. an invertible element and an idempotent that commutes). T.Y. Lam proposed a question: which von Neumann algebras are clean as rings? In this paper, we characterize strongly clean von Neumann algebras and prove that all finite von Neumann algebras and all…

    Submitted 12 January, 2022; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: 19 pages (A citation mistake is corrected)

    MSC Class: 47C15(Primary) 16U99; 46L10; 47A65(Secondary)

  13. arXiv:2103.16047  [pdf, other]

    cs.CV

    Noise-resistant Deep Metric Learning with Ranking-based Instance Selection

    Authors: Chang Liu, Han Yu, Boyang Li, Zhiqi Shen, Zhanning Gao, Peiran Ren, Xuansong Xie, Lizhen Cui, Chunyan Miao

    Abstract: The existence of noisy labels in real-world data negatively impacts the performance of deep learning models. Although much research effort has been devoted to improving robustness to noisy labels in classification tasks, the problem of noisy labels in deep metric learning (DML) remains open. In this paper, we propose a noise-resistant training technique for DML, which we name Probabilistic Ranking…

    Submitted 12 April, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: Accepted by CVPR 2021

  14. arXiv:2103.14846  [pdf, other]

    cs.CV cs.RO

    AR Mapping: Accurate and Efficient Mapping for Augmented Reality

    Authors: Rui Huang, Chuan Fang, Kejie Qiu, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

    Abstract: Augmented reality (AR) has gained increasingly attention from both research and industry communities. By overlaying digital information and content onto the physical world, AR enables users to experience the world in a more informative and efficient manner. As a major building block for AR systems, localization aims at determining the device's pose from a pre-built "map" consisting of visual and d…

    Submitted 27 March, 2021; originally announced March 2021.

    Comments: 8 pages, 14 figures

  15. arXiv:2103.14826  [pdf, other]

    cs.RO

    Compact 3D Map-Based Monocular Localization Using Semantic Edge Alignment

    Authors: Kejie Qiu, Shenzhou Chen, Jiahui Zhang, Rui Huang, Le Cui, Siyu Zhu, Ping Tan

    Abstract: Accurate localization is fundamental to a variety of applications, such as navigation, robotics, autonomous driving, and Augmented Reality (AR). Different from incremental localization, global localization has no drift caused by error accumulation, which is desired in many application scenarios. In addition to GPS used in the open air, 3D maps are also widely used as alternative global localizatio…

    Submitted 27 March, 2021; originally announced March 2021.

  16. Towards Personalized Federated Learning

    Authors: Alysa Ziying Tan, Han Yu, Lizhen Cui, Qiang Yang

    Abstract: In parallel with the rapid adoption of Artificial Intelligence (AI) empowered by advances in AI research, there have been growing awareness and concerns of data privacy. Recent significant developments in the data regulation landscape have prompted a seismic shift in interest towards privacy-preserving AI. This has contributed to the popularity of Federated Learning (FL), the leading paradigm for…

    Submitted 17 March, 2022; v1 submitted 28 February, 2021; originally announced March 2021.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems

  17. arXiv:2101.12549  [pdf, other]

    cs.IR

    Graph Embedding for Recommendation against Attribute Inference Attacks

    Authors: Shijie Zhang, Hongzhi Yin, Tong Chen, Zi Huang, Lizhen Cui, Xiangliang Zhang

    Abstract: In recent years, recommender systems play a pivotal role in helping users identify the most suitable items that satisfy personal preferences. As user-item interactions can be naturally modelled as graph-structured data, variants of graph convolutional networks (GCNs) have become a well-established building block in the latest recommenders. Due to the wide utilization of sensitive user profile data…

    Submitted 29 January, 2021; originally announced January 2021.

  18. FENet: A Frequency Extraction Network for Obstructive Sleep Apnea Detection

    Authors: Guanhua Ye, Hongzhi Yin, Tong Chen, Hongxu Chen, Lizhen Cui, Xiangliang Zhang

    Abstract: Obstructive Sleep Apnea (OSA) is a highly prevalent but inconspicuous disease that seriously jeopardizes the health of human beings. Polysomnography (PSG), the gold standard of detecting OSA, requires multiple specialized sensors for signal collection, hence patients have to physically visit hospitals and bear the costly treatment for a single detection. Recently, many single-sensor alternatives h…

    Submitted 8 January, 2021; originally announced January 2021.

    Comments: To appear in JBHI

  19. arXiv:2101.00372  [pdf, ps, other]

    astro-ph.HE astro-ph.GA

    Analysing the radio flux density profile of the M31 galaxy: a possible dark matter interpretation

    Authors: Man Ho Chan, Chu Fai Yeung, Lang Cui, Chun Sing Leung

    Abstract: Some recent studies have examined the gamma-ray flux profile of our Galaxy to determine the signal of dark matter annihilation. However, the results are controversial and no confirmation is obtained. In this article, we study the radio flux density profile of the M31 galaxy and show that it could manifest a possible signal of dark matter annihilation. By comparing the likelihoods between the archi…

    Submitted 2 January, 2021; originally announced January 2021.

    Comments: Accepted in MNRAS

    Journal ref: MNRAS 501, 5692 (2021)

  20. arXiv:2012.14740  [pdf, other]

    cs.CL

    LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

    Authors: Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou

    Abstract: Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents. We propose LayoutLMv2 architecture with new pre-training tasks to model the interaction among text, layout, and image in a single multi-modal framework. Specifically, with a…

    Submitted 9 January, 2022; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: ACL 2021 main conference

  21. arXiv:2012.06852  [pdf, other]

    cs.IR

    Self-Supervised Hypergraph Convolutional Networks for Session-based Recommendation

    Authors: Xin Xia, Hongzhi Yin, Junliang Yu, Qinyong Wang, Lizhen Cui, Xiangliang Zhang

    Abstract: Session-based recommendation (SBR) focuses on next-item prediction at a certain time point. As user profiles are generally not available in this scenario, capturing the user intent lying in the item transitions plays a pivotal role. Recent graph neural networks (GNNs) based SBR methods regard the item transitions as pairwise relations, which neglect the complex high-order information among items.…

    Submitted 28 February, 2022; v1 submitted 12 December, 2020; originally announced December 2020.

    Comments: 9 pages, 4 figures, accepted by AAAI'21. Correct some typos in the previous version

  22. arXiv:2011.04864  [pdf, other]

    cs.CL

    Natural Language Inference in Context -- Investigating Contextual Reasoning over Long Texts

    Authors: Hanmeng Liu, Leyang Cui, Jian Liu, Yue Zhang

    Abstract: Natural language inference (NLI) is a fundamental NLP task, investigating the entailment relationship between two texts. Popular NLI datasets present the task at sentence-level. While adequate for testing semantic representations, they fall short for testing contextual reasoning over long texts, which is a natural part of the human inference process. We introduce ConTRoL, a new dataset for ConText…

    Submitted 9 November, 2020; originally announced November 2020.

  23. arXiv:2010.13849  [pdf]

    q-bio.TO

    Numerical analysis of the strain distribution in skin domes formed upon the application of hypobaric pressure

    Authors: Daniel Sebastia-Saez, Faiza Benaouda, Charlie Lim, Guoping Lian, Stuart Jones, Tao Chen, Liang Cui

    Abstract: Suction cups are widely used in applications such as in measurement of mechanical properties of skin in vivo, in drug delivery devices or in acupuncture treatment. Understanding the mechanical response of skin under hypobaric pressure are of great importance for users of suction cups. The aims of this work are to assess the capability of linear elasticity (Young's modulus) or hyperelasticity in pr…

    Submitted 26 October, 2020; originally announced October 2020.

  24. arXiv:2010.13049  [pdf]

    cs.CL

    Commonsense knowledge adversarial dataset that challenges ELECTRA

    Authors: Gongqi Lin, Yuan Miao, Xiaoyong Yang, Wenwu Ou, Lizhen Cui, Wei Guo, Chunyan Miao

    Abstract: Commonsense knowledge is critical in human reading comprehension. While machine comprehension has made significant progress in recent years, the ability in handling commonsense knowledge remains limited. Synonyms are one of the most widely used commonsense knowledge. Constructing adversarial dataset is an important approach to find weak points of machine comprehension models and support the design…

    Submitted 25 October, 2020; originally announced October 2020.

    Comments: To appear in ICARCV2020

  25. Resolving the inner jet of PKS 1749+096 with super-resolution VLBA images at 7 mm

    Authors: Lang Cui, Ru-Sen Lu, Wei Yu, Jun Liu, Víctor Patiño-Álvarez, Qi Yuan

    Abstract: High resolution imaging of inner jets in Active Galactic Nuclei (AGNs) with VLBI at millimeter wavelengths provides deep insight into the launching and collimation mechanisms of relativistic jets. The BL Lac object, PKS 1749+096, shows a core-dominated jet pointing toward the northeast on parsec-scales revealed by various VLBI observations. In order to investigate the jet kinematics, in particular…

    Submitted 26 October, 2020; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: 7 pages, 3 figures, accepted for publication in Research in Astronomy and Astrophysics (RAA)

  26. arXiv:2010.07711  [pdf, other]

    cs.CL

    Does Chinese BERT Encode Word Structure?

    Authors: Yile Wang, Leyang Cui, Yue Zhang

    Abstract: Contextualized representations give significantly improved results for a wide range of NLP tasks. Much work has been dedicated to analyzing the features captured by representative models such as BERT. Existing work finds that syntactic, semantic and word sense knowledge are encoded in BERT. However, little work has investigated word features for character-based languages such as Chinese. We invest…

    Submitted 15 October, 2020; originally announced October 2020.

    Comments: Accepted by COLING2020

  27. arXiv:2010.06310  [pdf, other]

    cs.CL

    Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks

    Authors: Yue Wang, Zhuo Xu, Lu Bai, Yao Wan, Lixin Cui, Qian Zhao, Edwin R. Hancock, Philip S. Yu

    Abstract: Joint-event-extraction, which extracts structural information (i.e., entities or triggers of events) from unstructured real-world corpora, has attracted more and more research attention in natural language processing. Most existing works do not fully address the sparse co-occurrence relationships between entities and triggers, which loses this important information and thus deteriorates the extrac…

    Submitted 13 October, 2020; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: Accepted by ICPR 2020

  28. arXiv:2010.04529  [pdf, other]

    cs.CL

    What Have We Achieved on Text Summarization?

    Authors: Dandan Huang, Leyang Cui, Sen Yang, Guangsheng Bao, Kun Wang, Jun Xie, Yue Zhang

    Abstract: Deep learning has led to significant improvement in text summarization with various methods investigated and improved ROUGE scores reported over the years. However, gaps still exist between summaries produced by automatic summarizers and human professionals. Aiming to gain more understanding of summarization systems with respect to their strengths and limits on a fine-grained syntactic and semanti…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted by EMNLP 2020

  29. Background Learnable Cascade for Zero-Shot Object Detection

    Authors: Ye Zheng, Ruoran Huang, Chuanqi Han, Xi Huang, Li Cui

    Abstract: Zero-shot detection (ZSD) is crucial to large-scale object detection with the aim of simultaneously localizing and recognizing unseen objects. There remain several challenges for ZSD, including reducing the ambiguity between background and unseen objects as well as improving the alignment between visual and semantic concept. In this work, we propose a novel framework named Background Learnable Cas…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: 18 pages, 5 figures

  30. A simple and efficient kinetic model for wealth distribution with saving propensity effect: based on lattice gas automaton

    Authors: Lijie Cui, Chuandong Lin

    Abstract: The dynamics of wealth distribution plays a critical role in the economic market, hence an understanding of its nonequilibrium statistical mechanics is of great importance to human society. For this aim, a simple and efficient one-dimensional (1D) lattice gas automaton (LGA) is presented for wealth distribution of agents with or without saving propensity. The LGA comprises two stages, i.e., random…

    Submitted 11 September, 2020; v1 submitted 12 August, 2020; originally announced August 2020.

  31. arXiv:2008.03963  [pdf]

    quant-ph

    Engineering the spectral profile of photon pairs by using multi-stage nonlinear interferometers

    Authors: Mingyi Ma, Liang Cui, Xiaoying Li

    Abstract: Using the quantum interference of photon pairs in N-stage nonlinear interferometers (NLI), the contour of joint spectral function can be modified into islands pattern. We perform two series of experiments. One is that all the nonlinear fibers in pulse pumped NLI are identical; the other is that the lengths of N pieces nonlinear fibers are different. We not only demonstrate how the pattern of spect…

    Submitted 14 October, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

  32. arXiv:2008.03945  [pdf, other]

    cs.CL

    On Commonsense Cues in BERT for Solving Commonsense Tasks

    Authors: Leyang Cui, Sijie Cheng, Yu Wu, Yue Zhang

    Abstract: BERT has been used for solving commonsense tasks such as CommonsenseQA. While prior research has found that BERT does contain commonsense information to some extent, there has been work showing that pre-trained models can rely on spurious associations (e.g., data bias) rather than key cues in solving sentiment classification and other problems. We quantitatively investigate the presence of structu…

    Submitted 15 June, 2021; v1 submitted 10 August, 2020; originally announced August 2020.

  33. arXiv:2007.08124  [pdf, other]

    cs.CL

    LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning

    Authors: Jian Liu, Leyang Cui, Hanmeng Liu, Dandan Huang, Yile Wang, Yue Zhang

    Abstract: Machine reading is a fundamental task for testing the capability of natural language understanding, which is closely related to human cognition in many aspects. With the rising of deep learning techniques, algorithmic models rival human performances on simple QA, and thus increasingly challenging machine reading datasets have been proposed. Though various challenges such as evidence integration an…

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: Accepted by IJCAI2020

  34. arXiv:2006.01038  [pdf, other]

    cs.CL

    DocBank: A Benchmark Dataset for Document Layout Analysis

    Authors: Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou

    Abstract: Document layout analysis usually relies on computer vision models to understand documents while ignoring textual information that is vital to capture. Meanwhile, high quality labeled datasets with both visual and textual information are still insufficient. In this paper, we present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for…

    Submitted 11 November, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: COLING 2020

  35. arXiv:2006.00885  [pdf, other]

    cs.SI cs.CL

    CoAID: COVID-19 Healthcare Misinformation Dataset

    Authors: Limeng Cui, Dongwon Lee

    Abstract: As the COVID-19 virus quickly spreads around the world, unfortunately, misinformation related to COVID-19 also gets created and spreads like wild fire. Such misinformation has caused confusion among people, disruptions in society, and even deadly consequences in health problems. To be able to understand, detect, and mitigate such COVID-19 misinformation, therefore, has not only deep intellectual v…

    Submitted 3 November, 2020; v1 submitted 22 May, 2020; originally announced June 2020.

  36. arXiv:2005.10150  [pdf, other]

    cs.SI cs.IR

    GCN-Based User Representation Learning for Unifying Robust Recommendation and Fraudster Detection

    Authors: Shijie Zhang, Hongzhi Yin, Tong Chen, Quoc Viet Nguyen Hung, Zi Huang, Lizhen Cui

    Abstract: In recent years, recommender system has become an indispensable function in all e-commerce platforms. The review rating data for a recommender system typically comes from open platforms, which may attract a group of malicious users to deliberately insert fake feedback in an attempt to bias the recommender system to their favour. The presence of such attacks may violate modeling assumptions that hi…

    Submitted 20 May, 2020; originally announced May 2020.

  37. A practical Response Adaptive Block Randomization (RABR) design with analytic type I error protection

    Authors: Tianyu Zhan, Lu Cui, Ziqian Geng, Lanju Zhang, Yihua Gu, Ivan S. F. Chan

    Abstract: Response adaptive randomization (RAR) is appealing from methodological, ethical, and pragmatic perspectives in the sense that subjects are more likely to be randomized to better performing treatment groups based on accumulating data. However, applications of RAR in confirmatory drug clinical trials with multiple active arms are limited largely due to its complexity, and lack of control of randomiz…

    Submitted 1 August, 2022; v1 submitted 15 April, 2020; originally announced April 2020.

  38. arXiv:2004.04494  [pdf, other]

    cs.CL

    MuTual: A Dataset for Multi-Turn Dialogue Reasoning

    Authors: Leyang Cui, Yu Wu, Shujie Liu, Yue Zhang, Ming Zhou

    Abstract: Non-task oriented dialogue systems have achieved great success in recent years due to largely accessible conversation data and the development of deep learning techniques. Given a context, current systems are able to yield a relevant and fluent response, but sometimes make logical mistakes because of weak reasoning capabilities. To facilitate the conversation reasoning research, we introduce MuTua…

    Submitted 9 April, 2020; originally announced April 2020.

    Comments: ACL 2020

  39. arXiv:2004.02340  [pdf, other]

    cs.IR cs.SI

    Enhancing Social Recommendation with Adversarial Graph Convolutional Networks

    Authors: Junliang Yu, Hongzhi Yin, Jundong Li, Min Gao, Zi Huang, Lizhen Cui

    Abstract: Social recommender systems are expected to improve recommendation quality by incorporating social information when there is little user-item interaction data. However, recent reports from industry show that social recommender systems consistently fail in practice. According to the negative findings, the failure is attributed to: (1) A majority of users only have a very limited number of neighbors…

    Submitted 23 October, 2020; v1 submitted 5 April, 2020; originally announced April 2020.

    Comments: Accepted by TKDE

  40. arXiv:2003.11412  [pdf, ps, other]

    astro-ph.HE astro-ph.GA

    A parsec-scale radio jet launched by the central intermediate-mass black hole in the dwarf galaxy SDSS J090613.77+561015.2?

    Authors: Jun Yang, Leonid I. Gurvits, Zsolt Paragi, Sandor Frey, John E. Conway, Xiang Liu, Lang Cui

    Abstract: The population of intermediate-mass black holes (IMBHs) in nearby dwarf galaxies plays an important "ground truth" role in exploring black hole formation and growth in the early Universe. In the dwarf elliptical galaxy SDSS J090613.77+561015.2 (z=0.0465), an accreting IMBH has been revealed by optical and X-ray observations. Aiming to search for possible radio core and jet associated with the IMBH…

    Submitted 25 March, 2020; originally announced March 2020.

    Comments: Accepted for publication as a Letter in MNRAS

    Journal ref: Monthly Notices of the Royal Astronomical Society Letters, Vol. 495, pp. L71-L75 (2020)

  41. arXiv:2002.11242  [pdf, other]

    cs.LG stat.ML

    Attacks Which Do Not Kill Training Make Adversarial Learning Stronger

    Authors: Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama, Mohan Kankanhalli

    Abstract: Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic so that it sometimes hurts the natural generalization. In this paper, we raise a fundamental question---do we have to trade off natural generalization for adversarial robustness? We argue that adversarial training is to employ co…

    Submitted 5 September, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: Thirty-seventh International Conference on Machine Learning (ICML 2020)

  42. arXiv:2002.04425  [pdf, other]

    cs.SI cs.LG

    A Hierarchical Transitive-Aligned Graph Kernel for Un-attributed Graphs

    Authors: Lu Bai, Lixin Cui, Edwin R. Hancock

    Abstract: In this paper, we develop a new graph kernel, namely the Hierarchical Transitive-Aligned kernel, by transitively aligning the vertices between graphs through a family of hierarchical prototype graphs. Comparing to most existing state-of-the-art graph kernels, the proposed kernel has three theoretical advantages. First, it incorporates the locational correspondence information between graphs into t…

    Submitted 8 February, 2020; originally announced February 2020.

  43. arXiv:2002.02649  [pdf, other]

    cs.CL

    Multimodal Matching Transformer for Live Commenting

    Authors: Chaoqun Duan, Lei Cui, Shuming Ma, Furu Wei, Conghui Zhu, Tiejun Zhao

    Abstract: Automatic live commenting aims to provide real-time comments on videos for viewers. It encourages user engagement on online video sites, and is also a good benchmark for video-to-text generation. Recent work on this task adopts encoder-decoder models to generate comments. However, these methods do not model the interaction between videos and comments explicitly, so they tend to generate popular c…

    Submitted 7 February, 2020; originally announced February 2020.

  44. Generation of pure-state single photons with high heralding efficiency by using a three-stage nonlinear interferometer

    Authors: Jiamin Li, Jie Su, Liang Cui, Tianqi Xie, Z. Y. Ou, Xiaoying Li

    Abstract: We experimentally study a fiber-based three-stage nonlinear interferometer and demonstrate its application in generating heralded single photons with high efficiency and purity by spectral engineering. We obtain a heralding efficiency of 90% at a brightness of 0.039 photons/pulse. The purity of the source is checked by two-photon Hong-Ou-Mandel interference with a visibility of 95% ± 6% (after corr…

    Submitted 1 February, 2020; originally announced February 2020.

    Comments: 5 pages, 4 figures

  45. LayoutLM: Pre-training of Text and Layout for Document Image Understanding

    Authors: Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou

    Abstract: Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions…

    Submitted 16 June, 2020; v1 submitted 31 December, 2019; originally announced December 2019.

    Comments: KDD 2020

  46. arXiv:1912.07872  [pdf, other]

    cs.CV

    Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification

    Authors: Renchun You, Zhiyao Guo, Lei Cui, Xiang Long, Yingze Bao, Shilei Wen

    Abstract: Multi-label image and video classification are fundamental yet challenging tasks in computer vision. The main challenges lie in capturing spatial or temporal dependencies between labels and discovering the locations of discriminative features for each class. In order to overcome these challenges, we propose to use cross-modality attention with semantic graph embedding for multi-label classificatio…

    Submitted 27 March, 2020; v1 submitted 17 December, 2019; originally announced December 2019.

    Comments: Accepted by AAAI2020

  47. Electrically Driven Hot-Carrier Generation and Above-threshold Light Emission in Plasmonic Tunnel Junctions

    Authors: Longji Cui, Yunxuan Zhu, Mahdiyeh Abbasi, Arash Ahmadivand, Burak Gerislioglu, Peter Nordlander, Douglas Natelson

    Abstract: Above-threshold light emission from plasmonic tunnel junctions, when emitted photons have energies significantly higher than the energy scale of the incident electrons, has attracted much recent interest in nano-optics, while the underlying physical mechanism remains elusive. We examine above-threshold light emission in electromigrated tunnel junctions. Our measurements over a large ensemble of de…

    Submitted 27 May, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: 23 pages, 5 figures + supplementary information

    Journal ref: Nano Lett. 20, 6067-6075 (2020)

  48. arXiv:1911.13232  [pdf, other]

    cs.LG cs.CL

    CONAN: Complementary Pattern Augmentation for Rare Disease Detection

    Authors: Limeng Cui, Siddharth Biswal, Lucas M. Glass, Greg Lever, Jimeng Sun, Cao Xiao

    Abstract: Rare diseases affect hundreds of millions of people worldwide but are hard to detect since they have extremely low prevalence rates (varying from 1/1,000 to 1/200,000 patients) and are massively underdiagnosed. How do we reliably detect rare diseases with such low prevalence rates? How to further leverage patients with possibly uncertain diagnosis to improve detection? In this paper, we propose a…

    Submitted 26 November, 2019; originally announced November 2019.

  49. arXiv:1911.13008  [pdf, other]

    cs.CV

    Collaborative Attention Network for Person Re-identification

    Authors: Wenpeng Li, Yongli Sun, Jinjun Wang, Han Xu, Xiangru Yang, Long Cui

    Abstract: Jointly utilizing global and local features to improve model accuracy is becoming a popular approach for the person re-identification (ReID) problem, because previous works using global features alone have very limited capacity at extracting discriminative local patterns in the obtained feature representation. Existing works that attempt to collect local patterns either explicitly slice the global…

    Submitted 8 September, 2020; v1 submitted 29 November, 2019; originally announced November 2019.

  50. arXiv:1911.11931  [pdf, other]

    cs.CL cs.AI

    Evaluating Commonsense in Pre-trained Language Models

    Authors: Xuhui Zhou, Yue Zhang, Leyang Cui, Dandan Huang

    Abstract: Contextualized representations trained over large raw text data have given remarkable improvements for NLP tasks including question answering and reading comprehension. There have been works showing that syntactic, semantic and word sense knowledge are contained in such representations, which explains why they benefit such tasks. However, relatively little work has been done investigating commonse…

    Submitted 11 February, 2021; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: AAAI 2020