
Showing 1–41 of 41 results for author: Ke, G

Searching in archive cs.
  1. arXiv:2406.14969  [pdf, other]

    cs.LG cs.AI

    Uni-Mol2: Exploring Molecular Pretraining Model at Scale

    Authors: Xiaohong Ji, Zhen Wang, Zhifeng Gao, Hang Zheng, Linfeng Zhang, Guolin Ke, Weinan E

    Abstract: In recent years, pretraining models have made significant advancements in the fields of natural language processing (NLP), computer vision (CV), and life sciences. The significant advancements in NLP and CV are predominantly driven by the expansion of model parameters and data size, a phenomenon now recognized as the scaling laws. However, research exploring scaling law in molecular pretraining mo…

    Submitted 1 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2405.11769  [pdf, other]

    q-bio.BM cs.LG physics.bio-ph

    Uni-Mol Docking V2: Towards Realistic and Accurate Binding Pose Prediction

    Authors: Eric Alcaide, Zhifeng Gao, Guolin Ke, Yaqi Li, Linfeng Zhang, Hang Zheng, Gengmo Zhou

    Abstract: In recent years, machine learning (ML) methods have emerged as promising alternatives for molecular docking, offering the potential for high accuracy without incurring prohibitive computational costs. However, recent studies have indicated that these ML models may overfit to quantitative metrics while neglecting the physical constraints inherent in the problem. In this work, we present Uni-Mol Doc…

    Submitted 20 May, 2024; originally announced May 2024.

  3. arXiv:2403.10897  [pdf, other]

    cs.CV cs.MM

    Rethinking Multi-view Representation Learning via Distilled Disentangling

    Authors: Guanzhou Ke, Bo Wang, Xiaoli Wang, Shengfeng He

    Abstract: Multi-view representation learning aims to derive robust representations that are both view-consistent and view-specific from diverse data sources. This paper presents an in-depth analysis of existing approaches in this domain, highlighting a commonly overlooked aspect: the redundancy between view-consistent and view-specific representations. To this end, we propose an innovative framework for mul…

    Submitted 29 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  4. arXiv:2403.10301  [pdf, other]

    cs.CL cs.CV

    Uni-SMART: Universal Science Multimodal Analysis and Research Transformer

    Authors: Hengxing Cai, Xiaochen Cai, Shuwen Yang, Jiankun Wang, Lin Yao, Zhifeng Gao, Junhan Chang, Sihang Li, Mingjun Xu, Changxin Wang, Hongshuai Wang, Yongge Li, Mujie Lin, Yaqi Li, Yuqi Yin, Linfeng Zhang, Guolin Ke

    Abstract: In scientific research and its application, scientific literature analysis is crucial as it allows researchers to build on the work of others. However, the fast growth of scientific knowledge has led to a massive increase in scholarly articles, making in-depth literature analysis increasingly challenging and time-consuming. The emergence of Large Language Models (LLMs) has offered a new way to add…

    Submitted 15 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  5. arXiv:2403.01976  [pdf, other]

    cs.CL

    SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis

    Authors: Hengxing Cai, Xiaochen Cai, Junhan Chang, Sihang Li, Lin Yao, Changxin Wang, Zhifeng Gao, Hongshuai Wang, Yongge Li, Mujie Lin, Shuwen Yang, Jiankun Wang, Mingjun Xu, Jin Huang, Fang Xi, Jiaxi Zhuang, Yuqi Yin, Yaqi Li, Changhong Chen, Zheng Cheng, Zifeng Zhao, Linfeng Zhang, Guolin Ke

    Abstract: Recent breakthroughs in Large Language Models (LLMs) have revolutionized natural language understanding and generation, sparking significant interest in applying them to scientific literature analysis. However, existing benchmarks fail to adequately evaluate the proficiency of LLMs in this domain, particularly in scenarios requiring higher-level abilities beyond mere memorization and the handling…

    Submitted 18 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  6. arXiv:2401.03862  [pdf, other]

    physics.chem-ph cs.LG

    End-to-End Crystal Structure Prediction from Powder X-Ray Diffraction

    Authors: Qingsi Lai, Lin Yao, Zhifeng Gao, Siyuan Liu, Hongshuai Wang, Shuqi Lu, Di He, Liwei Wang, Cheng Wang, Guolin Ke

    Abstract: Crystal structure prediction (CSP) has made significant progress, but most methods focus on unconditional generations of inorganic crystal with limited atoms in the unit cell. This study introduces XtalNet, the first equivariant deep generative model for end-to-end CSP from Powder X-ray Diffraction (PXRD). Unlike previous methods that rely solely on composition, XtalNet leverages PXRD as an additi…

    Submitted 1 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  7. arXiv:2309.15798  [pdf, other]

    cs.LG physics.chem-ph q-bio.QM

    Node-Aligned Graph-to-Graph (NAG2G): Elevating Template-Free Deep Learning Approaches in Single-Step Retrosynthesis

    Authors: Lin Yao, Wentao Guo, Zhen Wang, Shang Xiang, Wentan Liu, Guolin Ke

    Abstract: Single-step retrosynthesis (SSR) in organic chemistry is increasingly benefiting from deep learning (DL) techniques in computer-aided synthesis design. While template-free DL models are flexible and promising for retrosynthesis prediction, they often ignore vital 2D molecular information and struggle with atom alignment for node generation, resulting in lower performance compared to the template-b…

    Submitted 25 March, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Journal ref: JACS Au 4 (2024) 992-1003

  8. Disentangling Multi-view Representations Beyond Inductive Bias

    Authors: Guanzhou Ke, Yang Yu, Guoqing Chao, Xiaoli Wang, Chenyang Xu, Shengfeng He

    Abstract: Multi-view (or -modality) representation learning aims to understand the relationships between different view representations. Existing methods disentangle multi-view representations into consistent and view-specific representations by introducing strong inductive biases, which can limit their generalization ability. In this paper, we propose a novel multi-view representation disentangling method…

    Submitted 4 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: 9 pages, 5 figures, 4 tables

    Journal ref: In Proceedings of the 31st ACM International Conference on Multimedia (MM '23), 2023

  9. arXiv:2304.12239  [pdf, other]

    q-bio.BM cs.LG

    Uni-QSAR: an Auto-ML Tool for Molecular Property Prediction

    Authors: Zhifeng Gao, Xiaohong Ji, Guojiang Zhao, Hongshuai Wang, Hang Zheng, Guolin Ke, Linfeng Zhang

    Abstract: Recently, deep-learning-based quantitative structure-activity relationship (QSAR) models have shown better performance than traditional methods for property prediction tasks in drug discovery. However, most DL-based QSAR models are restricted to limited labeled data to achieve better performance, and are also sensitive to model scale and hyper-parameters. In this paper, we propose Uni-QSAR, a po…

    Submitted 24 April, 2023; originally announced April 2023.

  10. arXiv:2303.16982  [pdf, other]

    physics.chem-ph cs.LG

    Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+

    Authors: Shuqi Lu, Zhifeng Gao, Di He, Linfeng Zhang, Guolin Ke

    Abstract: Recent developments in deep learning have made remarkable progress in speeding up the prediction of quantum chemical (QC) properties by removing the need for expensive electronic structure calculations like density functional theory. However, previous methods learned from 1D SMILES sequences or 2D molecular graphs failed to achieve high accuracy as QC properties primarily depend on the 3D equilibr…

    Submitted 7 July, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

  11. arXiv:2302.07134  [pdf, ps, other]

    q-bio.BM cs.LG

    Do Deep Learning Models Really Outperform Traditional Approaches in Molecular Docking?

    Authors: Yuejiang Yu, Shuqi Lu, Zhifeng Gao, Hang Zheng, Guolin Ke

    Abstract: Molecular docking, given a ligand molecule and a ligand binding site (called "pocket") on a protein, predicting the binding mode of the protein-ligand complex, is a widely used technique in drug design. Many deep learning models have been developed for molecular docking, while most existing deep learning models perform docking on the whole protein, rather than on a given pocket as the traditiona…

    Submitted 23 February, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

  12. arXiv:2302.07061  [pdf, other]

    cs.CE cs.LG q-bio.BM

    Do Deep Learning Methods Really Perform Better in Molecular Conformation Generation?

    Authors: Gengmo Zhou, Zhifeng Gao, Zhewei Wei, Hang Zheng, Guolin Ke

    Abstract: Molecular conformation generation (MCG) is a fundamental and important problem in drug discovery. Many traditional methods have been developed to solve the MCG problem, such as systematic searching, model-building, random searching, distance geometry, molecular dynamics, Monte Carlo methods, etc. However, they have some limitations depending on the molecular structures. Recently, there are plenty…

    Submitted 27 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

  13. arXiv:2302.06091  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.BM q-bio.QM

    Boosted ab initio Cryo-EM 3D Reconstruction with ACE-EM

    Authors: Lin Yao, Ruihan Xu, Zhifeng Gao, Guolin Ke, Yuhang Wang

    Abstract: The central problem in cryo-electron microscopy (cryo-EM) is to recover the 3D structure from noisy 2D projection images which requires estimating the missing projection angles (poses). Recent methods attempted to solve the 3D reconstruction problem with the autoencoder architecture, which suffers from the latent vector space sampling problem and frequently produces suboptimal pose inferences and…

    Submitted 13 February, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

    ACM Class: I.4.5; I.5.1; I.5.2; I.5.4

  14. arXiv:2302.05847  [pdf, other]

    q-bio.BM cs.LG

    3D Molecular Generation via Virtual Dynamics

    Authors: Shuqi Lu, Lin Yao, Xi Chen, Hang Zheng, Di He, Guolin Ke

    Abstract: Structure-based drug design, i.e., finding molecules with high affinities to the target protein pocket, is one of the most critical tasks in drug discovery. Traditional solutions, like virtual screening, require exhaustively searching on a large molecular database, which are inefficient and cannot return novel molecules beyond the database. The pocket-based 3D molecular generation model, i.e., dir…

    Submitted 11 February, 2023; originally announced February 2023.

  15. A Clustering-guided Contrastive Fusion for Multi-view Representation Learning

    Authors: Guanzhou Ke, Guoqing Chao, Xiaoli Wang, Chenyang Xu, Yongqi Zhu, Yang Yu

    Abstract: The past two decades have seen increasingly rapid advances in the field of multi-view representation learning, owing to its ability to extract useful information from diverse domains to facilitate the development of multi-view applications. However, the community faces two challenges: i) how to learn robust representations from a large amount of unlabeled data to defend against noise or incomplete-view settings, and…

    Submitted 4 August, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: 13 pages, 9 figures

    Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, 2023

  16. arXiv:2210.05935  [pdf, other]

    cs.LG

    Optimizing Evaluation Metrics for Multi-Task Learning via the Alternating Direction Method of Multipliers

    Authors: Ge-Yang Ke, Yan Pan, Jian Yin, Chang-Qin Huang

    Abstract: Multi-task learning (MTL) aims to improve the generalization performance of multiple tasks by exploiting the shared factors among them. Various metrics (e.g., F-score, Area Under the ROC Curve) are used to evaluate the performances of MTL methods. Most existing MTL methods try to minimize either the misclassified errors for classification or the mean squared errors for regression. In this paper, w…

    Submitted 12 October, 2022; originally announced October 2022.

  17. MORI-RAN: Multi-view Robust Representation Learning via Hybrid Contrastive Fusion

    Authors: Guanzhou Ke, Yongqi Zhu, Yang Yu

    Abstract: Multi-view representation learning is essential for many multi-view tasks, such as clustering and classification. However, there are two challenging problems plaguing the community: i) how to learn robust multi-view representation from mass unlabeled data and ii) how to balance the view consistency and the view specificity. To this end, in this paper, we proposed a hybrid contrastive fusion algorit…

    Submitted 30 August, 2022; v1 submitted 26 August, 2022; originally announced August 2022.

    Comments: 8 pages, 3 figures

    Journal ref: ICDM 2022 workshop

  18. arXiv:2207.09682  [pdf, other]

    cs.LG

    Quantized Training of Gradient Boosting Decision Trees

    Authors: Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu

    Abstract: Recent years have witnessed significant success in Gradient Boosting Decision Trees (GBDT) for a wide range of machine learning applications. Generally, a consensus about GBDT's training algorithms is that gradients and statistics are computed based on high-precision floating points. In this paper, we investigate an essentially important question which has been largely ignored by the previous literatur…

    Submitted 17 January, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

  19. arXiv:2204.06644  [pdf, other]

    cs.LG cs.AI cs.CL

    METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

    Authors: Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao

    Abstract: We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model. Originated in ELECTRA, this training strategy has demonstrated sample-efficiency to pretrain models at the scale of hundreds of millions of parameters. In this work, we conduct a comprehensive empirical study, and propose a recipe, namely "Model generated d…

    Submitted 16 April, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Update details in scaled initialization and add acknowledgement

  20. arXiv:2203.06123   

    physics.chem-ph cs.CE cs.LG

    An Empirical Study of Graphormer on Large-Scale Molecular Modeling Datasets

    Authors: Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu

    Abstract: This technical note describes the recent updates of Graphormer, including architecture design modifications, and the adaption to 3D molecular dynamics simulation. The "Graphormer-V2" could attain better results on large-scale molecular modeling datasets than the vanilla one, and the performance gain could be consistently obtained on downstream tasks. In addition, we show that with a global recepti…

    Submitted 14 March, 2022; v1 submitted 28 February, 2022; originally announced March 2022.

    Comments: Wrong dual-submission (arXiv:2203.04810) with negligently

  21. arXiv:2203.04810  [pdf, ps, other]

    cs.LG

    Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets

    Authors: Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu

    Abstract: This technical note describes the recent updates of Graphormer, including architecture design modifications, and the adaption to 3D molecular dynamics simulation. With these simple modifications, Graphormer could attain better results on large-scale molecular modeling datasets than the vanilla one, and the performance gain could be consistently obtained on 2D and 3D molecular graph modeling tasks.…

    Submitted 7 January, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

  22. arXiv:2106.12566  [pdf, other]

    cs.LG cs.CL stat.ML

    Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

    Authors: Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu

    Abstract: The attention module, which is a crucial component in Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original attention, leading to sub-quadratic or even linear-complexity Transformer architectures. However, we show that these methods cannot be applied to more powerful atte…

    Submitted 2 November, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021, camera ready version

  23. Deep Subdomain Adaptation Network for Image Classification

    Authors: Yongchun Zhu, Fuzhen Zhuang, Jindong Wang, Guolin Ke, Jingwu Chen, Jiang Bian, Hui Xiong, Qing He

    Abstract: For a target task where labeled data is unavailable, domain adaptation can transfer a learner from a different source domain. Previous deep domain adaptation methods mainly learn a global domain shift, i.e., align the global source and target distributions without considering the relationships between two subdomains within the same category of different domains, leading to unsatisfying transfer le…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: published on TNNLS

  24. arXiv:2106.08279  [pdf, ps, other]

    cs.LG

    First Place Solution of KDD Cup 2021 & OGB Large-Scale Challenge Graph Prediction Track

    Authors: Chengxuan Ying, Mingqi Yang, Shuxin Zheng, Guolin Ke, Shengjie Luo, Tianle Cai, Chenglin Wu, Yuxin Wang, Yanming Shen, Di He

    Abstract: In this technical report, we present our solution of KDD Cup 2021 OGB Large-Scale Challenge - PCQM4M-LSC Track. We adopt Graphormer and ExpC as our basic models. We train each model by 8-fold cross-validation, and additionally train two Graphormer models on the union of training and validation sets with different random seeds. For final submission, we use a naive ensemble for these 18 models by ta…

    Submitted 20 June, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

  25. arXiv:2106.05234  [pdf, other]

    cs.LG cs.AI

    Do Transformers Really Perform Bad for Graph Representation?

    Authors: Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

    Abstract: The Transformer architecture has become a dominant choice in many domains, such as natural language processing and computer vision. Yet, it has not achieved competitive performance on popular leaderboards of graph-level prediction compared to mainstream GNN variants. Therefore, it remains a mystery how Transformers could perform well for graph representation learning. In this paper, we solve this…

    Submitted 23 November, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Journal ref: NeurIPS 2021

  26. arXiv:2105.04297  [pdf, other]

    cs.PL cs.LG cs.SE

    How could Neural Networks understand Programs?

    Authors: Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

    Abstract: Semantic understanding of programs is a fundamental problem for programming language processing (PLP). Recent works that learn representations of code based on pre-training techniques in NLP have pushed the frontiers in this direction. However, the semantics of PL and NL have essential differences. These being ignored, we believe it is difficult to build a model to better understand programs, by e…

    Submitted 31 May, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Report number: PMLR 139:8476-8486, 2021

    Journal ref: ICML 2021

  27. arXiv:2103.00336  [pdf, other]

    cs.LG cs.AI

    Transformers with Competitive Ensembles of Independent Mechanisms

    Authors: Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio

    Abstract: An important development in deep learning from the earliest MLPs has been a move towards architectures with structural inductive biases which enable the model to keep distinct sources of information and routes of processing well-separated. This structure is linked to the notion of independent mechanisms from the causality literature, in which a mechanism is able to retain the same processing as ir…

    Submitted 27 February, 2021; originally announced March 2021.

    Comments: Under Review, ICML 2021

  28. arXiv:2102.12702  [pdf, other]

    cs.CL cs.AI

    LazyFormer: Self Attention with Lazy Update

    Authors: Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu

    Abstract: Improving the efficiency of Transformer-based language pre-training is an important task in NLP, especially for the self-attention module, which is computationally expensive. In this paper, we propose a simple but effective solution, called LazyFormer, which computes the self-attention distribution infrequently. LazyFormer is composed of multiple lazy blocks, each of which contains multiple Tr…

    Submitted 25 February, 2021; originally announced February 2021.

  29. arXiv:2102.09206  [pdf, other]

    cs.LG

    Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder

    Authors: Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tieyan Liu, Arnold Overwijk

    Abstract: Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space. Autoencoder-based language models are appealing in dense retrieval as they train the encoder to output high-quality embedding that can reconstruct the input texts. However, in this paper, we provide theoretical analyses and show empirically that an autoencoder language model with…

    Submitted 16 September, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

  30. arXiv:2102.08357  [pdf, other]

    cs.CL cs.LG

    Revisiting Language Encoding in Learning Multilingual Representations

    Authors: Shengjie Luo, Kaiyuan Gao, Shuxin Zheng, Guolin Ke, Di He, Liwei Wang, Tie-Yan Liu

    Abstract: Transformer has demonstrated its great power to learn contextual word representations for multiple languages in a single model. To process multilingual sentences in the model, a learnable vector is usually assigned to each language, which is called "language embedding". The language embedding can be either added to the word embedding or attached at the beginning of the sentence. It serves as a lan…

    Submitted 16 February, 2021; originally announced February 2021.

  31. arXiv:2008.01466  [pdf, other]

    cs.CL

    Taking Notes on the Fly Helps BERT Pre-training

    Authors: Qiyu Wu, Chen Xing, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

    Abstract: How to make unsupervised language pre-training more efficient and less resource-intensive is an important research direction in NLP. In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization. It is well-known that in language data corpus, words follow a heavy-tail distribution. A large proportion of words appear only very few tim…

    Submitted 14 March, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: Qiyu Wu and Chen Xing contribute equally

  32. arXiv:2006.15595  [pdf, other]

    cs.CL cs.LG

    Rethinking Positional Encoding in Language Pre-training

    Authors: Guolin Ke, Di He, Tie-Yan Liu

    Abstract: In this work, we investigate the positional encoding methods used in language pre-training (e.g., BERT) and identify several problems in the existing formulations. First, we show that in the absolute positional encoding, the addition operation applied on positional embeddings and word embeddings brings mixed correlations between the two heterogeneous information resources. It may bring unnecessary…

    Submitted 15 March, 2021; v1 submitted 28 June, 2020; originally announced June 2020.

    Comments: update to ICLR's version

    Journal ref: International Conference on Learning Representations (ICLR) 2021, https://openreview.net/forum?id=09-528y2Fgf

  33. arXiv:2006.05744  [pdf, other]

    cs.CL cs.LG

    MC-BERT: Efficient Language Pre-Training via a Meta Controller

    Authors: Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Liwei Wang, Jiang Bian, Tie-Yan Liu

    Abstract: Pre-trained contextual representations (e.g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks. However, large-scale pre-training is computationally expensive. ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator. Our studies reveal that ELECTRA's success is mainl…

    Submitted 16 June, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

  34. arXiv:2005.05650  [pdf, other]

    eess.IV cs.CV cs.LG

    Invertible Image Rescaling

    Authors: Mingqing Xiao, Shuxin Zheng, Chang Liu, Yaolong Wang, Di He, Guolin Ke, Jiang Bian, Zhouchen Lin, Tie-Yan Liu

    Abstract: High-resolution digital images are usually downscaled to fit various display screens or save the cost of storage and bandwidth, meanwhile the post-upscaling is adopted to recover the original resolutions or the details in the zoom-in images. However, typical image downscaling is a non-injective mapping due to the loss of high-frequency information, which leads to the ill-posed problem of the inver…

    Submitted 12 May, 2020; originally announced May 2020.

  35. arXiv:1908.09362  [pdf, other]

    cs.LG cs.AI

    LightMC: A Dynamic and Efficient Multiclass Decomposition Algorithm

    Authors: Ziyu Liu, Guolin Ke, Jiang Bian, Tieyan Liu

    Abstract: Multiclass decomposition splits a multiclass classification problem into a series of independent binary learners and recomposes them by combining their outputs to reconstruct the multiclass classification results. Three widely-used realizations of such decomposition methods are One-Versus-All (OVA), One-Versus-One (OVO), and Error-Correcting-Output-Code (ECOC). While OVA and OVO are quite simple,…

    Submitted 25 August, 2019; originally announced August 2019.

    Comments: 10 pages, 2 figures

  36. arXiv:1907.06870  [pdf, other]

    cs.LG

    Light Multi-segment Activation for Model Compression

    Authors: Zhenhui Xu, Guolin Ke, Jia Zhang, Jiang Bian, Tie-Yan Liu

    Abstract: Model compression has become necessary when applying neural networks (NN) into many real application tasks that can accept slightly-reduced model accuracy with strict tolerance to model complexity. Recently, Knowledge Distillation, which distills the knowledge from well-trained and highly complex teacher model into a compact student model, has been widely used for model compression. However, under…

    Submitted 25 November, 2019; v1 submitted 16 July, 2019; originally announced July 2019.

    Journal ref: Thirty-Fourth AAAI Conference on Artificial Intelligence. 2020

  37. arXiv:1810.04409  [pdf, other]

    cs.CV

    Prediction of the Influence of Navigation Scan-path on Perceived Quality of Free-Viewpoint Videos

    Authors: Suiyi Ling, Jesús Gutiérrez, Gu Ke, Patrick Le Callet

    Abstract: Free-Viewpoint Video (FVV) systems allow the viewers to freely change the viewpoints of the scene. In such systems, view synthesis and compression are the two main sources of artifacts influencing the perceived quality. To assess this influence, quality evaluation studies are often carried out using conventional displays and generating predefined navigation trajectories mimicking the possible move…

    Submitted 10 October, 2018; originally announced October 2018.

    Comments: 11 pages, 7 figures

  38. arXiv:1611.01276  [pdf, other]

    cs.LG

    A Communication-Efficient Parallel Algorithm for Decision Tree

    Authors: Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu

    Abstract: Decision tree (and its extensions such as Gradient Boosting Decision Trees and Random Forest) is a widely used machine learning algorithm, due to its practical effectiveness and model interpretability. With the emergence of big data, there is an increasing need to parallelize the training process of decision tree. However, most existing attempts along this line suffer from high communication costs…

    Submitted 4 November, 2016; originally announced November 2016.

  39. arXiv:1605.06170  [pdf, other]

    cs.LG

    Evaluation System for a Bayesian Optimization Service

    Authors: Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, George Ke

    Abstract: Bayesian optimization is an elegant solution to the hyperparameter optimization problem in machine learning. Building a reliable and robust Bayesian optimization service requires careful testing methodology and sound statistical analysis. In this talk we will outline our development of an evaluation framework to rigorously test and measure the impact of changes to the SigOpt optimization service.…

    Submitted 19 May, 2016; originally announced May 2016.

  40. arXiv:1603.09441  [pdf, other]

    cs.LG stat.ML

    A Stratified Analysis of Bayesian Optimization Methods

    Authors: Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, George Ke

    Abstract: Empirical analysis serves as an important complement to theoretical analysis for studying practical Bayesian optimization. Often empirical insights expose strengths and weaknesses inaccessible to theoretical analysis. We define two metrics for comparing the performance of Bayesian optimization methods and propose a ranking mechanism for summarizing performance within various genres or strata of te…

    Submitted 30 March, 2016; originally announced March 2016.

  41. arXiv:1502.07157  [pdf, ps, other]

    cs.IR cs.CL

    Exploiting a comparability mapping to improve bi-lingual data categorization: a three-mode data analysis perspective

    Authors: Pierre-François Marteau, Guiyao Ke

    Abstract: We address in this paper the co-clustering and co-classification of bilingual data lying in two linguistic similarity spaces when a comparability measure defining a mapping between these two spaces is available. A new approach, which we can characterize as a three-mode analysis scheme, is proposed to mix the comparability measure with the two similarity measures. Our aim is to improve jointly t…

    Submitted 26 February, 2015; v1 submitted 25 February, 2015; originally announced February 2015.