
Showing 1–11 of 11 results for author: Men, R

  1. arXiv:2406.04594  [pdf, other]

    cs.DC cs.AI cs.LG

    Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

    Authors: Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu

    Abstract: The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2309.16609  [pdf, other]

    cs.CL

    Qwen Technical Report

    Authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, et al. (23 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Q…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 59 pages, 5 figures

  3. arXiv:2212.04408  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

    Authors: Jinze Bai, Rui Men, Hao Yang, Xuancheng Ren, Kai Dang, Yichang Zhang, Xiaohuan Zhou, Peng Wang, Sinan Tan, An Yang, Zeyu Cui, Yu Han, Shuai Bai, Wenbin Ge, Jianxin Ma, Junyang Lin, Jingren Zhou, Chang Zhou

    Abstract: Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Being, hopefully, an alternative to approaching general-purpose AI, existing generalist models are still at an early stage, where modality and task coverage is limited. To empower multi-modal task-scaling and speed up this line of research, we rele…

    Submitted 8 December, 2022; originally announced December 2022.

  4. arXiv:2211.01335  [pdf, other]

    cs.CV cs.CL

    Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese

    Authors: An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou

    Abstract: The tremendous success of CLIP (Radford et al., 2021) has promoted the research and application of contrastive learning for vision-language pretraining. In this work, we construct a large-scale dataset of image-text pairs in Chinese, where most data are retrieved from publicly available datasets, and we pretrain Chinese CLIP models on the new dataset. We develop 5 Chinese CLIP models of multiple s…

    Submitted 22 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  5. arXiv:2202.03052  [pdf, other]

    cs.CV cs.CL

    OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

    Authors: Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang

    Abstract: In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization. We propose OFA, a Task-Agnostic and Modality-Agnostic framework that supports Task Comprehensiveness. OFA unifies a diverse set of cross-modal and unimodal tasks, including image generation, visual grounding, image captioning, image classification, language…

    Submitted 1 June, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: Accepted at ICML2022

  6. arXiv:2105.15082  [pdf, other]

    cs.LG cs.CL

    M6-T: Exploring Sparse Expert Models and Beyond

    Authors: An Yang, Junyang Lin, Rui Men, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie Zhang, Jiamang Wang, Yong Li, Di Zhang, Wei Lin, Lin Qu, Jingren Zhou, Hongxia Yang

    Abstract: Mixture-of-Experts (MoE) models can achieve promising results with outrageous large amount of parameters but constant computation cost, and thus it has become a trend in model scaling. Still it is a mystery how MoE layers bring quality gains by leveraging the parameters with sparse activation. In this work, we investigate several key factors in sparse expert models. We observe that load imbalance…

    Submitted 9 August, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: 16 pages, 8 figures

  7. arXiv:2105.14211  [pdf, other]

    cs.CV

    M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis via Non-Autoregressive Generative Transformers

    Authors: Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang

    Abstract: Conditional image synthesis aims to create an image according to some multi-modal guidance in the forms of textual descriptions, reference images, and image blocks to preserve, as well as their combinations. In this paper, instead of investigating these control signals separately, we propose a new two-stage architecture, M6-UFC, to unify any number of multi-modal controls. In M6-UFC, both the dive…

    Submitted 19 February, 2022; v1 submitted 29 May, 2021; originally announced May 2021.

    Comments: Accepted by NeurIPS21

  8. arXiv:2105.13868  [pdf, other]

    cs.CL cs.CV cs.IR

    Learning Relation Alignment for Calibrated Cross-modal Retrieval

    Authors: Shuhuai Ren, Junyang Lin, Guangxiang Zhao, Rui Men, An Yang, Jingren Zhou, Xu Sun, Hongxia Yang

    Abstract: Despite the achievements of large-scale multimodal pre-training approaches, cross-modal retrieval, e.g., image-text retrieval, remains a challenging task. To bridge the semantic gap between the two modalities, previous studies mainly focus on word-region alignment at the object level, lacking the matching between the linguistic relation among the words and the visual relation among the regions. Th…

    Submitted 1 June, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: Accepted by ACL-IJCNLP 2021 main conference (Long Paper)

  9. arXiv:2103.00823  [pdf, other]

    cs.CL

    M6: A Chinese Multimodal Pretrainer

    Authors: Junyang Lin, Rui Men, An Yang, Chang Zhou, Ming Ding, Yichang Zhang, Peng Wang, Ang Wang, Le Jiang, Xianyan Jia, Jie Zhang, Jianwei Zhang, Xu Zou, Zhikang Li, Xiaodong Deng, Jie Liu, Jinbao Xue, Huiling Zhou, Jianxin Ma, Jin Yu, Yong Li, Wei Lin, Jingren Zhou, Jie Tang, Hongxia Yang

    Abstract: In this work, we construct the largest dataset for multimodal pretraining in Chinese, which consists of over 1.9TB images and 292GB texts that cover a wide range of domains. We propose a cross-modal pretraining method called M6, referring to Multi-Modality to Multi-Modality Multitask Mega-transformer, for unified pretraining on the data of single modality and multiple modalities. We scale the mode…

    Submitted 29 May, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: 12 pages, technical report. Extension of paper "M6" accepted to KDD 2021

  10. arXiv:1512.08422  [pdf, other]

    cs.CL cs.LG

    Natural Language Inference by Tree-Based Convolution and Heuristic Matching

    Authors: Lili Mou, Rui Men, Ge Li, Yan Xu, Lu Zhang, Rui Yan, Zhi Jin

    Abstract: In this paper, we propose the TBCNN-pair model to recognize entailment and contradiction between two sentences. In our model, a tree-based convolutional neural network (TBCNN) captures sentence-level semantics; then heuristic matching layers like concatenation, element-wise product/difference combine the information in individual sentences. Experimental results show that our model outperforms exis…

    Submitted 13 May, 2016; v1 submitted 28 December, 2015; originally announced December 2015.

    Comments: Accepted by ACL'16 as a short paper

  11. arXiv:1510.07211  [pdf, other]

    cs.SE cs.LG

    On End-to-End Program Generation from User Intention by Deep Neural Networks

    Authors: Lili Mou, Rui Men, Ge Li, Lu Zhang, Zhi Jin

    Abstract: This paper envisions an end-to-end program generation scenario using recurrent neural networks (RNNs): Users can express their intention in natural language; an RNN then automatically generates corresponding code in a character-by-character fashion. We demonstrate its feasibility through a case study and empirical analysis. To fully make such technique useful in practice, we also point out sever…

    Submitted 25 October, 2015; originally announced October 2015.

    Comments: Submitted to 2016 International Conference of Software Engineering "Vision of 2025 and Beyond" track