Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 168 results for author: Geng, X

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.14532  [pdf, other

    cs.LG cs.CL

    RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

    Authors: Amrith Setlur, Saurabh Garg, Xinyang Geng, Naman Garg, Virginia Smith, Aviral Kumar

    Abstract: Training on model-generated synthetic data is a promising approach for finetuning LLMs, but it remains unclear when it helps or hurts. In this paper, we investigate this question for math reasoning via an empirical study, followed by building a conceptual understanding of our observations. First, we find that while the typical approach of finetuning a model on synthetic correct or positive problem… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.13185  [pdf, other

    cs.CL

    Learnable In-Context Vector for Visual Question Answering

    Authors: Yingzhe Peng, Chenduo Hao, Xu Yang, Jiawei Peng, Xinting Hu, Xin Geng

    Abstract: As language models continue to scale, Large Language Models (LLMs) have exhibited emerging capabilities in In-Context Learning (ICL), enabling them to solve language tasks by prefixing a few in-context demonstrations (ICDs) as context. Inspired by these advancements, researchers have extended these techniques to develop Large Multimodal Models (LMMs) with ICL capabilities. However, applying ICL us… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.12199  [pdf, other

    cs.LG cs.AI

    Time Series Modeling for Heart Rate Prediction: From ARIMA to Transformers

    Authors: Haowei Ni, Shuchen Meng, Xieming Geng, Panfeng Li, Zhuoying Li, Xupeng Chen, Xiaotong Wang, Shiyao Zhang

    Abstract: Cardiovascular disease (CVD) is a leading cause of death globally, necessitating precise forecasting models for monitoring vital signs like heart rate, blood pressure, and ECG. Traditional models, such as ARIMA and Prophet, are limited by their need for manual parameter tuning and challenges in handling noisy, sparse, and highly variable medical data. This study investigates advanced deep learning… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.09397  [pdf, other

    cs.CV cs.AI

    Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

    Authors: Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, Ji Li, Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo

    Abstract: Modern vision models are trained on very large noisy datasets. While these models acquire strong capabilities, they may not follow the user's intent to output the desired results in certain aspects, e.g., visual aesthetic, preferred style, and responsibility. In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system.… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 28 pages, 26 figures, under review

  5. arXiv:2406.07871  [pdf, other

    cs.CV cs.MM cs.SD eess.AS

    Flexible Music-Conditioned Dance Generation with Style Description Prompts

    Authors: Hongsong Wang, Yin Zhu, Xin Geng

    Abstract: Dance plays an important role as an artistic form and expression in human culture, yet the creation of dance remains a challenging task. Most dance generation methods primarily rely solely on music, seldom taking into consideration intrinsic attributes such as music style or genre. In this work, we introduce Flexible Dance Generation with Style Description Prompts (DGSDP), a diffusion-based framew… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  6. arXiv:2405.16474  [pdf, other

    cs.LG

    Inaccurate Label Distribution Learning with Dependency Noise

    Authors: Zhiqiang Kou, Jing Wang, Yuheng Jia, Xin Geng

    Abstract: In this paper, we introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning, which arise from dependencies on instances and labels. We start by modeling the inaccurate label distribution matrix as a combination of the true label distribution and a noise matrix influenced by specific instance… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  7. arXiv:2405.13923  [pdf, other

    cs.CL

    Why Not Transform Chat Large Language Models to Non-English?

    Authors: Xiang Geng, Ming Zhu, Jiahuan Li, Zhejian Lai, Wei Zou, Shuaijie She, Jiaxin Guo, Xiaofeng Zhao, Yinglu Li, Yuang Li, Chang Su, Yanqing Zhao, Xinglin Lyu, Min Zhang, Jiajun Chen, Hao Yang, Shujian Huang

    Abstract: The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g. GPT-4. Compared to base LLMs, chat LLMs are further optimized fo… ▽ More

    Submitted 31 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  8. MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

    Authors: Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik , et al. (6 additional authors not shown)

    Abstract: Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of down… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, for associated dataset, see http://github.com/microsoft/MS-MARCO-Web-Search

  9. arXiv:2405.06038  [pdf, other

    cs.LG cs.AI

    From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks

    Authors: Xue Geng, Zhe Wang, Chunyun Chen, Qing Xu, Kaixin Xu, Chao Jin, Manas Gupta, Xulei Yang, Zhenghua Chen, Mohamed M. Sabry Aly, Jie Lin, Min Wu, Xiaoli Li

    Abstract: Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) tasks. However, deploying them brings significant challenges due to the huge cost of memory, energy, and computation. To address these challenges, researchers have developed various model compression techniques such as model quantization and model pruning. Recently, there has been a surge in research of compress… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: This manuscript is the accepted version for TNNLS(IEEE Transactions on Neural Networks and Learning Systems)

  10. arXiv:2405.02132  [pdf, other

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu… ▽ More

    Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  11. arXiv:2404.16897  [pdf, other

    cs.LG cs.AI cs.CV

    Exploring Learngene via Stage-wise Weight Sharing for Initializing Variable-sized Models

    Authors: Shi-Yu Xia, Wenxuan Zhu, Xu Yang, Xin Geng

    Abstract: In practice, we usually need to build variable-sized models adapting for diverse resource constraints in different application scenarios, where weight initialization is an important step prior to training. The Learngene framework, introduced recently, firstly learns one compact part termed as learngene from a large well-trained model, after which learngene is expanded to initialize variable-sized… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  12. arXiv:2404.13565  [pdf, other

    cs.CV cs.AI cs.CL

    Exploring Diverse Methods in Visual Question Answering

    Authors: Panfeng Li, Qikai Yang, Xieming Geng, Wenjing Zhou, Zhicheng Ding, Yi Nian

    Abstract: This study explores innovative methods for improving Visual Question Answering (VQA) using Generative Adversarial Networks (GANs), autoencoders, and attention mechanisms. Leveraging a balanced VQA dataset, we investigate three distinct strategies. Firstly, GAN-based approaches aim to generate answer embeddings conditioned on image and question inputs, showing potential but struggling with more com… ▽ More

    Submitted 20 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by 2024 5th International Conference on Electronic Communication and Artificial Intelligence

  13. arXiv:2403.16697  [pdf, other

    cs.CV

    DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization

    Authors: Yunlong Tang, Yuxuan Wan, Lei Qi, Xin Geng

    Abstract: Source-Free Domain Generalization (SFDG) aims to develop a model that works for unseen target domains without relying on any source domain. Recent work, PromptStyler, employs text prompts to simulate different distribution shifts in the joint vision-language space, allowing the model to generalize effectively to unseen domains without using any images. However, 1) PromptStyler's style generation s… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  14. arXiv:2403.14118  [pdf, other

    cs.CL

    From Handcrafted Features to LLMs: A Brief Survey for Machine Translation Quality Estimation

    Authors: Haofei Zhao, Yilun Liu, Shimin Tao, Weibin Meng, Yimeng Chen, Xiang Geng, Chang Su, Min Zhang, Hao Yang

    Abstract: Machine Translation Quality Estimation (MTQE) is the task of estimating the quality of machine-translated text in real time without the need for reference translations, which is of great importance for the development of MT. After two decades of evolution, QE has yielded a wealth of results. This article provides a comprehensive overview of QE datasets, annotation methods, shared tasks, methodolog… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by IJCNN 2024

  15. arXiv:2403.13351  [pdf, other

    cs.CV

    OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning

    Authors: Xinyu Geng, Jiaming Wang, Jiawei Gong, Yuerong Xue, Jun Xu, Fanglin Chen, Xiaolin Huang

    Abstract: Redundancy is a persistent challenge in Capsule Networks (CapsNet),leading to high computational costs and parameter counts. Although previous works have introduced pruning after the initial capsule layer, dynamic routing's fully connected nature and non-orthogonal weight matrices reintroduce redundancy in deeper layers. Besides, dynamic routing requires iterating to converge, further increasing c… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 8 pages

  16. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  17. arXiv:2402.19145  [pdf, other

    cs.CV

    A SAM-guided Two-stream Lightweight Model for Anomaly Detection

    Authors: Chenghao Li, Lei Qi, Xin Geng

    Abstract: In industrial anomaly detection, model efficiency and mobile-friendliness become the primary concerns in real-world applications. Simultaneously, the impressive generalization capabilities of Segment Anything (SAM) have garnered broad academic attention, making it an ideal choice for localizing unseen anomalies and diverse real-world patterns. In this paper, considering these two critical factors,… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

  18. arXiv:2401.13011  [pdf, other

    cs.CV

    CCA: Collaborative Competitive Agents for Image Editing

    Authors: Tiankai Hang, Shuyang Gu, Dong Chen, Xin Geng, Baining Guo

    Abstract: This paper presents a novel generative model, Collaborative Competitive Agents (CCA), which leverages the capabilities of multiple Large Language Models (LLMs) based agents to execute complex tasks. Drawing inspiration from Generative Adversarial Networks (GANs), the CCA system employs two equal-status generator agents and a discriminator agent. The generators independently process user instructio… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

  19. arXiv:2401.08139  [pdf, other

    cs.LG cs.NE

    Transferring Core Knowledge via Learngenes

    Authors: Fu Feng, Jing Wang, Xin Geng

    Abstract: The pre-training paradigm fine-tunes the models trained on large-scale datasets to downstream tasks with enhanced performance. It transfers all knowledge to downstream tasks without discriminating which part is necessary or unnecessary, which may lead to negative transfer. In comparison, knowledge transfer in nature is much more efficient. When passing genetic information to descendants, ancestors… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

  20. arXiv:2401.06838  [pdf, other

    cs.CL

    MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

    Authors: Shuaijie She, Wei Zou, Shujian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen

    Abstract: Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning abilities across different languages, e.g., reasoning in the dominant language like English is superior to other languages due to the imbalance of multilingual training data. To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimizatio… ▽ More

    Submitted 13 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: The project is available at https://github.com/NJUNLP/MAPO

  21. arXiv:2401.06568  [pdf, other

    cs.CL cs.AI

    Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation

    Authors: Xu Huang, Zhirui Zhang, Xiang Geng, Yichao Du, Jiajun Chen, Shujian Huang

    Abstract: This study investigates how Large Language Models (LLMs) leverage source and reference data in machine translation evaluation task, aiming to better understand the mechanisms behind their remarkable performance in this task. We design the controlled experiments across various input modes and model types, and employ both coarse-grained and fine-grained prompts to discern the utility of source versu… ▽ More

    Submitted 6 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted by ACL2024 Findings

  22. arXiv:2312.15156  [pdf, other

    cs.CL

    Large Language Models as Zero-Shot Keyphrase Extractors: A Preliminary Empirical Study

    Authors: Mingyang Song, Xuelian Geng, Songfang Yao, Shilong Lu, Yi Feng, Liping Jing

    Abstract: Zero-shot keyphrase extraction aims to build a keyphrase extractor without training by human-annotated data, which is challenging due to the limited human intervention involved. Challenging but worthwhile, zero-shot setting efficiently reduces the time and effort that data labeling takes. Recent efforts on pre-trained large language models (e.g., ChatGPT and ChatGLM) show promising performance on… ▽ More

    Submitted 10 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Technical Report, 6 pages

  23. arXiv:2312.09881  [pdf, other

    cs.LG cs.AI

    Dynamic Heterogeneous Federated Learning with Multi-Level Prototypes

    Authors: Shunxin Guo, Hongsong Wang, Xin Geng

    Abstract: Federated learning shows promise as a privacy-preserving collaborative learning technique. Existing heterogeneous federated learning mainly focuses on skewing the label distribution across clients. However, most approaches suffer from catastrophic forgetting and concept drift, mainly when the global distribution of all classes is extremely unbalanced and the data distribution of the client dynamic… ▽ More

    Submitted 15 December, 2023; originally announced December 2023.

  24. arXiv:2312.06343  [pdf, other

    cs.LG

    RankMatch: A Novel Approach to Semi-Supervised Label Distribution Learning Leveraging Inter-label Correlations

    Authors: Kouzhiqiang Yucheng Xie, Jing Wang, Yuheng Jia, Boyu Shi, Xin Geng

    Abstract: This paper introduces RankMatch, an innovative approach for Semi-Supervised Label Distribution Learning (SSLDL). Addressing the challenge of limited labeled data, RankMatch effectively utilizes a small number of labeled examples in conjunction with a larger quantity of unlabeled data, reducing the need for extensive manual labeling in Deep Neural Network (DNN) applications. Specifically, RankMatch… ▽ More

    Submitted 11 December, 2023; originally announced December 2023.

  25. arXiv:2312.05743  [pdf, other

    cs.LG cs.CV

    Building Variable-sized Models via Learngene Pool

    Authors: Boyu Shi, Shiyu Xia, Xu Yang, Haokun Chen, Zhiqiang Kou, Xin Geng

    Abstract: Recently, Stitchable Neural Networks (SN-Net) is proposed to stitch some pre-trained networks for quickly building numerous networks with different complexity and performance trade-offs. In this way, the burdens of designing or training the variable-sized networks, which can be used in application scenarios with diverse resource constraints, are alleviated. However, SN-Net still faces a few challe… ▽ More

    Submitted 11 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

  26. arXiv:2312.05614  [pdf, other

    cs.AI cs.LG

    Transformer as Linear Expansion of Learngene

    Authors: Shiyu Xia, Miaosen Zhang, Xu Yang, Ruiming Chen, Haokun Chen, Xin Geng

    Abstract: We propose expanding the shared Transformer module to produce and initialize Transformers of varying depths, enabling adaptation to diverse resource constraints. Drawing an analogy to genetic expansibility, we term such module as learngene. To identify the expansion mechanism, we delve into the relationship between the layer's position and its corresponding weight value, and find that linear funct… ▽ More

    Submitted 20 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

  27. arXiv:2312.01598  [pdf, other

    cs.CV

    Good Questions Help Zero-Shot Image Reasoning

    Authors: Kaiwen Yang, Tao Shen, Xinmei Tian, Xiubo Geng, Chongyang Tao, Dacheng Tao, Tianyi Zhou

    Abstract: Aligning the recent large language models (LLMs) with computer vision models leads to large vision-language models (LVLMs), which have paved the way for zero-shot image reasoning tasks. However, LVLMs are usually trained on short high-level captions only referring to sparse focus regions in images. Such a ``tunnel vision'' limits LVLMs to exploring other relevant contexts in complex scenes. To add… ▽ More

    Submitted 8 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

  28. arXiv:2312.00785  [pdf, other

    cs.CV

    Sequential Modeling Enables Scalable Learning for Large Vision Models

    Authors: Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros

    Abstract: We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once… ▽ More

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Website: https://yutongbai.com/lvm.html

  29. arXiv:2312.00351  [pdf, other

    cs.CV

    Manipulating the Label Space for In-Context Classification

    Authors: Haokun Chen, Xu Yang, Yuhang Huang, Zihan Wu, Jing Wang, Xin Geng

    Abstract: After pre-training by generating the next word conditional on previous words, the Language Model (LM) acquires the ability of In-Context Learning (ICL) that can learn a new task conditional on the context of the given in-context examples (ICEs). Similarly, visually-conditioned Language Modelling is also used to train Vision-Language Models (VLMs) with ICL ability. However, such VLMs typically exhi… ▽ More

    Submitted 5 December, 2023; v1 submitted 30 November, 2023; originally announced December 2023.

  30. arXiv:2311.16556  [pdf, other

    cs.LG

    Scalable Label Distribution Learning for Multi-Label Classification

    Authors: Xingyu Zhao, Yuexuan An, Lei Qi, Xin Geng

    Abstract: Multi-label classification (MLC) refers to the problem of tagging a given instance with a set of relevant labels. Most existing MLC methods are based on the assumption that the correlation of two labels in each label pair is symmetric, which is violated in many real-world scenarios. Moreover, most existing methods design learning processes associated with the number of labels, which makes their co… ▽ More

    Submitted 28 November, 2023; originally announced November 2023.

  31. arXiv:2311.13198  [pdf, other

    cs.CV

    DoubleAUG: Single-domain Generalized Object Detector in Urban via Color Perturbation and Dual-style Memory

    Authors: Lei Qi, Peng Dong, Tan Xiong, Hui Xue, Xin Geng

    Abstract: Object detection in urban scenarios is crucial for autonomous driving in intelligent traffic systems. However, unlike conventional object detection tasks, urban-scene images vary greatly in style. For example, images taken on sunny days differ significantly from those taken on rainy days. Therefore, models trained on sunny day images may not generalize well to rainy day images. In this paper, we a… ▽ More

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Accepted by ACM Transactions on Multimedia Computing, Communications, and Applications

  32. arXiv:2311.08734  [pdf, other

    cs.CL

    Thread of Thought Unraveling Chaotic Contexts

    Authors: Yucheng Zhou, Xiubo Geng, Tao Shen, Chongyang Tao, Guodong Long, Jian-Guang Lou, Jianbing Shen

    Abstract: Large Language Models (LLMs) have ushered in a transformative era in the field of natural language processing, excelling in tasks related to text comprehension and generation. Nevertheless, they encounter difficulties when confronted with chaotic contexts (e.g., distractors rather than long irrelevant context), leading to the inadvertent omission of certain details within the chaotic context. In r… ▽ More

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 11 pages, 7 figures, 5 tables

  33. arXiv:2310.19491  [pdf, ps, other

    math.ST cs.LG stat.ML

    Generator Identification for Linear SDEs with Additive and Multiplicative Noise

    Authors: Yuanyuan Wang, Xi Geng, Wei Huang, Biwei Huang, Mingming Gong

    Abstract: In this paper, we present conditions for identifying the generator of a linear stochastic differential equation (SDE) from the distribution of its solution process with a given fixed initial state. These identifiability conditions are crucial in causal inference using linear SDEs as they enable the identification of the post-intervention distributions from its observational distribution. Specifica… ▽ More

    Submitted 21 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

  34. arXiv:2310.11731  [pdf, other

    cs.AI

    Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning

    Authors: Jianlan Luo, Perry Dong, Jeffrey Wu, Aviral Kumar, Xinyang Geng, Sergey Levine

    Abstract: The offline reinforcement learning (RL) paradigm provides a general recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data. While policy constraints, conservatism, and other methods for mitigating distributional shifts have made offline reinforcement learning more effective, the continuous action setting often necessitates various a… ▽ More

    Submitted 18 October, 2023; originally announced October 2023.

  35. arXiv:2310.10056  [pdf, other

    cs.LG

    Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction

    Authors: Han Qi, Xinyang Geng, Stefano Rando, Iku Ohama, Aviral Kumar, Sergey Levine

    Abstract: In computational chemistry, crystal structure prediction (CSP) is an optimization problem that involves discovering the lowest energy stable crystal structure for a given chemical formula. This problem is challenging as it requires discovering globally optimal designs with the lowest energies on complex manifolds. One approach to tackle this problem involves building simulators based on density fu… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

  36. arXiv:2310.08864  [pdf, other

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie , et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method… ▽ More

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  37. arXiv:2309.13886  [pdf, other

    cs.LG

    Can Class-Priors Help Single-Positive Multi-Label Learning?

    Authors: Biao Liu, Ning Xu, Jie Wang, Xin Geng

    Abstract: Single-positive multi-label learning (SPMLL) is a typical weakly supervised multi-label learning problem, where each training example is annotated with only one positive label. Existing SPMLL methods typically assign pseudo-labels to unannotated labels with the assumption that prior probabilities of all classes are identical. However, the class-prior of each category may differ significantly in re… ▽ More

    Submitted 26 May, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

  38. arXiv:2309.13230  [pdf, other

    cs.CL

    Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task

    Authors: Xiang Geng, Zhejian Lai, Yu Zhang, Shimin Tao, Hao Yang, Jiajun Chen, Shujian Huang

    Abstract: We introduce the submissions of the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task. Our team submitted predictions for the English-German language pair on all two sub-tasks: (i) sentence- and word-level quality prediction; and (ii) fine-grained error span detection. This year, we further explore pseudo data methods for QE based on NJUQE framework (https://github.com/NJUNLP/njuqe).… ▽ More

    Submitted 11 December, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: WMT2023 System Paper

    Journal ref: https://aclanthology.org/2023.wmt-1.71

  39. arXiv:2308.16718  [pdf, other

    cs.LG

    Robust Representation Learning for Unreliable Partial Label Learning

    Authors: Yu Shi, Dong-Dong Wu, Xin Geng, Min-Ling Zhang

    Abstract: Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth. However, this idealistic assumption may not always hold due to potential annotation inaccuracies, meaning the ground-truth may not be present in the candidate label set. This is known as Unreliable Partial Label Learning (U… ▽ More

    Submitted 31 August, 2023; originally announced August 2023.

  40. arXiv:2308.10438  [pdf, other

    cs.CV

    Efficient Joint Optimization of Layer-Adaptive Weight Pruning in Deep Neural Networks

    Authors: Kaixin Xu, Zhe Wang, Xue Geng, Jie Lin, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: In this paper, we propose a novel layer-adaptive weight-pruning approach for Deep Neural Networks (DNNs) that addresses the challenge of optimizing the output distortion minimization while adhering to a target pruning ratio constraint. Our approach takes into account the collective influence of all layers to design a layer-adaptive pruning scheme. We discover and utilize a very important additivit… ▽ More

    Submitted 24 August, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

  41. arXiv:2308.09583  [pdf, other

    cs.CL cs.AI cs.LG

    WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

    Authors: Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang

    Abstract: Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data and without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of Llama-2… ▽ More

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: LLM, Mathematical Reasoning

  42. arXiv:2308.01742  [pdf, other

    cs.LG

    Exploiting Multi-Label Correlation in Label Distribution Learning

    Authors: Zhiqiang Kou jing wang yuheng jia xin geng

    Abstract: Label Distribution Learning (LDL) is a novel machine learning paradigm that assigns label distribution to each instance. Many LDL methods proposed to leverage label correlation in the learning process to solve the exponential-sized output space; among these, many exploited the low-rank structure of label distribution to capture label correlation. However, recent studies disclosed that label distri… ▽ More

    Submitted 3 August, 2023; originally announced August 2023.

  43. arXiv:2308.00918  [pdf, other

    cs.CV

    A Novel Cross-Perturbation for Single Domain Generalization

    Authors: Dongjia Zhao, Lei Qi, Xiao Shi, Yinghuan Shi, Xin Geng

    Abstract: Single domain generalization aims to enhance the ability of the model to generalize to unknown domains when trained on a single source domain. However, the limited diversity in the training data hampers the learning of domain-invariant features, resulting in compromised generalization performance. To address this, data perturbation (augmentation) has emerged as a crucial method to increase data di… ▽ More

    Submitted 7 June, 2024; v1 submitted 1 August, 2023; originally announced August 2023.

    Comments: Accepted by IEEE TCSVT

  44. arXiv:2308.00529  [pdf, other

    cs.LG cs.AR

    Variational Label-Correlation Enhancement for Congestion Prediction

    Authors: Biao Liu, Congyu Qiao, Ning Xu, Xin Geng, Ziran Zhu, Jun Yang

    Abstract: The physical design process of large-scale designs is a time-consuming task, often requiring hours to days to complete, with routing being the most critical and complex step. As the the complexity of Integrated Circuits (ICs) increases, there is an increased demand for accurate routing quality prediction. Accurate congestion prediction aids in identifying design flaws early on, thereby acceleratin… ▽ More

    Submitted 1 August, 2023; originally announced August 2023.

  45. arXiv:2307.15411  [pdf, other

    cs.CL

    Investigating the Learning Behaviour of In-context Learning: A Comparison with Supervised Learning

    Authors: Xindi Wang, Yufei Wang, Can Xu, Xiubo Geng, Bowen Zhang, Chongyang Tao, Frank Rudzicz, Robert E. Mercer, Daxin Jiang

    Abstract: Large language models (LLMs) have shown remarkable capacity for in-context learning (ICL), where learning a new task from just a few training examples is done without being explicitly pre-trained. However, despite the success of LLMs, there has been little understanding of how ICL learns the knowledge from the given prompts. In this paper, to make progress toward understanding the learning behavio… ▽ More

    Submitted 1 August, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

    Comments: accepted to ECAI 2023 (camera-ready)

  46. arXiv:2307.13492  [pdf, other

    cs.CV

    NormAUG: Normalization-guided Augmentation for Domain Generalization

    Authors: Lei Qi, Hongpeng Yang, Yinghuan Shi, Xin Geng

    Abstract: Deep learning has made significant advancements in supervised learning. However, models trained in this setting often face challenges due to domain shift between training and test sets, resulting in a significant drop in performance during testing. To address this issue, several domain generalization methods have been developed to learn robust and domain-invariant features from multiple training d… ▽ More

    Submitted 6 February, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: Accepted by IEEE Transactions on Image Processing (TIP)

  47. arXiv:2307.08927  [pdf, other

    cs.RO cs.AI

    Multi-Stage Cable Routing through Hierarchical Imitation Learning

    Authors: Jianlan Luo, Charles Xu, Xinyang Geng, Gilbert Feng, Kuan Fang, Liam Tan, Stefan Schaal, Sergey Levine

    Abstract: We study the problem of learning to perform multi-stage robotic manipulation tasks, with applications to cable routing, where the robot must route a cable through a series of clips. This setting presents challenges representative of complex multi-stage robotic manipulation scenarios: handling deformable objects, closing the loop on visual perception, and handling extended behaviors consisting of m… ▽ More

    Submitted 13 January, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: T-RO 2024

  48. arXiv:2306.11991  [pdf, other

    cs.CV

    Generalizable Metric Network for Cross-domain Person Re-identification

    Authors: Lei Qi, Ziang Liu, Yinghuan Shi, Xin Geng

    Abstract: Person Re-identification (Re-ID) is a crucial technique for public security and has made significant progress in supervised settings. However, the cross-domain (i.e., domain generalization) scene presents a challenge in Re-ID tasks due to unseen test domains and domain-shift between the training and test sets. To tackle this challenge, most existing methods aim to learn domain-invariant or robust… ▽ More

    Submitted 29 April, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted by IEEE TCSVT

  49. arXiv:2306.10225  [pdf, other

    cs.NE cs.AI cs.LG

    Genes in Intelligent Agents

    Authors: Fu Feng, Jing Wang, Xu Yang, Xin Geng

    Abstract: The genes in nature give the lives on earth the current biological intelligence through transmission and accumulation over billions of years. Inspired by the biological intelligence, artificial intelligence (AI) has devoted to building the machine intelligence. Although it has achieved thriving successes, the machine intelligence still lags far behind the biological intelligence. The reason may li… ▽ More

    Submitted 27 October, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

  50. arXiv:2306.08568  [pdf, other

    cs.CL cs.AI

    WizardCoder: Empowering Code Large Language Models with Evol-Instruct

    Authors: Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang

    Abstract: Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning, by adapting the Evol-Instruct method to the domain of code.… ▽ More

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Large Language model, Code Generation, Code LLMs