Showing 1–50 of 224 results for author: You, Y

Searching in archive cs. Search in all archives.
  1. arXiv:2407.01646  [pdf, other]

    cs.SE cs.AI

    ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization

    Authors: Chunrong Fang, Weisong Sun, Yuchen Chen, Xiao Chen, Zhao Wei, Quanjun Zhang, Yudu You, Bin Luo, Yang Liu, Zhenyu Chen

    Abstract: (Source) code summarization aims to automatically generate succinct natural language summaries for given code snippets. Such summaries play a significant role in promoting developers to understand and maintain code. Inspired by neural machine translation, deep learning-based code summarization techniques widely adopt an encoder-decoder framework, where the encoder transforms given code snippets in…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted to IEEE Transactions on Software Engineering (TSE)

    MSC Class: 68-04 ACM Class: D.2.3; I.2.7

  2. arXiv:2407.00611  [pdf, other]

    cs.DC

    WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem

    Authors: Ziming Liu, Shaoyu Wang, Shenggan Cheng, Zhongkai Zhao, Xuanlei Zhao, James Demmel, Yang You

    Abstract: In recent years, Transformer-based Large Language Models (LLMs) have garnered significant attention due to their exceptional performance across a variety of tasks. However, training these models on long sequences presents a substantial challenge in terms of efficiency and scalability. Current methods are constrained either by the number of attention heads, limiting scalability, or by excessive com…

    Submitted 1 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  3. arXiv:2406.18301  [pdf, other]

    eess.AS cs.CL cs.SD

    MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

    Authors: Song Li, Yongbin You, Xuezhi Wang, Zhengkun Tian, Ke Ding, Guanglu Wan

    Abstract: Recently, multilingual artificial intelligence assistants, exemplified by ChatGPT, have gained immense popularity. As a crucial gateway to human-computer interaction, multilingual automatic speech recognition (ASR) has also garnered significant attention, as evidenced by systems like Whisper. However, the proprietary nature of the training data has impeded researchers' efforts to study multilingua…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  4. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.08845  [pdf, other]

    cs.CV

    Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

    Authors: Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang

    Abstract: Recent text-to-video (T2V) technology advancements, as demonstrated by models such as Gen2, Pika, and Sora, have significantly broadened its applicability and popularity. Despite these strides, evaluating these models poses substantial challenges. Primarily, due to the limitations inherent in automatic metrics, manual evaluation is often considered a superior method for assessing T2V generation. H…

    Submitted 13 June, 2024; originally announced June 2024.

  6. arXiv:2406.06565  [pdf, other]

    cs.CL cs.AI cs.LG

    MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

    Authors: Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You

    Abstract: Evaluating large language models (LLMs) is challenging. Traditional ground-truth-based benchmarks fail to capture the comprehensiveness and nuance of real-world queries, while LLM-as-judge benchmarks suffer from grading biases and limited query quantity. Both of them may also become contaminated over time. User-facing evaluation, such as Chatbot Arena, provides reliable signals but is costly and s…

    Submitted 3 June, 2024; originally announced June 2024.

  7. arXiv:2406.01179  [pdf, other]

    cs.CL cs.AI

    Are AI-Generated Text Detectors Robust to Adversarial Perturbations?

    Authors: Guanhua Huang, Yuchen Zhang, Zhe Li, Yongjian You, Mingze Wang, Zhouwang Yang

    Abstract: The widespread use of large language models (LLMs) has sparked concerns about the potential misuse of AI-generated text, as these models can produce content that closely resembles human-generated text. Current detectors for AI-generated text (AIGT) lack robustness against adversarial perturbations, with even minor changes in characters or words causing a reversal in distinguishing between human-cr…

    Submitted 26 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main conference

  8. arXiv:2405.18347  [pdf, other]

    cs.LG

    Dataset Growth

    Authors: Ziheng Qin, Zhaopan Xu, Yukun Zhou, Zangwei Zheng, Zebang Cheng, Hao Tang, Lei Shang, Baigui Sun, Xiaojiang Peng, Radu Timofte, Hongxun Yao, Kai Wang, Yang You

    Abstract: Deep learning benefits from the growing abundance of available data. Meanwhile, efficiently dealing with the growing data scale has become a challenge. Data publicly available are from different sources with various qualities, and it is impractical to do manual cleaning against noise and redundancy given today's data scale. There are existing techniques for cleaning/selecting the collected data. H…

    Submitted 28 May, 2024; originally announced May 2024.

  9. arXiv:2405.17403  [pdf, other]

    cs.LG cs.AI

    A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

    Authors: Kai Wang, Yukun Zhou, Mingjia Shi, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You

    Abstract: Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called, which is based on a closer look at time steps. Our key findings are: i) Time steps can be empirically divided into acceleration, deceleration, and convergence areas based on the process increment. ii) These time steps are imbalanced, with many…

    Submitted 27 May, 2024; originally announced May 2024.

    ACM Class: I.2

  10. arXiv:2405.16034  [pdf, other]

    cs.CV

    DiffuBox: Refining 3D Object Detection with Point Diffusion

    Authors: Xiangyu Chen, Zhenzhen Liu, Katie Z Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li, Marco Pavone, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger

    Abstract: Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-…

    Submitted 24 May, 2024; originally announced May 2024.

  11. arXiv:2405.07623  [pdf, other]

    cs.CL

    COBias and Debias: Minimizing Language Model Pairwise Accuracy Bias via Nonlinear Integer Programming

    Authors: Ruixi Lin, Yang You

    Abstract: For language model classification, would you prefer having only one workable class or having every class working? The latter makes more practical uses. Especially for large language models (LLMs), the fact that they achieve a fair overall accuracy by in-context learning (ICL) obscures a large difference in individual class accuracies. In this work, we uncover and tackle language models' imbalance…

    Submitted 13 May, 2024; originally announced May 2024.

  12. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  13. arXiv:2405.03685  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Language-Image Models with 3D Understanding

    Authors: Jang Hyun Cho, Boris Ivanovic, Yulong Cao, Edward Schmerling, Yue Wang, Xinshuo Weng, Boyi Li, Yurong You, Philipp Krähenbühl, Yan Wang, Marco Pavone

    Abstract: Multi-modal large language models (MLLMs) have shown incredible capabilities in a variety of 2D vision and language tasks. We extend MLLMs' perceptual capabilities to ground and reason about images in 3-dimensional space. To that end, we first develop a large-scale pre-training dataset for 2D and 3D called LV3D by combining multiple existing 2D and 3D recognition datasets under a common task formu…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Project page: https://janghyuncho.github.io/Cube-LLM

  14. arXiv:2405.03520  [pdf, other]

    cs.CV

    Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

    Authors: Zheng Zhu, Xiaofeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang

    Abstract: General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems. Recently, the emergence of the Sora model has attained significant attention due to its remarkable simulation capabilities, which exhibits an incipient comprehension of physical law…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: This survey will be regularly updated at: https://github.com/GigaAI-research/General-World-Models-Survey

  15. arXiv:2404.12866  [pdf, other]

    cs.CL cs.AI cs.CV

    How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning?

    Authors: Yang Luo, Zangwei Zheng, Zirui Zhu, Yang You

    Abstract: The increase in parameter size of multimodal large language models (MLLMs) introduces significant capabilities, particularly in-context learning, where MLLMs enhance task performance without updating pre-trained parameters. This effectiveness, however, hinges on the appropriate selection of in-context examples, a process that is currently biased towards visual data, overlooking textual information…

    Submitted 19 April, 2024; originally announced April 2024.

  16. arXiv:2404.12785  [pdf, other]

    cs.RO

    AutoInspect: Towards Long-Term Autonomous Industrial Inspection

    Authors: Michal Staniaszek, Tobit Flatscher, Joseph Rowell, Hanlin Niu, Wenxing Liu, Yang You, Robert Skilton, Maurice Fallon, Nick Hawes

    Abstract: We give an overview of AutoInspect, a ROS-based software system for robust and extensible mission-level autonomy. Over the past three years AutoInspect has been deployed in a variety of environments, including at a mine, a chemical plant, a mock oil rig, decommissioned nuclear power plants, and a fusion reactor for durations ranging from hours to weeks. The system combines robust mapping and local…

    Submitted 23 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024

  17. arXiv:2404.05139  [pdf, other]

    cs.CV cs.RO

    Better Monocular 3D Detectors with LiDAR from the Past

    Authors: Yurong You, Cheng Perng Phoo, Carlos Andres Diaz-Ruiz, Katie Z Luo, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q Weinberger

    Abstract: Accurate 3D object detection is crucial to autonomous driving. Though LiDAR-based detectors have achieved impressive performance, the high cost of LiDAR sensors precludes their widespread adoption in affordable vehicles. Camera-based detectors are cheaper alternatives but often suffer inferior performance compared to their LiDAR-based counterparts due to inherent depth ambiguities in images. In th…

    Submitted 9 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by ICRA 2024. The code can be found at https://github.com/YurongYou/AsyncDepth

  18. arXiv:2403.16023  [pdf, other]

    cs.RO cs.AI cs.CV

    RPMArt: Towards Robust Perception and Manipulation for Articulated Objects

    Authors: Junbo Wang, Wenhai Liu, Qiaojun Yu, Yang You, Liu Liu, Weiming Wang, Cewu Lu

    Abstract: Articulated objects are commonly found in daily life. It is essential that robots can exhibit robust perception and manipulation skills for articulated objects in real-world robotic applications. However, existing methods for articulated objects insufficiently address noise in point clouds and struggle to bridge the gap between simulation and reality, thus limiting the practical deployment in real…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024), project website at https://r-pmart.github.io

  19. arXiv:2403.13365  [pdf, other]

    cs.RO cs.CV

    ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics

    Authors: Qiaojun Yu, Ce Hao, Junbo Wang, Wenhai Liu, Liu Liu, Yao Mu, Yang You, Hengxu Yan, Cewu Lu

    Abstract: Robotic manipulation in everyday scenarios, especially in unstructured environments, requires skills in pose-aware object manipulation (POM), which adapts robots' grasping and handling according to an object's 6D pose. Recognizing an object's position and orientation is crucial for effective manipulation. For example, if a mug is lying on its side, it's more effective to grasp it by the rim rather…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  20. arXiv:2403.11808  [pdf, other]

    cs.CV

    Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

    Authors: Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You

    Abstract: Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformers (ViTs) adaptation by improving parameter efficiency. However, the exploration of enhancing inference efficiency during adaptation remains underexplored. This limits the broader application of pre-trained ViT models, especially when the model is computationally extensive. In this paper,…

    Submitted 18 March, 2024; originally announced March 2024.

  21. arXiv:2403.10266  [pdf, other]

    cs.DC cs.LG

    DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers

    Authors: Xuanlei Zhao, Shenggan Cheng, Chang Chen, Zangwei Zheng, Ziming Liu, Zheming Yang, Yang You

    Abstract: Scaling multi-dimensional transformers to long sequences is indispensable across various domains. However, the challenges of large memory requirements and slow speeds of such sequences necessitate sequence parallelism. All existing approaches fall under the category of embedded sequence parallelism, which are limited to shard along a single sequence dimension, thereby introducing significant commu…

    Submitted 27 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  22. arXiv:2403.01164  [pdf, other]

    cs.PF cs.DC

    HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices

    Authors: Xuanlei Zhao, Bin Jia, Haotian Zhou, Ziming Liu, Shenggan Cheng, Yang You

    Abstract: In recent times, the emergence of Large Language Models (LLMs) has resulted in increasingly larger model size, posing challenges for inference on low-resource devices. Prior approaches have explored offloading to facilitate low-memory inference but often suffer from efficiency due to I/O bottlenecks. To achieve low-latency LLMs inference on resource-constrained devices, we introduce HeteGen, a nov…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: MLSys 2024

  23. Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization

    Authors: Zirui Zhu, Yong Liu, Zangwei Zheng, Huifeng Guo, Yang You

    Abstract: Click-Through Rate (CTR) prediction holds paramount significance in online advertising and recommendation scenarios. Despite the proliferation of recent CTR prediction models, the improvements in performance have remained limited, as evidenced by open-source benchmark assessments. Current researchers tend to focus on developing new models for various datasets and settings, often neglecting a cruci…

    Submitted 23 February, 2024; originally announced March 2024.

    Comments: Proceedings of the ACM Web Conference 2024 (WWW '24)

  24. arXiv:2402.17911  [pdf, other]

    quant-ph cond-mat.stat-mech cs.IT cs.LG

    Demonstration of Robust and Efficient Quantum Property Learning with Shallow Shadows

    Authors: Hong-Ye Hu, Andi Gu, Swarnadeep Majumder, Hang Ren, Yipei Zhang, Derek S. Wang, Yi-Zhuang You, Zlatko Minev, Susanne F. Yelin, Alireza Seif

    Abstract: Extracting information efficiently from quantum systems is a major component of quantum information processing tasks. Randomized measurements, or classical shadows, enable predicting many properties of arbitrary quantum states using few measurements. While random single qubit measurements are experimentally friendly and suitable for learning low-weight Pauli observables, they perform poorly for no…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 5 figures

  25. arXiv:2402.15751  [pdf, other]

    cs.LG cs.AI cs.CL

    Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

    Authors: Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You

    Abstract: While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO) optimizers, recently proposed to address this issue, only require forward passes during training, making them more memory-friendly. However, the quality of gradient est…

    Submitted 24 February, 2024; originally announced February 2024.

  26. arXiv:2402.13144  [pdf, other]

    cs.LG cs.CV

    Neural Network Parameter Diffusion

    Authors: Kai Wang, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, Yang You

    Abstract: Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a standard latent diffusion model. The autoencoder extracts latent representations of a subset of the trained network parameters. A diffusion mod…

    Submitted 28 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters

  27. arXiv:2402.10227  [pdf, other]

    cs.LG stat.ML

    Correlational Lagrangian Schrödinger Bridge: Learning Dynamics with Population-Level Regularization

    Authors: Yuning You, Ruida Zhou, Yang Shen

    Abstract: Accurate modeling of system dynamics holds intriguing potential in broad scientific fields including cytodynamics and fluid mechanics. This task often presents significant challenges when (i) observations are limited to cross-sectional samples (where individual trajectories are inaccessible for learning), and moreover, (ii) the behaviors of individual particles are heterogeneous (especially in bio…

    Submitted 4 February, 2024; originally announced February 2024.

  28. arXiv:2402.05131  [pdf, ps, other]

    cs.CL

    Financial Report Chunking for Effective Retrieval Augmented Generation

    Authors: Antonio Jimeno Yepes, Yao You, Jan Milczek, Sebastian Laverde, Renyu Li

    Abstract: Chunking information is a key step in Retrieval Augmented Generation (RAG). Current research primarily centers on paragraph-level chunking. This approach treats all texts as equal and neglects the information contained in the structure of documents. We propose an expanded approach to chunk documents by moving beyond mere paragraph-level chunking to chunk primary by structural element components of…

    Submitted 16 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  29. arXiv:2402.05011  [pdf, other]

    cs.LG

    Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching

    Authors: Yuchen Zhang, Tianle Zhang, Kai Wang, Ziyao Guo, Yuxuan Liang, Xavier Bresson, Wei Jin, Yang You

    Abstract: Graph condensation aims to reduce the size of a large-scale graph dataset by synthesizing a compact counterpart without sacrificing the performance of Graph Neural Networks (GNNs) trained on it, which has shed light on reducing the computational cost for training GNNs. Nevertheless, existing methods often fall short of accurately replicating the original graph for certain datasets, thereby failing…

    Submitted 18 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Lossless graph condensation method

  30. arXiv:2402.04924  [pdf, other]

    cs.LG

    Two Trades is not Baffled: Condensing Graph via Crafting Rational Gradient Matching

    Authors: Tianle Zhang, Yuchen Zhang, Kun Wang, Kai Wang, Beining Yang, Kaipeng Zhang, Wenqi Shao, Ping Liu, Joey Tianyi Zhou, Yang You

    Abstract: Training on large-scale graphs has achieved remarkable results in graph representation learning, but its cost and storage have raised growing concerns. As one of the most promising directions, graph condensation methods address these issues by employing gradient matching, aiming to condense the full graph into a more concise yet information-rich synthetic set. Though encouraging, these strategies…

    Submitted 30 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: An effective method for graph condensation

  31. arXiv:2402.03610  [pdf, other]

    cs.LG cs.AI cs.CL

    RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents

    Authors: Tomoyuki Kagaya, Thong Jing Yuan, Yuxuan Lou, Jayashree Karlekar, Sugiri Pranata, Akira Kinose, Koki Oguri, Felix Wick, Yang You

    Abstract: Owing to recent advancements, Large Language Models (LLMs) can now be deployed as agents for increasingly complex decision-making applications in areas including robotics, gaming, and API integration. However, reflecting past experiences in current decision-making processes, an innate human behavior, continues to pose significant challenges. Addressing this, we propose Retrieval-Augmented Planning…

    Submitted 5 February, 2024; originally announced February 2024.

  32. arXiv:2402.02339  [pdf, other]

    cs.CV cs.AI cs.LG

    Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation

    Authors: Ti Wang, Mengyuan Liu, Hong Liu, Bin Ren, Yingxuan You, Wenhao Li, Nicu Sebe, Xia Li

    Abstract: Although data-driven methods have achieved success in 3D human pose estimation, they often suffer from domain gaps and exhibit limited generalization. In contrast, optimization-based methods excel in fine-tuning for specific cases but are generally inferior to data-driven methods in overall performance. We observe that previous optimization-based methods commonly rely on projection constraint, whi…

    Submitted 3 February, 2024; originally announced February 2024.

  33. arXiv:2402.02082  [pdf, other]

    cs.CL

    GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

    Authors: Cunxiao Du, Jing Jiang, Xu Yuanchen, Jiawei Wu, Sicheng Yu, Yongqi Li, Shenggui Li, Kai Xu, Liqiang Nie, Zhaopeng Tu, Yang You

    Abstract: Speculative decoding is a relatively new decoding framework that leverages small and efficient draft models to reduce the latency of LLMs. In this study, we introduce GliDe and CaPE, two low-hassle modifications to vanilla speculative decoding to further improve the decoding speed of a frozen LLM. Specifically, GliDe is a modified draft model architecture that reuses the cached keys and values fro…

    Submitted 3 February, 2024; originally announced February 2024.

  34. arXiv:2402.01739  [pdf, other]

    cs.CL cs.AI cs.DC cs.LG

    OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

    Authors: Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You

    Abstract: To help the open-source community have a better understanding of Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to over 1T tokens. Our investigation confirms that MoE-based LLMs can offer a more favorable cost-effectiveness trade-o…

    Submitted 27 March, 2024; v1 submitted 29 January, 2024; originally announced February 2024.

  35. arXiv:2401.10652  [pdf, other]

    cs.PF cs.DC cs.LG

    AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference

    Authors: Xuanlei Zhao, Shenggan Cheng, Guangyang Lu, Jiarui Fang, Haotian Zhou, Bin Jia, Ziming Liu, Yang You

    Abstract: Large deep learning models have achieved impressive performance across a range of applications. However, their large memory requirements, including parameter memory and activation memory, have become a significant challenge for their practical serving. While existing methods mainly address parameter memory, the importance of activation memory has been overlooked. Especially for long input sequence…

    Submitted 8 July, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  36. arXiv:2401.08140  [pdf, other]

    cs.CV

    ProvNeRF: Modeling per Point Provenance in NeRFs as a Stochastic Process

    Authors: Kiyohiro Nakayama, Mikaela Angelina Uy, Yang You, Ke Li, Leonidas Guibas

    Abstract: Neural radiance fields (NeRFs) have gained popularity across various applications. However, they face challenges in the sparse view setting, lacking sufficient constraints from volume rendering. Reconstructing and understanding a 3D scene from sparse and unconstrained cameras is a long-standing problem in classical computer vision with diverse applications. While recent works have explored NeRFs i…

    Submitted 18 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  37. arXiv:2401.07543  [pdf, other]

    cs.CE cs.AI

    Must: Maximizing Latent Capacity of Spatial Transcriptomics Data

    Authors: Zelin Zang, Liangyu Li, Yongjie Xu, Chenrui Duan, Kai Wang, Yang You, Yi Sun, Stan Z. Li

    Abstract: Spatial transcriptomics (ST) technologies have revolutionized the study of gene expression patterns in tissues by providing multimodality data in transcriptomic, spatial, and morphological, offering opportunities for understanding tissue biology beyond transcriptomics. However, we identify the modality bias phenomenon in ST data species, i.e., the inconsistent contribution of different modalities…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 30 pages and 6 figures, plus 27 pages and 14 figures in appendices

  38. arXiv:2401.02954  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  39. arXiv:2312.16066  [pdf, other]

    cs.SE cs.AI

    A Prompt Learning Framework for Source Code Summarization

    Authors: Weisong Sun, Chunrong Fang, Yudu You, Yuchen Chen, Yi Liu, Chong Wang, Jian Zhang, Quanjun Zhang, Hanwei Qian, Wei Zhao, Yang Liu, Zhenyu Chen

    Abstract: (Source) code summarization is the task of automatically generating natural language summaries for given code snippets. Such summaries play a key role in helping developers understand and maintain source code. Recently, with the successful application of large language models (LLMs) in numerous fields, software engineering researchers have also attempted to adapt LLMs to solve code summarization t…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: submitted to ACM Transactions on Software Engineering and Methodology

    MSC Class: 68-04; 68T30 ACM Class: D.2.3; I.2.2; I.2.4

  40. arXiv:2312.15130  [pdf, other]

    cs.CV

    PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments

    Authors: Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W. Harley, Leonidas Guibas, Cewu Lu

    Abstract: Pose estimation is a crucial task in computer vision and robotics, enabling the tracking and manipulation of objects in images or videos. While several datasets exist for pose estimation, there is a lack of large-scale datasets specifically focusing on cluttered scenes with occlusions. We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the d…

    Submitted 31 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

  41. arXiv:2312.10714  [pdf, other]

    cs.CV

    Primitive-based 3D Human-Object Interaction Modelling and Programming

    Authors: Siqi Liu, Yong-Lu Li, Zhou Fang, Xinpeng Liu, Yang You, Cewu Lu

    Abstract: Embedding Human and Articulated Object Interaction (HAOI) in 3D is an important direction for a deeper human activity understanding. Different from previous works that use parametric and CAD models to represent humans and objects, in this work, we propose a novel 3D geometric primitive-based language to encode both humans and objects. Given our new paradigm, humans and objects are all compositions…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  42. arXiv:2312.03291  [pdf, other]

    cs.LG cs.AI

    OMNIINPUT: A Model-centric Evaluation Framework through Output Distribution

    Authors: Weitang Liu, Ying Wai Li, Tianle Wang, Yi-Zhuang You, Jingbo Shang

    Abstract: We propose a novel model-centric evaluation framework, OmniInput, to evaluate the quality of an AI/ML model's predictions on all possible inputs (including human-unrecognizable ones), which is crucial for AI safety and reliability. Unlike traditional data-centric evaluation based on pre-defined test sets, the test set in OmniInput is self-constructed by the model itself and the model quality is ev…

    Submitted 5 December, 2023; originally announced December 2023.

  43. arXiv:2312.00413  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

    Authors: Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen

    Abstract: Programming language understanding and representation (a.k.a. code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of the source code features while preserving its semantics. These representations can be used for facilitating subsequent code-related tasks. The abstract syntax…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Submitted to ACM Transactions on Software Engineering and Methodology. arXiv admin note: text overlap with arXiv:2103.10668 by other authors

    MSC Class: 68-04; 68T30 ACM Class: D.2.3; I.2.2; I.2.4

  44. arXiv:2311.18765  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MLLMs-Augmented Visual-Language Representation Learning

    Authors: Yanqing Liu, Kai Wang, Wenqi Shao, Ping Luo, Yu Qiao, Mike Zheng Shou, Kaipeng Zhang, Yang You

    Abstract: Visual-language pre-training has achieved remarkable success in many multi-modal tasks, largely attributed to the availability of large-scale image-text datasets. In this work, we demonstrate that Multi-modal Large Language Models (MLLMs) can enhance visual-language representation learning by establishing richer image-text associations for image-text datasets. Our approach is simple, utilizing MLL…

    Submitted 13 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  45. arXiv:2311.15529  [pdf, other]

    cs.CV

    Efficient Dataset Distillation via Minimax Diffusion

    Authors: Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev, Haonan Wang, Wei Jiang, Yang You, Yiran Chen

    Abstract: Dataset distillation reduces the storage and computational consumption of training a network by generating a small surrogate dataset that encapsulates rich information of the original large-scale one. However, previous distillation methods heavily rely on the sample-wise iterative optimization scheme. As the images-per-class (IPC) setting or image resolution grows larger, the necessary computation…

    Submitted 25 March, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  46. arXiv:2311.13656  [pdf, other]

    cs.HC cs.CV

    Panda or not Panda? Understanding Adversarial Attacks with Interactive Visualization

    Authors: Yuzhe You, Jarvis Tse, Jian Zhao

    Abstract: Adversarial machine learning (AML) studies attacks that can fool machine learning algorithms into generating incorrect outcomes as well as the defenses against worst-case attacks to strengthen model robustness. Specifically for image classification, it is challenging to understand adversarial attacks due to their use of subtle perturbations that are not human-interpretable, as well as the variabil…

    Submitted 22 November, 2023; originally announced November 2023.

  47. arXiv:2311.05050  [pdf, other]

    cs.LG quant-ph

    Quantum Generative Modeling of Sequential Data with Trainable Token Embedding

    Authors: Wanda Hou, Miao Li, Yi-Zhuang You

    Abstract: Generative models are a class of machine learning models that aim to learn the underlying probability distribution of data. Unlike discriminative models, generative models focus on capturing the data's inherent structure, allowing them to generate new samples that resemble the original data. To fully exploit the potential of modeling probability distributions using quantum physics, a quantum-inspi…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 5 pages, 4 figures

  48. arXiv:2311.02787  [pdf, other]

    cs.RO cs.AI

    Make a Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools

    Authors: Yang You, Bokui Shen, Congyue Deng, Haoran Geng, Songlin Wei, He Wang, Leonidas Guibas

    Abstract: Deformable object manipulation stands as one of the most captivating yet formidable challenges in robotics. While previous techniques have predominantly relied on learning latent dynamics through demonstrations, typically represented as either particles or images, there exists a pertinent limitation: acquiring suitable demonstrations, especially for long-horizon tasks, can be elusive. Moreover, ba…

    Submitted 24 March, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: 8 pages

  49. arXiv:2310.19080  [pdf, other]

    cs.CV

    Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery

    Authors: Katie Z Luo, Zhenzhen Liu, Xiangyu Chen, Yurong You, Sagie Benaim, Cheng Perng Phoo, Mark Campbell, Wen Sun, Bharath Hariharan, Kilian Q. Weinberger

    Abstract: Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact in research for autonomous vehicles -- where alignment with human expectations can be imperative. In this paper,…

    Submitted 5 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

  50. arXiv:2310.16838  [pdf, other]

    cs.RO cs.CV

    SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation

    Authors: Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, Leonidas Guibas

    Abstract: Humans demonstrate remarkable skill in transferring manipulation abilities across objects of varying shapes, poses, and appearances, a capability rooted in their understanding of semantic correspondences between different instances. To equip robots with a similar high-level comprehension, we present SparseDFF, a novel DFF for 3D scenes utilizing large 2D vision models to extract semantic features…

    Submitted 18 March, 2024; v1 submitted 25 October, 2023; originally announced October 2023.