
Showing 1–50 of 257 results for author: Yin, W

Searching in archive cs.
  1. arXiv:2409.01523  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    Machine learning approach for vibronically renormalized electronic band structures

    Authors: Niraj Aryal, Sheng Zhang, Weiguo Yin, Gia-Wei Chern

    Abstract: We present a machine learning (ML) method for efficient computation of vibrational thermal expectation values of physical properties from first principles. Our approach is based on the non-perturbative frozen phonon formulation in which a stochastic Monte Carlo algorithm is employed to sample configurations of nuclei in a supercell at finite temperatures based on a first-principles phonon model. A d…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 17 pages, 7 figures

  2. arXiv:2408.13981  [pdf]

    cs.CV

    ARANet: Attention-based Residual Adversarial Network with Deep Supervision for Radiotherapy Dose Prediction of Cervical Cancer

    Authors: Lu Wen, Wenxia Yin, Zhenghao Feng, Xi Wu, Deng Xiong, Yan Wang

    Abstract: Radiation therapy is the mainstay treatment for cervical cancer, and its ultimate goal is to ensure the planning target volume (PTV) reaches the prescribed dose while reducing dose deposition of organs-at-risk (OARs) as much as possible. To achieve these clinical requirements, the medical physicist needs to manually tweak the radiotherapy plan repeatedly in a trial-and-error manner until finding th…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted by 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM)

  3. arXiv:2408.08567  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    S$^3$Attention: Improving Long Sequence Attention with Smoothed Skeleton Sketching

    Authors: Xue Wang, Tian Zhou, Jianqing Zhu, Jialin Liu, Kun Yuan, Tao Yao, Wotao Yin, Rong Jin, HanQin Cai

    Abstract: Attention based models have achieved many remarkable breakthroughs in numerous applications. However, the quadratic complexity of Attention makes the vanilla Attention based models hard to apply to long sequence tasks. Various improved Attention structures are proposed to reduce the computation cost by inducing low rankness and approximating the whole sequence by sub-sequences. The most challengin…

    Submitted 23 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  4. arXiv:2408.08084  [pdf, other]

    cs.LG cs.AI

    An Efficient Replay for Class-Incremental Learning with Pre-trained Models

    Authors: Weimin Yin, Bin Chen, Chunzhao Xie, Zhenhao Tan

    Abstract: In general class-incremental learning, researchers typically use sample sets as a tool to avoid catastrophic forgetting during continuous learning. At the same time, researchers have also noted the differences between class-incremental learning and Oracle training and have attempted to make corrections. In recent years, researchers have begun to develop class-incremental learning algorithms utiliz…

    Submitted 15 August, 2024; originally announced August 2024.

  5. arXiv:2408.00244  [pdf, other]

    cs.CL cs.LG

    Enhanced Structured State Space Models via Grouped FIR Filtering and Attention Sink Mechanisms

    Authors: Tian Meng, Yang Tao, Wuliang Yin

    Abstract: Structured State Space Models (SSMs) have emerged as compelling alternatives to Transformer architectures, offering linear-time complexity and superior performance in various sequence modeling tasks. Despite their advantages, SSMs like the original Mamba-2 face training difficulties due to the sensitivities introduced by the extended series of recurrent matrix multiplications. In this paper, we pr…

    Submitted 31 July, 2024; originally announced August 2024.

  6. arXiv:2407.11017  [pdf, other]

    cs.CL cs.AI cs.LG

    Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation

    Authors: Jihyun Janice Ahn, Ryo Kamoi, Lu Cheng, Rui Zhang, Wenpeng Yin

    Abstract: Mainstream LLM research has primarily focused on enhancing their generative capabilities. However, even the most advanced LLMs experience uncertainty in their outputs, often producing varied results on different runs or when faced with minor changes in input, despite no substantial change in content. Given multiple responses from the same LLM to the same input, we advocate leveraging the LLMs' dis…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: 4 pages, 3 tables

  7. arXiv:2407.07924  [pdf, other]

    math.OC cs.AI cs.CL cs.LG

    Solving General Natural-Language-Description Optimization Problems with Large Language Models

    Authors: Jihai Zhang, Wei Wang, Siyan Guo, Li Wang, Fangquan Lin, Cheng Yang, Wotao Yin

    Abstract: Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this p…

    Submitted 9 July, 2024; originally announced July 2024.

  8. arXiv:2406.16253  [pdf, other]

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  9. arXiv:2406.16203  [pdf, other]

    cs.CL

    LLMs' Classification Performance is Overclaimed

    Authors: Hanzi Xu, Renze Lou, Jiangshu Du, Vahid Mahzoon, Elmira Talebianaraki, Zhuoan Zhou, Elizabeth Garrison, Slobodan Vucetic, Wenpeng Yin

    Abstract: In many classification tasks designed for AI or humans to solve, gold labels are typically included within the label space by default, often posed as "which of the following is correct?" This standard setup has traditionally highlighted the strong performance of advanced AI, particularly top-performing Large Language Models (LLMs), in routine classification tasks. However, when the gold label is in…

    Submitted 3 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  10. arXiv:2406.13103  [pdf, other]

    cs.AI cs.LG

    A Generic Method for Fine-grained Category Discovery in Natural Language Texts

    Authors: Chang Tian, Matthew B. Blaschko, Wenpeng Yin, Mingzhe Xing, Yinliang Yue, Marie-Francine Moens

    Abstract: Fine-grained category discovery using only coarse-grained supervision is a cost-effective yet challenging task. Previous training methods focus on aligning query samples with positive samples and distancing them from negatives. They often neglect intra-category and inter-category semantic similarities of fine-grained categories when navigating sample distributions in the embedding space. Furthermo…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: preprint

  11. arXiv:2406.05938  [pdf, other]

    cs.LG math.OC

    Expressive Power of Graph Neural Networks for (Mixed-Integer) Quadratic Programs

    Authors: Ziang Chen, Xiaohan Chen, Jialin Liu, Xinshang Wang, Wotao Yin

    Abstract: Quadratic programming (QP) is the most widely applied category of problems in nonlinear programming. Many applications require real-time/fast solutions, though not necessarily with high precision. Existing methods either involve matrix decomposition or use the preconditioned conjugate gradient method. For relatively large instances, these methods cannot achieve the real-time requirement unless the…

    Submitted 9 June, 2024; originally announced June 2024.

  12. arXiv:2406.05602  [pdf, other]

    cs.CV cs.CL

    Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

    Authors: Philip Wootaek Shin, Jihyun Janice Ahn, Wenpeng Yin, Jack Sampson, Vijaykrishnan Narayanan

    Abstract: It has been shown that many generative models inherit and amplify societal biases. To date, there is no uniform/systematic agreed standard to control/adjust for these biases. This study examines the presence and manipulation of societal biases in leading text-to-image models: Stable Diffusion, DALL-E 3, and Adobe Firefly. Through a comprehensive analysis combining base prompts with modifiers and t…

    Submitted 8 June, 2024; originally announced June 2024.

  13. arXiv:2406.05460  [pdf, other]

    cs.CL cs.AI

    Fighting Against the Repetitive Training and Sample Dependency Problem in Few-shot Named Entity Recognition

    Authors: Chang Tian, Wenpeng Yin, Dan Li, Marie-Francine Moens

    Abstract: Few-shot named entity recognition (NER) systems recognize entities using a few labeled training examples. The general pipeline consists of a span detector to identify entity spans in text and an entity-type classifier to assign types to entities. Current span detectors rely on extensive manual labeling to guide training. Almost every span detector requires initial training on basic span features f…

    Submitted 18 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: IEEE Access: https://doi.org/10.1109/ACCESS.2024.3374727

  14. arXiv:2406.02006  [pdf, other]

    math.OC cs.AI

    ODE-based Learning to Optimize

    Authors: Zhonglin Xie, Wotao Yin, Zaiwen Wen

    Abstract: Recent years have seen a growing interest in understanding acceleration methods through the lens of ordinary differential equations (ODEs). Despite the theoretical advancements, translating the rapid convergence observed in continuous-time models to discrete-time iterative methods poses significant challenges. In this paper, we present a comprehensive framework integrating the inertial systems wit…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 55 pages, 28 figures

  15. arXiv:2405.19978  [pdf, other]

    cs.LG stat.ML

    Domain Adaptation with Cauchy-Schwarz Divergence

    Authors: Wenzhe Yin, Shujian Yu, Yicong Lin, Jie Liu, Jan-Jakob Sonke, Efstratios Gavves

    Abstract: Domain adaptation aims to use training data from one or multiple source domains to learn a hypothesis that can be generalized to a different, but related, target domain. As such, having a reliable measure for evaluating the discrepancy of both marginal and conditional distributions is crucial. We introduce Cauchy-Schwarz (CS) divergence to the problem of unsupervised domain adaptation (UDA). The C…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by UAI-24

  16. arXiv:2405.17705  [pdf, other]

    cs.CV

    DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos

    Authors: Linhan Wang, Kai Cheng, Shuo Lei, Shengkun Wang, Wei Yin, Chenyang Lei, Xiaoxiao Long, Chang-Tien Lu

    Abstract: We present DC-Gaussian, a new method for generating novel views from in-vehicle dash cam videos. While neural rendering techniques have made significant strides in driving scenarios, existing methods are primarily designed for videos collected by autonomous vehicles. However, these videos are limited in both quantity and diversity compared to dash cam videos, which are more widely used across vari…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 9 pages, 7 figures; project page: https://linhanwang.github.io/dcgaussian/

  17. arXiv:2405.15251  [pdf, other]

    math.OC cs.LG stat.ML

    Learning to optimize: A tutorial for continuous and mixed-integer optimization

    Authors: Xiaohan Chen, Jialin Liu, Wotao Yin

    Abstract: Learning to Optimize (L2O) stands at the intersection of traditional optimization and machine learning, utilizing the capabilities of machine learning to enhance conventional optimization techniques. As real-world optimization problems frequently share common structures, L2O provides a tool to exploit these structures for better or faster solutions. This tutorial dives deep into L2O techniques, in…

    Submitted 24 May, 2024; originally announced May 2024.

  18. arXiv:2405.14741  [pdf, other]

    math.OC cs.LG stat.ML

    Bagging Improves Generalization Exponentially

    Authors: Huajie Qian, Donghao Ying, Henry Lam, Wotao Yin

    Abstract: Bagging is a popular ensemble technique to improve the accuracy of machine learning models. It hinges on the well-established rationale that, by repeatedly retraining on resampled data, the aggregated model exhibits lower variance and hence higher stability, especially for discontinuous base learners. In this paper, we provide a new perspective on bagging: By suitably aggregating the base learners…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Correct author list typo

  19. arXiv:2405.10205  [pdf, other]

    cs.HC

    Exploring the Impact of ChatGPT on Wikipedia Engagement

    Authors: Neal Reeves, Wenjie Yin, Elena Simperl

    Abstract: Wikipedia is one of the most popular websites in the world, serving as a major source of information and learning resource for millions of users worldwide. While motivations for its usage vary, prior research suggests shallow information gathering -- looking up facts and information or answering questions -- dominates over more in-depth usage. On the 22nd of November 2022, ChatGPT was released to…

    Submitted 29 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures, submitted to ACM Collective Intelligence

  20. arXiv:2404.19417  [pdf, other]

    cs.CV

    Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World

    Authors: Wen Yin, Jian Lou, Pan Zhou, Yulai Xie, Dan Feng, Yuhua Sun, Tailai Zhang, Lichao Sun

    Abstract: Backdoor attacks have been well-studied in visible light object detection (VLOD) in recent years. However, VLOD cannot effectively work in dark and temperature-sensitive scenarios. Instead, thermal infrared object detection (TIOD) is the most accessible and practical in such environments. In this paper, our team is the first to investigate the security vulnerabilities associated with TIOD in the…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: To appear in CVPR 2024. 11 pages, 8 figures and 4 tables

  21. arXiv:2404.15506  [pdf, other]

    cs.CV

    Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

    Authors: Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

    Abstract: We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recov…

    Submitted 16 August, 2024; v1 submitted 21 March, 2024; originally announced April 2024.

    Comments: Our project page is at https://JUGGHM.github.io/Metric3Dv2. Accepted to TPAMI. arXiv admin note: text overlap with arXiv:2307.10984

  22. arXiv:2404.03602  [pdf, other]

    cs.CL

    Evaluating LLMs at Detecting Errors in LLM Responses

    Authors: Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Ranran Haoran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang

    Abstract: With Large Language Models (LLMs) being widely used across various tasks, detecting errors in their responses is increasingly crucial. However, little research has been conducted on error detection of LLM responses. Collecting error annotations on LLM responses is challenging due to the subjective nature of many NLP tasks, and thus previous research focuses on tasks of little practical value (e.g.…

    Submitted 27 July, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: COLM 2024, 46 pages, Benchmark and code: https://github.com/psunlpgroup/ReaLMistake

  23. arXiv:2403.17934  [pdf, other]

    cs.CV

    AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

    Authors: Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi Sing Leung, Ziwei Liu, Lei Yang, Zhongang Cai

    Abstract: Expressive human pose and shape estimation (a.k.a. 3D whole-body mesh recovery) involves the human body, hand, and expression estimation. Most existing methods have tackled this task in a two-stage manner, first detecting the human body part with an off-the-shelf detection model and inferring the different human body parts individually. Despite the impressive results achieved, these methods suffer…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Homepage: https://ttxskk.github.io/AiOS/

  24. arXiv:2403.13307  [pdf, other]

    cs.CV

    LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment

    Authors: Peishan Cong, Ziyi Wang, Zhiyang Dou, Yiming Ren, Wei Yin, Kai Cheng, Yujing Sun, Xiaoxiao Long, Xinge Zhu, Yuexin Ma

    Abstract: Language-guided scene-aware human motion generation has great significance for entertainment and robotics. In response to the limitations of existing datasets, we introduce LaserHuman, a pioneering dataset engineered to revolutionize Scene-Text-to-Motion research. LaserHuman stands out with its inclusion of genuine human motions within 3D environments, unbounded free-form natural language descript…

    Submitted 21 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  25. arXiv:2403.12959  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    WHAC: World-grounded Humans and Cameras

    Authors: Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang

    Abstract: Estimating human and camera trajectories with accurate scale in the world coordinate system from a monocular video is a highly desirable yet challenging and ill-posed problem. In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera. Our a…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Homepage: https://wqyin.github.io/projects/WHAC/

  26. arXiv:2403.12013  [pdf, other]

    cs.CV

    GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

    Authors: Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long

    Abstract: We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes, e.g., depth and normals, from single images. While significant research has already been conducted in this area, the progress has been substantially limited by the low diversity and poor quality of publicly available datasets. As a result, the prior works either are constrained to limited scenar…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: https://fuxiao0719.github.io/projects/geowizard/

  27. arXiv:2403.11805  [pdf, other]

    cs.OS

    LLM as a System Service on Mobile Devices

    Authors: Wangsong Yin, Mengwei Xu, Yuanchun Li, Xuanzhe Liu

    Abstract: Being more powerful and intrusive into user-device interactions, LLMs are eager for on-device execution to better preserve user privacy. In this work, we propose a new paradigm of mobile AI: LLM as a system service on mobile devices (LLMaaS). Unlike traditional DNNs that execute in a stateless manner, such a system service is stateful: LLM execution often needs to maintain persistent states (main…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Technical Report

  28. arXiv:2403.10287  [pdf, other]

    cs.CV

    Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models

    Authors: Tian Meng, Yang Tao, Ruilin Lyu, Wuliang Yin

    Abstract: The task of few-shot image classification and segmentation (FS-CS) involves classifying and segmenting target objects in a query image, given only a few examples of the target classes. We introduce the Vision-Instructed Segmentation and Evaluation (VISE) method that transforms the FS-CS problem into the Visual Question Answering (VQA) problem, utilising Vision-Language Models (VLMs), and addresses…

    Submitted 15 March, 2024; originally announced March 2024.

  29. arXiv:2403.09407  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    LM2D: Lyrics- and Music-Driven Dance Synthesis

    Authors: Wenjie Yin, Xuejiao Zhao, Yi Yu, Hang Yin, Danica Kragic, Mårten Björkman

    Abstract: Dance typically involves professional choreography with complex movements that follow a musical rhythm and can also be influenced by lyrical content. The integration of lyrics in addition to the auditory dimension enriches the foundational tone and makes motion generation more amenable to its semantic meanings. However, existing dance synthesis methods tend to model motions only conditioned on au…

    Submitted 14 March, 2024; originally announced March 2024.

  30. arXiv:2403.07535  [pdf, other]

    cs.CV

    Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

    Authors: JunDa Cheng, Wei Yin, Kaixuan Wang, Xiaozhi Chen, Shijie Wang, Xin Yang

    Abstract: Multi-view depth estimation has achieved impressive performance over various benchmarks. However, almost all current multi-view systems rely on given ideal camera poses, which are unavailable in many real-world scenarios, such as autonomous driving. In this work, we propose a new robustness benchmark to evaluate the depth estimation system under various noisy pose settings. Surprisingly, we find c…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  31. arXiv:2403.03863  [pdf, other]

    cs.CL

    X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification

    Authors: Hanzi Xu, Muhao Chen, Lifu Huang, Slobodan Vucetic, Wenpeng Yin

    Abstract: In recent years, few-shot and zero-shot learning, which learn to predict labels with limited annotated instances, have garnered significant attention. Traditional approaches often treat frequent-shot (freq-shot; labels with abundant instances), few-shot, and zero-shot learning as distinct challenges, optimizing systems for just one of these scenarios. Yet, in real-world settings, label occurrences…

    Submitted 6 March, 2024; originally announced March 2024.

  32. arXiv:2402.18667  [pdf, other]

    cs.CL

    FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability

    Authors: Congying Xia, Chen Xing, Jiangshu Du, Xinyi Yang, Yihao Feng, Ran Xu, Wenpeng Yin, Caiming Xiong

    Abstract: This paper presents FoFo, a pioneering benchmark for evaluating large language models' (LLMs) ability to follow complex, domain-specific formats, a crucial yet underexamined capability for their application as AI agents. Despite LLMs' advancements, existing benchmarks fail to assess their format-following proficiency adequately. FoFo fills this gap with a diverse range of real-world formats and in…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: The first two authors contributed equally

  33. arXiv:2402.15896  [pdf, other]

    cs.CV

    Multimodal Instruction Tuning with Conditional Mixture of LoRA

    Authors: Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks. Multimodal instruction tuning has emerged as a successful strategy for achieving zero-shot generalization by fine-tuning pre-trained models on diverse multimodal ta…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 8 pages, multimodal instruction tuning

  34. arXiv:2402.14650  [pdf, other]

    cs.CV

    GaussianPro: 3D Gaussian Splatting with Progressive Propagation

    Authors: Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen

    Abstract: The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling large-scale scenes that unavoidably contain texture-less surfaces, the SfM techniques always f…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: See the project page for code, data: https://kcheng1021.github.io/gaussianpro.github.io

  35. arXiv:2402.11791  [pdf, other]

    cs.CV

    SDGE: Stereo Guided Depth Estimation for 360$^\circ$ Camera Sets

    Authors: Jialei Xu, Wei Yin, Dong Gong, Junjun Jiang, Xianming Liu

    Abstract: Depth estimation is a critical technology in autonomous driving, and multi-camera systems are often used to achieve a 360$^\circ$ perception. These 360$^\circ$ camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image. Alternatively, monocular methods may not produce consistent cross-view predictions. To address these issues, we…

    Submitted 2 April, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  36. arXiv:2402.11592  [pdf, other]

    cs.LG cs.CL

    Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

    Authors: Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen

    Abstract: In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications…

    Submitted 27 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  37. arXiv:2402.11138  [pdf, other]

    cs.CL cs.AI cs.LG

    Contrastive Instruction Tuning

    Authors: Tianyi Lorena Yan, Fei Wang, James Y. Huang, Wenxuan Zhou, Fan Yin, Aram Galstyan, Wenpeng Yin, Muhao Chen

    Abstract: Instruction tuning has been used as a promising approach to improve the performance of large language models (LLMs) on unseen tasks. However, current LLMs exhibit limited robustness to unseen instructions, generating inconsistent outputs when the same instruction is phrased with slightly varied forms or language styles. This behavior indicates LLMs' lack of robustness to textual variations and gen…

    Submitted 6 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Findings

  38. arXiv:2402.11122  [pdf, other]

    cs.CL cs.AI

    Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models

    Authors: Zihao Lin, Mohammad Beigi, Hongxuan Li, Yufan Zhou, Yuxiang Zhang, Qifan Wang, Wenpeng Yin, Lifu Huang

    Abstract: Memory Editing (ME) has emerged as an efficient method to modify erroneous facts or inject new facts into Large Language Models (LLMs). Two mainstream ME methods exist: parameter-modifying ME and parameter-preserving ME (integrating extra modules while preserving original parameters). Regrettably, previous studies on ME evaluation have two critical limitations: (i) evaluating LLMs with single edit…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: preprint, 15 pages

  39. arXiv:2402.11095  [pdf, other]

    cs.CV

    GIM: Learning Generalizable Image Matcher From Internet Videos

    Authors: Xuelun Shen, Zhipeng Cai, Wei Yin, Matthias Müller, Zijun Li, Kaixuan Wang, Xiaozhi Chen, Cheng Wang

    Abstract: Image matching is a fundamental computer vision problem. While learning-based methods achieve state-of-the-art performance on existing benchmarks, they generalize poorly to in-the-wild images. Such methods typically need to train separate models for different scene types and are impractical when the scene type is unknown in advance. One of the underlying problems is the limited scalability of exis…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted to ICLR 2024 for spotlight presentation

  40. arXiv:2402.10874  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Design of 2D Skyrmionic Metamaterial Through Controlled Assembly

    Authors: Qichen Xu, Zhuanglin Shen, Alexander Edström, I. P. Miranda, Zhiwei Lu, Anders Bergman, Danny Thonig, Wanjian Yin, Olle Eriksson, Anna Delin

    Abstract: Despite extensive research on magnetic skyrmions and antiskyrmions, a significant challenge remains in crafting nontrivial high-order skyrmionic textures with varying, or even tailor-made, topologies. We address this challenge by focusing on a construction pathway of skyrmionic metamaterial within a monolayer thin film and suggest several promising lattice-like, flakes-like, and cell-like skyrmi…

    Submitted 16 February, 2024; originally announced February 2024.

  41. arXiv:2402.07099  [pdf, other]

    cs.LG math.OC

    Rethinking the Capacity of Graph Neural Networks for Branching Strategy

    Authors: Ziang Chen, Jialin Liu, Xiaohan Chen, Xinshang Wang, Wotao Yin

    Abstract: Graph neural networks (GNNs) have been widely used to predict properties and heuristics of mixed-integer linear programs (MILPs) and hence accelerate MILP solvers. This paper investigates the capacity of GNNs to represent strong branching (SB), the most effective yet computationally expensive heuristic employed in the branch-and-bound algorithm. In the literature, message-passing GNN (MP-GNN), as…

    Submitted 8 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

  42. arXiv:2402.00157  [pdf, other]

    cs.CL

    Large Language Models for Mathematical Reasoning: Progresses and Challenges

    Authors: Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, Wenpeng Yin

    Abstract: Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive capabilities of human intelligence. In recent times, there has been a notable surge in the development of Large Language Models (LLMs) geared towards the automated resolution of mathematical problems. However, the landscape of mathematical problem types is vast and varied, with LLM-oriented techniques undergoing…

    Submitted 5 April, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: EACL 2024 Student Research Workshop, 8 pages

  43. arXiv:2401.17099  [pdf, other]

    cs.CL

    MT-Ranker: Reference-free machine translation evaluation by inter-system ranking

    Authors: Ibraheem Muhammad Moosa, Rui Zhang, Wenpeng Yin

    Abstract: Traditionally, Machine Translation (MT) Evaluation has been treated as a regression problem -- producing an absolute translation-quality score. This approach has two limitations: i) the scores lack interpretability, and human annotators struggle with giving consistent scores; ii) most scoring methods are based on (reference, translation) pairs, limiting their applicability in real-world scenarios…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 18 pages, 4 figures, to be published in ICLR'24, Code available at https://github.com/ibraheem-moosa/mt-ranker

    ACM Class: I.2.7

  44. arXiv:2401.16051  [pdf, other]

    cs.CV

    Dynamic Prototype Adaptation with Distillation for Few-shot Point Cloud Segmentation

    Authors: Jie Liu, Wenzhe Yin, Haochen Wang, Yunlu Chen, Jan-Jakob Sonke, Efstratios Gavves

    Abstract: Few-shot point cloud segmentation seeks to generate per-point masks for previously unseen categories, using only a minimal set of annotated point clouds as reference. Existing prototype-based methods rely on support prototypes to guide the segmentation of query point clouds, but they encounter challenges when significant object variations exist between the support prototypes and query features. In…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted in 3DV2024, code is available at https://github.com/jliu4ai/DPA

  45. arXiv:2401.08092  [pdf, other]

    cs.LG cs.AI cs.DC

    A Survey of Resource-efficient LLM and Multimodal Foundation Models

    Authors: Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu

    Abstract: Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources. To support the growth of the…

    Submitted 15 January, 2024; originally announced January 2024.

  46. arXiv:2312.13533  [pdf, other]

    cs.CL

    Automated Clinical Coding for Outpatient Departments

    Authors: Viktor Schlegel, Abhinav Ramesh Kashyap, Thanh-Tung Nguyen, Tsung-Han Yang, Vijay Prakash Dwivedi, Wei-Hsian Yin, Jeng Wei, Stefan Winkler

    Abstract: Computerised clinical coding approaches aim to automate the process of assigning a set of codes to medical records. While there is active research pushing the state of the art on clinical coding for hospitalized patients, the outpatient setting -- where doctors tend to non-hospitalised patients -- is overlooked. Although both settings can be formalised as a multi-label classification task, they pr…

    Submitted 24 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 9 pages, preprint under review

  47. arXiv:2312.13527  [pdf, other]

    cs.MS math.OC

    MindOpt Adapter for CPLEX Benchmarking Performance Analysis

    Authors: Mou Sun, Tao Li, Wotao Yin

    Abstract: This report provides a comprehensive analysis of the performance of MindOpt Adapter for CPLEX 12.9 in benchmark testing. CPLEX, recognized as a robust Mixed Integer Programming (MIP) solver, has faced some scrutiny regarding its performance on MIPLIB 2017 when configured to default settings. MindOpt Adapter aims to enhance CPLEX's performance by automatically applying improved configurations for s…

    Submitted 31 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  48. arXiv:2312.09069  [pdf, other]

    cs.CV

    PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion

    Authors: Ying-Tian Liu, Yuan-Chen Guo, Guan Luo, Heyi Sun, Wei Yin, Song-Hai Zhang

    Abstract: Diffusion models trained on large-scale text-image datasets have demonstrated a strong capability of controllable high-quality image generation from arbitrary text prompts. However, the generation quality and generalization ability of 3D diffusion models is hindered by the scarcity of high-quality and large-scale 3D datasets. In this paper, we present PI3D, a framework that fully leverages the pre…

    Submitted 21 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  49. arXiv:2312.08895  [pdf, other]

    cs.CV

    Motion Flow Matching for Human Motion Synthesis and Editing

    Authors: Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando, Yuki M Asano, Efstratios Gavves, Pascal Mettes, Bjorn Ommer, Cees G. M. Snoek

    Abstract: Human motion synthesis is a fundamental task in computer animation. Recent methods based on diffusion models or GPT structure demonstrate commendable performance but exhibit drawbacks in terms of slow sampling speeds and error accumulation. In this paper, we propose Motion Flow Matching, a novel generative model designed for human motion generation featuring efficient sampling and effective…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: WIP

  50. arXiv:2312.07311  [pdf, other]

    cs.CV cs.AI cs.LG

    Scalable Motion Style Transfer with Constrained Diffusion Generation

    Authors: Wenjie Yin, Yi Yu, Hang Yin, Danica Kragic, Mårten Björkman

    Abstract: Current training of motion style transfer systems relies on consistency losses across style domains to preserve contents, hindering its scalable application to a large number of domains and private data. Recent image transfer works show the potential of independent training on each domain by leveraging implicit bridging between diffusion models, with the content preservation, however, limited to s…

    Submitted 12 December, 2023; originally announced December 2023.