
Showing 1–50 of 431 results for author: Zheng, W

Searching in archive cs.
  1. arXiv:2406.10631  [pdf, other]

    cs.GT cs.LG math.OC

    Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms

    Authors: Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

    Abstract: Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several adva…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 27 pages, 4 figures

  2. arXiv:2406.08877  [pdf, other]

    cs.CV

    EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

    Authors: Yuan-Ming Li, Wei-Jin Huang, An-Lan Wang, Ling-An Zeng, Jing-Ke Meng, Wei-Shi Zheng

    Abstract: We present EgoExo-Fitness, a new full-body action understanding dataset, featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives, but also provides rich annotations. Specifically, two-level temporal bound…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 33 pages, 9 figures

  3. arXiv:2406.08767  [pdf, ps, other]

    cs.IT

    Coding for the unsourced B-channel with erasures: enhancing the linked loop code

    Authors: William W. Zheng, Jamison R. Ebert, Stefano Rini, Jean-Francois Chamberland

    Abstract: In [1], the linked loop code (LLC) is presented as a promising code for the unsourced A-channel with erasures (UACE). The UACE is an unsourced multiple access channel in which active users' transmitted symbols are erased with a given probability and the channel output is obtained as the union of the non-erased symbols. In this paper, we extend the UACE channel model to the unsourced B-channel with…

    Submitted 20 May, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, accepted by ICASSP 2024

  4. arXiv:2406.07296  [pdf, other]

    cs.RO cs.CL

    Instruct Large Language Models to Drive like Humans

    Authors: Ruijun Zhang, Xianda Guo, Wenzhao Zheng, Chenming Zhang, Kurt Keutzer, Long Chen

    Abstract: Motion planning in complex scenarios is the core challenge in autonomous driving. Conventional methods apply predefined rules or learn from driving data to plan the future trajectory. Recent methods seek the knowledge preserved in large language models (LLMs) and apply them in the driving scenarios. Despite the promising results, it is still unclear whether the LLM learns the underlying human logi…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: project page: https://github.com/bonbon-rj/InstructDriver

  5. arXiv:2406.06384  [pdf, other]

    cs.CV

    Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations

    Authors: Peng Xia, Ming Hu, Feilong Tang, Wenxue Li, Wenhao Zheng, Lie Ju, Peibo Duan, Huaxiu Yao, Zongyuan Ge

    Abstract: Diabetic Retinopathy (DR), induced by diabetes, poses a significant risk of visual impairment. Accurate and effective grading of DR aids in the treatment of this condition. Yet existing models experience notable performance degradation on unseen domains due to domain shifts. Previous methods address this issue by simulating domain style through simple visual transformation and mitigating domain no…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Early Accepted by MICCAI 2024

  6. arXiv:2406.06007  [pdf, other]

    cs.LG cs.CL cs.CV cs.CY

    CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

    Authors: Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao

    Abstract: Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehen…

    Submitted 10 June, 2024; originally announced June 2024.

  7. arXiv:2406.01066  [pdf, other]

    cs.LG

    Topology-Aware Dynamic Reweighting for Distribution Shifts on Graph

    Authors: Weihuang Zheng, Jiashuo Liu, Jiaxing Li, Jiayun Wu, Peng Cui, Youyong Kong

    Abstract: Graph Neural Networks (GNNs) are widely used for node classification tasks but often fail to generalize when training and test nodes come from different distributions, limiting their practicality. To overcome this, recent approaches adopt invariant learning techniques from the out-of-distribution (OOD) generalization field, which seek to establish stable prediction methods across environments. How…

    Submitted 3 June, 2024; originally announced June 2024.

  8. arXiv:2405.20337  [pdf, other]

    cs.CV cs.AI

    OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

    Authors: Lening Wang, Wenzhao Zheng, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jiwen Lu

    Abstract: Understanding the evolution of 3D scenes is important for effective autonomous driving. While conventional methods model scene development with the motion of individual instances, world models emerge as a generative framework to describe the general scene dynamics. However, most existing methods adopt an autoregressive framework to perform next-token prediction, which suffer from inefficiency in mo…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Code is available at: https://github.com/wzzheng/OccSora

  9. arXiv:2405.20323  [pdf, other]

    cs.CV cs.AI

    $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

    Authors: Nan Huang, Xiaobao Wei, Wenzhao Zheng, Pengju An, Ming Lu, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang

    Abstract: Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving. Despite the efficacy of Neural Radiance Fields (NeRF) for driving scenes, 3D Gaussian Splatting (3DGS) emerges as a promising direction due to its faster speed and more explicit representation. However, most existing street 3DGS methods require tracked 3D vehicle b…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Code is available at: https://github.com/nnanhuang/S3Gaussian/

  10. arXiv:2405.19291  [pdf, other]

    cs.RO

    Grasp as You Say: Language-guided Dexterous Grasp Generation

    Authors: Yi-Lin Wei, Jian-Jian Jiang, Chengyi Xing, Xiantuo Tan, Xiao-Ming Wu, Hao Li, Mark Cutkosky, Wei-Shi Zheng

    Abstract: This paper explores a novel task "Dexterous Grasp as You Say" (DexGYS), enabling robots to perform dexterous grasping based on human commands expressed in natural language. However, the development of this field is hindered by the lack of datasets with natural human guidance; thus, we propose a language-guided dexterous grasp dataset, named DexGYSNet, offering high-quality dexterous grasp annota…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 9 pages, 7 figures

  11. arXiv:2405.17872  [pdf, other]

    cs.CV

    HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction

    Authors: Haoyu Zhao, Xingyue Zhao, Lingting Zhu, Weixi Zheng, Yongchao Xu

    Abstract: Robot-assisted minimally invasive surgery benefits from enhancing dynamic scene reconstruction, as it improves surgical outcomes. While Neural Radiance Fields (NeRF) have been effective in scene reconstruction, their slow inference speeds and lengthy training durations limit their applicability. To overcome these limitations, 3D Gaussian Splatting (3D-GS) based methods have emerged as a recent tre…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 13 pages, 4 figures

  12. arXiv:2405.17503  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    Code Repair with LLMs gives an Exploration-Exploitation Tradeoff

    Authors: Hao Tang, Keya Hu, Jin Peng Zhou, Sicheng Zhong, Wei-Long Zheng, Xujie Si, Kevin Ellis

    Abstract: Iteratively improving and repairing source code with large language models (LLMs), known as refinement, has emerged as a popular way of generating programs that would be too complex to construct in one shot. Given a bank of test cases, together with a candidate program, an LLM can improve that program by being prompted with failed test cases. But it remains an open question how to best iteratively…

    Submitted 30 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

  13. arXiv:2405.17429  [pdf, other]

    cs.CV cs.AI

    GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

    Authors: Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang, Jie Zhou, Jiwen Lu

    Abstract: 3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene and is an important task for the robustness of vision-centric autonomous driving. Most existing methods employ dense grids such as voxels as scene representations, which ignore the sparsity of occupancy and the diversity of object scales and thus lead to unbalanced allocation of resource…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Code is available at: https://github.com/huang-yh/GaussianFormer

  14. arXiv:2405.17422  [pdf, other]

    cs.CV cs.AI cs.LG

    Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection

    Authors: Shuai Zeng, Wenzhao Zheng, Jiwen Lu, Haibin Yan

    Abstract: 3D object detection aims to recover the 3D information of objects of interest and serves as the fundamental task of autonomous driving perception. Its performance greatly depends on the scale of labeled training data, yet it is costly to obtain high-quality annotations for point cloud data. While conventional methods focus on generating pseudo-labels for unlabeled samples as supplements for trainin…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Code is available at: https://github.com/wzzheng/HASS

  15. arXiv:2405.14580  [pdf, other]

    cs.GR

    LDM: Large Tensorial SDF Model for Textured Mesh Generation

    Authors: Rengan Xie, Wenting Zheng, Kai Huang, Yizheng Chen, Qi Wang, Qi Ye, Wei Chen, Yuchi Huo

    Abstract: Previous efforts have managed to generate production-ready 3D assets from text or images. However, these methods primarily employ NeRF or 3D Gaussian representations, which are not adept at producing smooth, high-quality geometries required by modern rendering pipelines. In this paper, we propose LDM, a novel feed-forward framework capable of generating high-fidelity, illumination-decoupled textur…

    Submitted 23 May, 2024; originally announced May 2024.

  16. arXiv:2405.11270  [pdf, other]

    cs.CV

    HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos

    Authors: Qifeng Chen, Rengan Xie, Kai Huang, Qi Wang, Wenting Zheng, Rong Li, Yuchi Huo

    Abstract: Recently, implicit neural representation has been widely used to generate animatable human avatars. However, the materials and geometry of those representations are coupled in the neural network and hard to edit, which hinders their application in traditional graphics engines. We present a framework for acquiring human avatars that are attached with high-resolution physically-based material textur…

    Submitted 18 May, 2024; originally announced May 2024.

  17. arXiv:2405.07029  [pdf]

    cs.SD eess.AS

    A framework of text-dependent speaker verification for chinese numerical string corpus

    Authors: Litong Zheng, Feng Hong, Weijie Xu, Wan Zheng

    Abstract: The Chinese numerical string corpus serves as a valuable resource for speaker verification, particularly in financial transactions. Research indicates that in short speech scenarios, text-dependent speaker verification (TD-SV) consistently outperforms text-independent speaker verification (TI-SV). However, TD-SV potentially includes the validation of text information, that can be negatively impa…

    Submitted 21 May, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.01645

  18. arXiv:2405.04312  [pdf, other]

    cs.CV

    Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

    Authors: Zhuoyi Yang, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, Jie Tang

    Abstract: Diffusion models have shown remarkable performance in image generation in recent years. However, due to a quadratic increase in memory during generating ultra-high-resolution images (e.g. 4096×4096), the resolution of generated images is often limited to 1024×1024. In this work, we propose a unidirectional block attention mechanism that can adaptively adjust the memory overhead during the inferenc…

    Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  19. arXiv:2405.00451  [pdf, other]

    cs.AI cs.LG

    Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

    Authors: Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P. Lillicrap, Kenji Kawaguchi, Michael Shieh

    Abstract: We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level…

    Submitted 17 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, 4 tables (24 pages, 9 figures, 9 tables including references and appendices)

  20. arXiv:2404.19639  [pdf, other]

    cs.CV

    ESP-Zero: Unsupervised enhancement of zero-shot classification for Extremely Sparse Point cloud

    Authors: Jiayi Han, Zidi Cao, Weibo Zheng, Xiangguo Zhou, Xiangjian He, Yuanfang Zhang, Daisen Wei

    Abstract: In recent years, zero-shot learning has attracted the focus of many researchers, due to its flexibility and generality. Many approaches have been proposed to achieve the zero-shot classification of the point clouds for 3D object understanding, following the schema of CLIP. However, in the real world, the point clouds could be extremely sparse, dramatically limiting the effectiveness of the 3D poin…

    Submitted 30 April, 2024; originally announced April 2024.

  21. arXiv:2404.19038  [pdf, other]

    cs.CV cs.AI

    Embedded Representation Learning Network for Animating Styled Video Portrait

    Authors: Tianyong Wang, Xiangyu Liang, Wangguandong Zheng, Dan Niu, Haifeng Xia, Siyu Xia

    Abstract: Talking head generation has recently attracted considerable attention due to its widespread application prospects, especially for digital avatars and 3D animation design. Inspired by this practical demand, several works have explored Neural Radiance Fields (NeRF) to synthesize talking heads. However, these NeRF-based methods face two challenges: (1) Difficulty in generating style-controllable ta…

    Submitted 29 April, 2024; originally announced April 2024.

  22. arXiv:2404.18135  [pdf, other]

    cs.RO

    Dexterous Grasp Transformer

    Authors: Guo-Hao Xu, Yi-Lin Wei, Dian Zheng, Xiao-Ming Wu, Wei-Shi Zheng

    Abstract: In this work, we propose a novel discriminative framework for dexterous grasp generation, named Dexterous Grasp TRansformer (DGTR), capable of predicting a diverse set of feasible grasp poses by processing the object point cloud with only one forward pass. We formulate dexterous grasp generation as a set prediction task and design a transformer-based grasping model for it. However, we identify tha…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  23. arXiv:2404.15815  [pdf, other]

    cs.CV

    Single-View Scene Point Cloud Human Grasp Generation

    Authors: Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei, Xiao-Ming Wu, Wei-Shi Zheng

    Abstract: In this work, we explore a novel task of generating human grasps based on single-view scene point clouds, which more accurately mirrors the typical real-world situation of observing objects from a single viewpoint. Due to the incompleteness of object point clouds and the presence of numerous scene points, the generated hand is prone to penetrating into the invisible parts of the object and the mod…

    Submitted 24 April, 2024; originally announced April 2024.

  24. arXiv:2404.14934  [pdf, other]

    cs.MM cs.CV cs.HC

    G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition

    Authors: Kaikai Deng, Dong Zhao, Wenxin Zheng, Yue Ling, Kangwen Yin, Huadong Ma

    Abstract: Millimeter wave radar is gaining traction recently as a promising modality for enabling pervasive and privacy-preserving gesture recognition. However, the lack of rich and fine-grained radar datasets hinders progress in developing generalized deep learning models for gesture recognition across various user postures (e.g., standing, sitting), positions, and scenes. To remedy this, we resort to desi…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 18 pages, 29 figures

  25. 3D object quality prediction for Metal Jet Printer with Multimodal thermal encoder

    Authors: Rachel, Chen, Wenjia Zheng, Sandeep Jalui, Pavan Suri, Jun Zeng

    Abstract: With the advancements in 3D printing technologies, it is extremely important that the quality of 3D printed objects, and dimensional accuracies should meet the customer's specifications. Various factors during metal printing affect the printed parts' quality, including the power quality, the printing stage parameters, the print part's location inside the print bed, the curing stage parameters, and…

    Submitted 17 April, 2024; originally announced April 2024.

  26. arXiv:2404.06119  [pdf, other]

    cs.CV

    DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation

    Authors: Junkai Yan, Yipeng Gao, Qize Yang, Xihan Wei, Xuansong Xie, Ancong Wu, Wei-Shi Zheng

    Abstract: Text-to-3D generation, which synthesizes 3D assets according to an overall text description, has significantly progressed. However, a challenge arises when the specific appearances need customizing at designated viewpoints but referring solely to the overall description for generating 3D objects. For instance, ambiguity easily occurs when producing a T-shirt with distinct patterns on its front and…

    Submitted 9 April, 2024; originally announced April 2024.

  27. arXiv:2404.02345  [pdf, other]

    cs.CV

    GaitSTR: Gait Recognition with Sequential Two-stream Refinement

    Authors: Wanrong Zheng, Haidong Zhu, Zhaoheng Zheng, Ram Nevatia

    Abstract: Gait recognition aims to identify a person based on their walking sequences, serving as a useful biometric modality as it can be observed from long distances without requiring cooperation from the subject. In representing a person's walking sequence, silhouettes and skeletons are the two primary modalities used. Silhouette sequences lack detailed part information when overlapping occurs between di…

    Submitted 2 April, 2024; originally announced April 2024.

  28. arXiv:2404.01843  [pdf, other]

    cs.CV

    Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation

    Authors: Wangguandong Zheng, Haifeng Xia, Rui Chen, Ming Shao, Siyu Xia, Zhengming Ding

    Abstract: Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, it is not always possible to access these enriched color input samples in practical applications, where only sketches are available. Existing sketch-to-3D research suffers from limitations in broad applications due to the challenges of lacking color information and multi-view content. To ove…

    Submitted 7 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  29. arXiv:2403.19225  [pdf, other]

    cs.CV

    Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

    Authors: Angchi Xu, Wei-Shi Zheng

    Abstract: Weakly-supervised action segmentation is a task of learning to partition a long video into several action segments, where training videos are only accompanied by transcripts (ordered list of actions). Most existing methods need to infer pseudo segmentation for training by serial alignment between all frames and the transcript, which is time-consuming and hard to parallelize during training.…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  30. arXiv:2403.14430  [pdf, other]

    cs.CV

    Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

    Authors: Tianming Liang, Chaolei Tan, Beihao Xia, Wei-Shi Zheng, Jian-Fang Hu

    Abstract: This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question. This is essentially a multi-label classification task, since a question may have multiple answers. However, due to annotation costs, the labels in existing benchmarks are always extremely insufficient, typically one answer per question.…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  31. arXiv:2403.12303  [pdf, other]

    cs.CG

    Semialgebraic Range Stabbing, Ray Shooting, and Intersection Counting in the Plane

    Authors: Timothy M. Chan, Pingan Cheng, Da Wei Zheng

    Abstract: Polynomial partitioning techniques have recently led to improved geometric data structures for a variety of fundamental problems related to semialgebraic range searching and intersection searching in 3D and higher dimensions (e.g., see [Agarwal, Aronov, Ezra, and Zahl, SoCG 2019; Ezra and Sharir, SoCG 2021; Agarwal, Aronov, Ezra, Katz, and Sharir, SoCG 2022]). They have also led to improved algori…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: SOCG 2024

  32. arXiv:2403.11463  [pdf, other]

    cs.CV

    Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding

    Authors: Chaolei Tan, Jianhuang Lai, Wei-Shi Zheng, Jian-Fang Hu

    Abstract: Video Paragraph Grounding (VPG) is an emerging task in video-language understanding, which aims at localizing multiple sentences with semantic relations and temporal order from an untrimmed video. However, existing VPG approaches are heavily reliant on a considerable number of temporal labels that are laborious and time-consuming to acquire. In this work, we introduce and explore Weakly-Supervised…

    Submitted 14 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. v2: fix a typo in figure 1

  33. arXiv:2403.11157  [pdf, other]

    cs.CV

    Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

    Authors: Dian Zheng, Xiao-Ming Wu, Shuzhou Yang, Jian Zhang, Jian-Fang Hu, Wei-Shi Zheng

    Abstract: Universal image restoration is a practical and potential computer vision task for real-world applications. The main challenge of this task is handling the different degradation distributions at once. Existing methods mainly utilize task-specific conditions (e.g., prompt) to guide the model to learn different distributions separately, named multi-partite mapping. However, it is not suitable for uni…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  34. arXiv:2403.11121  [pdf, other]

    cs.CV

    A Versatile Framework for Multi-scene Person Re-identification

    Authors: Wei-Shi Zheng, Junkai Yan, Yi-Xing Peng

    Abstract: Person Re-identification (ReID) has been extensively developed for a decade in order to learn the association of images of the same person across non-overlapping camera views. To overcome significant variations between images across camera views, mountains of variants of ReID models were developed for solving a number of challenges, such as resolution change, clothing change, occlusion, modality c…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: To appear in TPAMI

  35. arXiv:2403.10588  [pdf, other]

    cs.SE cs.AI

    S3LLM: Large-Scale Scientific Software Understanding with LLMs using Source, Metadata, and Document

    Authors: Kareem Shaik, Dali Wang, Weijian Zheng, Qinglei Cao, Heng Fan, Peter Schwartz, Yunhe Feng

    Abstract: The understanding of large-scale scientific software poses significant challenges due to its diverse codebase, extensive code length, and target computing architectures. The emergence of generative AI, specifically large language models (LLMs), provides novel pathways for understanding such complex scientific codes. This paper presents S3LLM, an LLM-based framework designed to enable the examinati…

    Submitted 15 March, 2024; originally announced March 2024.

  36. arXiv:2403.10335  [pdf, other]

    cs.CV

    NECA: Neural Customizable Human Avatar

    Authors: Junjin Xiao, Qing Zhang, Zhan Xu, Wei-Shi Zheng

    Abstract: Human avatars have become a novel type of 3D asset with various applications. Ideally, a human avatar should be fully customizable to accommodate different settings and environments. In this work, we introduce NECA, an approach capable of learning versatile human representation from monocular or sparse-view videos, enabling granular customization across aspects such as pose, shadow, shape, lighting…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  37. arXiv:2403.08171  [pdf, other]

    cs.GT cs.LG

    Tractable Local Equilibria in Non-Concave Games

    Authors: Yang Cai, Constantinos Daskalakis, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

    Abstract: While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when the utilities are non-concave, a situation that is common in machine learning applications where the agents' strategies are parameterized by deep neural networks, or t…

    Submitted 12 March, 2024; originally announced March 2024.

  38. arXiv:2403.06367  [pdf, other]

    cs.LG cs.DB

    FeatAug: Automatic Feature Augmentation From One-to-Many Relationship Tables

    Authors: Danrui Qi, Weiling Zheng, Jiannan Wang

    Abstract: Feature augmentation from one-to-many relationship tables is a critical but challenging problem in ML model development. To augment good features, data scientists need to come up with SQL queries manually, which is time-consuming. Featuretools [1] is a widely used tool by the data science community to automatically augment the training data by extracting new features from relevant tables. It repre…

    Submitted 10 March, 2024; originally announced March 2024.

  39. arXiv:2403.06077  [pdf, other]

    cs.DC

    Steering a Fleet: Adaptation for Large-Scale, Workflow-Based Experiments

    Authors: Jim Pruyne, Valerie Hayot-Sasson, Weijian Zheng, Ryan Chard, Justin M. Wozniak, Tekin Bicer, Kyle Chard, Ian T. Foster

    Abstract: Experimental science is increasingly driven by instruments that produce vast volumes of data and thus a need to manage, compute, describe, and index this data. High performance and distributed computing provide the means of addressing the computing needs; however, in practice, the variety of actions required and the distributed set of resources involved require sophisticated "flows" defining the…

    Submitted 9 March, 2024; originally announced March 2024.

  40. arXiv:2403.05121  [pdf, other]

    cs.CV

    CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

    Authors: Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang

    Abstract: Recent advancements in text-to-image generative systems have been largely driven by diffusion models. However, single-stage text-to-image diffusion models still face challenges, in terms of computational efficiency and the refinement of image details. To tackle the issue, we propose CogView3, an innovative cascaded framework that enhances the performance of text-to-image diffusion. CogView3 is the…

    Submitted 8 March, 2024; originally announced March 2024.

  41. arXiv:2403.02767  [pdf, other]

    cs.CV

    DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking

    Authors: Cheng Huang, Shoudong Han, Mengyu He, Wenbo Zheng, Yuhao Wei

    Abstract: Accurate data association is crucial in reducing confusion, such as ID switches and assignment errors, in multi-object tracking (MOT). However, existing advanced methods often overlook the diversity among trajectories and the ambiguity and conflicts present in motion and appearance cues, leading to confusion among detections, trajectories, and associations when performing simple global data associ…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  42. arXiv:2403.01560  [pdf, other]

    cs.CV

    Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

    Authors: Kun-Yu Lin, Henghui Ding, Jiaming Zhou, Yu-Ming Tang, Yi-Xing Peng, Zhilin Zhao, Chen Change Loy, Wei-Shi Zheng

    Abstract: Building upon the impressive success of CLIP (Contrastive Language-Image Pretraining), recent pioneer works have proposed to adapt the powerful CLIP to video data, leading to efficient and effective video learners for open-vocabulary action recognition. Inspired by the fact that humans perform actions in diverse environments, our work delves into an intriguing question: Can CLIP-based video learners effect…

    Submitted 24 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  43. arXiv:2403.01494  [pdf, other]

    eess.AS cs.SD eess.SP

    PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion

    Authors: Tianhua Qi, Wenming Zheng, Cheng Lu, Yuan Zong, Hailun Lian

    Abstract: In this paper, we propose Prosody-aware VITS (PAVITS) for emotional voice conversion (EVC), aiming to achieve two major objectives of EVC: high content naturalness and high emotional naturalness, which are crucial for meeting the demands of human perception. To improve the content naturalness of converted audio, we have developed an end-to-end EVC architecture inspired by the high audio quality of…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: Accepted to ICASSP2024

  44. arXiv:2403.01251  [pdf, other

    cs.CL

    Accelerating Greedy Coordinate Gradient via Probe Sampling

    Authors: Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh

    Abstract: Safety of Large Language Models (LLMs) has become a critical issue given their rapid progress. Greedy Coordinate Gradient (GCG) is shown to be effective in constructing adversarial prompts to break the aligned LLMs, but optimization of GCG is time-consuming. To reduce the time cost of GCG and enable more comprehensive studies of LLM safety, in this work, we study a new algorithm called…

    Submitted 27 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  45. arXiv:2402.16187  [pdf, other

    cs.CR cs.CL cs.LG

    No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices

    Authors: Qi Pang, Shengyuan Hu, Wenting Zheng, Virginia Smith

    Abstract: Advances in generative models have made it possible for AI-generated text, code, and images to mirror human-generated content in many applications. Watermarking, a technique that aims to embed information in the output of a model to verify its source, is useful for mitigating the misuse of such AI-generated content. However, we show that common design choices in LLM watermarking schemes make the r…

    Submitted 25 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  46. arXiv:2402.15235  [pdf, other

    cs.IR

    Multi-Agent Collaboration Framework for Recommender Systems

    Authors: Zhefan Wang, Yuanqing Yu, Wendi Zheng, Weizhi Ma, Min Zhang

    Abstract: LLM-based agents have gained considerable attention for their decision-making skills and ability to handle complex tasks. Recognizing the current gap in leveraging agent capabilities for multi-agent collaboration in recommendation systems, we introduce MACRec, a novel framework designed to enhance recommendation systems through multi-agent collaboration. Unlike existing work on using agents for us…

    Submitted 27 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  47. arXiv:2402.11592  [pdf, other

    cs.LG cs.CL

    Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

    Authors: Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen

    Abstract: In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications…

    Submitted 27 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  48. arXiv:2402.11502  [pdf, other

    cs.CV

    GenAD: Generative End-to-End Autonomous Driving

    Authors: Wenzhao Zheng, Ruiqi Song, Xianda Guo, Chenming Zhang, Long Chen

    Abstract: Directly producing planning results from raw sensors has been a long-desired solution for autonomous driving and has attracted increasing attention recently. Most existing end-to-end autonomous driving methods factorize this problem into perception, motion prediction, and planning. However, we argue that the conventional progressive pipeline still cannot comprehensively model the entire traffic ev…

    Submitted 6 April, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Code is available at: https://github.com/wzzheng/GenAD

  49. arXiv:2402.09444  [pdf, other

    eess.SP cs.AI cs.CV

    Multimodal Action Quality Assessment

    Authors: Ling-An Zeng, Wei-Shi Zheng

    Abstract: Action quality assessment (AQA) aims to assess how well an action is performed. Previous works model only visual information, ignoring audio information. We argue that although AQA is highly dependent on visual information, the audio is useful complementary information for improving the score regression accuracy, especially for sports with background music, such as figure s…

    Submitted 20 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: IEEE Transactions on Image Processing 2024

    ACM Class: I.2.10

  50. arXiv:2402.06512  [pdf, other

    cs.LG cs.CL

    Multimodal Clinical Trial Outcome Prediction with Large Language Models

    Authors: Wenhao Zheng, Dongsheng Peng, Hongxia Xu, Yun Li, Hongtu Zhu, Tianfan Fu, Huaxiu Yao

    Abstract: The clinical trial is a pivotal and costly process, often spanning multiple years and requiring substantial financial resources. Therefore, the development of clinical trial outcome prediction models aims to exclude drugs likely to fail and holds the potential for significant cost savings. Recent data-driven attempts leverage deep learning methods to integrate multimodal data for predicting clinic…

    Submitted 8 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.