Showing 1–50 of 539 results for author: Cao, Z

Searching in archive cs.
  1. arXiv:2407.02165  [pdf, other]

    cs.CV

    WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation

    Authors: Zihao Huang, ShouKang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu

    Abstract: Existing human datasets for avatar creation are typically limited to laboratory environments, wherein high-quality annotations (e.g., SMPL estimation from 3D scans or multi-view images) can be ideally provided. However, their annotating requirements are impractical for real-world images or videos, posing challenges toward real-world applications on current avatar creation methods. To this end, we…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.01971  [pdf, other]

    cs.CV

    Pseudo-Labeling by Multi-Policy Viewfinder Network for Image Cropping

    Authors: Zhiyu Pan, Kewei Wang, Yizheng Wu, Liwen Xiao, Jiahao Cui, Zhicheng Wang, Zhiguo Cao

    Abstract: Automatic image cropping models predict reframing boxes to enhance image aesthetics. Yet, the scarcity of labeled data hinders the progress of this task. To overcome this limitation, we explore the possibility of utilizing both labeled and unlabeled data together to expand the scale of training data for image cropping models. This idea can be implemented in a pseudo-labeling way: producing pseudo…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 18 pages, 8 figures

  3. arXiv:2407.01479  [pdf, other]

    cs.RO cs.LG

    EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning

    Authors: Jingyun Yang, Zi-ang Cao, Congyue Deng, Rika Antonova, Shuran Song, Jeannette Bohg

    Abstract: Building effective imitation learning methods that enable robots to learn from limited data and still generalize across diverse real-world environments is a long-standing problem in robot learning. We propose EquiBot, a robust, data-efficient, and generalizable approach for robot manipulation task learning. Our approach combines SIM(3)-equivariant neural network architectures with diffusion models…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally

  4. arXiv:2406.17988  [pdf, other]

    cs.CV

    DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

    Authors: Qingxuan Wu, Zhiyang Dou, Sirui Xu, Soshi Shimada, Chen Wang, Zhengming Yu, Yuan Liu, Cheng Lin, Zeyu Cao, Taku Komura, Vladislav Golyanik, Christian Theobalt, Wenping Wang, Lingjie Liu

    Abstract: Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 23 pages, 9 figures, 3 tables

  5. arXiv:2406.16905  [pdf]

    cs.LG cs.AI

    Optimising Random Forest Machine Learning Algorithms for User VR Experience Prediction Based on Iterative Local Search-Sparrow Search Algorithm

    Authors: Xirui Tang, Feiyang Li, Zinan Cao, Qixuan Yu, Yulu Gong

    Abstract: In this paper, an improved method for VR user experience prediction is investigated by introducing a sparrow search algorithm and a random forest algorithm improved by an iterative local search-optimised sparrow search algorithm. The study firstly conducted a statistical analysis of the data, and then trained and tested using the traditional random forest model, the random forest model improved by…

    Submitted 3 June, 2024; originally announced June 2024.

  6. arXiv:2406.16776  [pdf, other]

    cs.CV

    Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation

    Authors: Yizheng Wu, Zhiyu Pan, Kewei Wang, Xingyi Li, Jiahao Cui, Liwen Xiao, Guosheng Lin, Zhiguo Cao

    Abstract: Large-scale datasets with point-wise semantic and instance labels are crucial to 3D instance segmentation but also expensive. To leverage unlabeled data, previous semi-supervised 3D instance segmentation approaches have explored self-training frameworks, which rely on high-quality pseudo labels for consistency regularization. They intuitively utilize both instance and semantic pseudo labels in a j…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 14 pages, 10 figures

  7. arXiv:2406.16317  [pdf]

    cs.SD eess.AS

    SNR-Progressive Model with Harmonic Compensation for Low-SNR Speech Enhancement

    Authors: Zhongshu Hou, Qinwen Hu, Zhanzhong Cao, Ming Tang, Jing Lu

    Abstract: Despite significant progress made in the last decade, deep neural network (DNN) based speech enhancement (SE) still faces the challenge of notable degradation in the quality of recovered speech under low signal-to-noise ratio (SNR) conditions. In this letter, we propose an SNR-progressive speech enhancement model with harmonic compensation for low-SNR SE. Reliable pitch estimation is obtained from…

    Submitted 24 June, 2024; originally announced June 2024.

  8. arXiv:2406.14955  [pdf, other]

    cs.CL

    ICLEval: Evaluating In-Context Learning Ability of Large Language Models

    Authors: Wentong Chen, Yankai Lin, ZhenHao Zhou, HongYun Huang, Yantao Jia, Zhao Cao, Ji-Rong Wen

    Abstract: In-Context Learning (ICL) is a critical capability of Large Language Models (LLMs) as it empowers them to comprehend and reason across interconnected inputs. Evaluating the ICL ability of LLMs can enhance their utilization and deepen our understanding of how this ability is acquired at the training stage. However, existing evaluation frameworks primarily focus on language abilities and knowledge,…

    Submitted 21 June, 2024; originally announced June 2024.

  9. arXiv:2406.13943  [pdf, ps, other]

    cs.IT

    New QEC codes and EAQEC codes from repeated-root cyclic codes of length $2^rp^s$

    Authors: Lanqiang Li, Ziwen Cao, Tingting Wu, Li Liu

    Abstract: Let $p$ be an odd prime and $r,s,m$ be positive integers. In this study, we initiate our exploration by delving into the intricate structure of all repeated-root cyclic codes and their duals with a length of $2^rp^s$ over the finite field $\mathbb{F}_{p^m}$. Through the utilization of CSS and Steane's constructions, a series of new quantum error-correcting (QEC) codes are constructed with paramete…

    Submitted 19 June, 2024; originally announced June 2024.

    MSC Class: 94B15 (Primary) 94B05; 11T71 (Secondary)

  10. arXiv:2406.13378  [pdf, other]

    cs.CV

    Any360D: Towards 360 Depth Anything with Unlabeled 360 Data and Möbius Spatial Augmentation

    Authors: Zidong Cao, Jinjing Zhu, Weiming Zhang, Lin Wang

    Abstract: Recently, Depth Anything Model (DAM) - a type of depth foundation model - reveals impressive zero-shot capacity for diverse perspective images. Despite its success, it remains an open question regarding DAM's performance on 360 images that enjoy a large field-of-view (180x360) but suffer from spherical distortions. To this end, we establish, to our knowledge, the first benchmark that aims to 1) ev…

    Submitted 19 June, 2024; originally announced June 2024.

  11. arXiv:2406.10765  [pdf, other]

    cs.DC

    PWDFT-SW: Extending the Limit of Plane-Wave DFT Calculations to 16K Atoms on the New Sunway Supercomputer

    Authors: Qingcai Jiang, Zhenwei Cao, Junshi Chen, Xinming Qin, Wei Hu, Hong An, Jinlong Yang

    Abstract: First-principles density functional theory (DFT) with plane wave (PW) basis set is the most widely used method in quantum mechanical material simulations due to its advantages in accuracy and universality. However, a perceived drawback of PW-based DFT calculations is their substantial computational cost and memory usage, which currently limits their ability to simulate large-scale complex systems…

    Submitted 15 June, 2024; originally announced June 2024.

  12. arXiv:2406.07588  [pdf, other]

    cs.MM cs.CL

    AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning

    Authors: Jun Gao, Qian Qiao, Ziqiang Cao, Zili Wang, Wenjie Li

    Abstract: In-context learning (ICL) facilitates Large Language Models (LLMs) exhibiting emergent ability on downstream tasks without updating billions of parameters. However, in the area of multi-modal Large Language Models (MLLMs), two problems hinder the application of multi-modal ICL: (1) Most primary MLLMs are only trained on single-image datasets, making them unable to read multi-modal demonstrations.…

    Submitted 30 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  13. arXiv:2406.06073  [pdf, other]

    cs.CL

    Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval

    Authors: Yan Gao, Zhiwei Cao, Zhongjian Miao, Baosong Yang, Shiyu Liu, Min Zhang, Jinsong Su

    Abstract: To achieve non-parametric NMT domain adaptation, $k$-Nearest-Neighbor Machine Translation ($k$NN-MT) constructs an external datastore to store domain-specific translation knowledge, which derives a $k$NN distribution to interpolate the prediction distribution of the NMT model via a linear interpolation coefficient $λ$. Despite its success, $k$NN retrieval at each timestep leads to substantial time…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  14. arXiv:2406.04999  [pdf, other]

    cs.CV

    ProMotion: Prototypes As Motion Learners

    Authors: Yawen Lu, Dongfang Liu, Qifan Wang, Cheng Han, Yiming Cui, Zhiwen Cao, Xueling Zhang, Yingjie Victor Chen, Heng Fan

    Abstract: In this work, we introduce ProMotion, a unified prototypical framework engineered to model fundamental motion tasks. ProMotion offers a range of compelling attributes that set it apart from current task-specific paradigms. We adopt a prototypical perspective, establishing a unified paradigm that harmonizes disparate motion learning approaches. This novel paradigm streamlines the architectural desi…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 11 pages

  15. arXiv:2406.03978  [pdf, other]

    cs.MA cs.LG

    Mini Honor of Kings: A Lightweight Environment for Multi-Agent Reinforcement Learning

    Authors: Lin Liu, Jian Zhao, Cheng Hu, Zhengtao Cao, Youpeng Zhao, Zhenbin Ye, Meng Meng, Wenjun Wang, Zhaofeng He, Houqiang Li, Xia Lin, Lanxiao Huang

    Abstract: Games are widely used as research environments for multi-agent reinforcement learning (MARL), but they pose three significant challenges: limited customization, high computational demands, and oversimplification. To address these issues, we introduce the first publicly available map editor for the popular mobile game Honor of Kings and design a lightweight environment, Mini Honor of Kings (Mini Ho…

    Submitted 16 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  16. arXiv:2406.02376  [pdf, other]

    cs.CL

    Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs

    Authors: Zhiwei Cao, Qian Cao, Yu Lu, Ningxin Peng, Luyang Huang, Shanbo Cheng, Jinsong Su

    Abstract: The growing popularity of Large Language Models has sparked interest in context compression for Large Language Models (LLMs). However, the performance of previous methods degrades dramatically as compression ratios increase, sometimes even falling to the closed-book level. This decline can be attributed to the loss of key information during the compression process. Our preliminary study supports t…

    Submitted 17 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  17. arXiv:2406.01559  [pdf, other]

    cs.CV

    Prototypical Transformer as Unified Motion Learners

    Authors: Cheng Han, Yawen Lu, Guohao Sun, James C. Liang, Zhiwen Cao, Qifan Wang, Qiang Guan, Sohail A. Dianat, Raghuveer M. Rao, Tong Geng, Zhiqiang Tao, Dongfang Liu

    Abstract: In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. ProtoFormer seamlessly integrates prototype learning with Transformer by thoughtfully considering motion dynamics, introducing two innovative designs. First, Cross-Attention Prototyping discovers prototypes based on signature moti…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 21 pages, 10 figures

  18. arXiv:2406.01070  [pdf, other]

    cs.CL

    Guiding ChatGPT to Generate Salient Domain Summaries

    Authors: Jun Gao, Ziqiang Cao, Shaoyao Huang, Luozheng Qin, Chunhui Ai

    Abstract: ChatGPT is instruct-tuned to generate general and human-expected content to align with human preference through Reinforcement Learning from Human Feedback (RLHF), meanwhile resulting in generated responses not salient enough. Therefore, in this case, ChatGPT may fail to satisfy domain requirements in zero-shot settings, leading to poor ROUGE scores. Inspired by the In-Context Learning (ICL) and re…

    Submitted 3 June, 2024; originally announced June 2024.

  19. arXiv:2406.00507  [pdf, other]

    cs.CL cs.AI

    Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization

    Authors: Shichao Sun, Ruifeng Yuan, Ziqiang Cao, Wenjie Li, Pengfei Liu

    Abstract: Large language models (LLMs) have demonstrated the capacity to improve summary quality by mirroring a human-like iterative process of critique and refinement starting from the initial draft. Two strategies are designed to perform this iterative process: Prompt Chaining and Stepwise Prompt. Prompt chaining orchestrates the drafting, critiquing, and refining phases through a series of three discrete…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to Findings of ACL 2024

  20. arXiv:2405.19850  [pdf, other]

    cs.AI

    Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models

    Authors: Yuxiao Luo, Zhongcai Cao, Xin Jin, Kang Liu, Ling Yin

    Abstract: Understanding human mobility patterns is essential for various applications, from urban planning to public safety. The individual trajectory such as mobile phone location data, while rich in spatio-temporal information, often lacks semantic detail, limiting its utility for in-depth mobility analysis. Existing methods can infer basic routine activity sequences from this data, lacking depth in under…

    Submitted 30 May, 2024; originally announced May 2024.

  21. arXiv:2405.17062  [pdf, other]

    cs.CL

    Unifying Demonstration Selection and Compression for In-Context Learning

    Authors: Jun Gao, Ziqiang Cao, Wenjie Li

    Abstract: In-context learning (ICL) facilitates large language models (LLMs) exhibiting spectacular emergent capabilities in various scenarios. Unfortunately, introducing demonstrations easily makes the prompt length explode, bringing a significant burden to hardware. In addition, random demonstrations usually achieve limited improvements in ICL, necessitating demonstration selection among accessible candid…

    Submitted 15 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  22. arXiv:2405.17052  [pdf, other]

    cs.CL

    SelfCP: Compressing Over-Limit Prompt via the Frozen Large Language Model Itself

    Authors: Jun Gao, Ziqiang Cao, Wenjie Li

    Abstract: Long prompt leads to huge hardware costs when using transformer-based Large Language Models (LLMs). Unfortunately, many tasks, such as summarization, inevitably introduce long documents, and the wide application of in-context learning easily makes the prompt length explode. This paper proposes a Self-Compressor (SelfCP), which employs the target LLM itself to compress over-limit prompts into dense…

    Submitted 18 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  23. arXiv:2405.16802  [pdf, other]

    cs.CL cs.LG

    AutoCV: Empowering Reasoning with Automated Process Labeling via Confidence Variation

    Authors: Jianqiao Lu, Zhiyang Dou, Hongru Wang, Zeyu Cao, Jianbo Dai, Yingjia Wan, Yinya Huang, Zhijiang Guo

    Abstract: In this work, we propose a novel method named Automated Process Labeling via Confidence Variation (AutoCV) to enhance the reasoning capabilities of large language models (LLMs) by automatically annotating the reasoning steps. Our approach begins by training a verification model on the correctness of final answers, enabling it to generate automatic proce…

    Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: 20 pages, 1 figure, 13 tables

  24. arXiv:2405.12218  [pdf, other]

    cs.CV

    Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

    Authors: Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu

    Abstract: We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering that integrates an efficient volume…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Project page: https://mvsgaussian.github.io/

  25. arXiv:2405.11564  [pdf, other]

    cs.CV

    CRF360D: Monocular 360 Depth Estimation via Spherical Fully-Connected CRFs

    Authors: Zidong Cao, Lin Wang

    Abstract: Monocular 360 depth estimation is challenging due to the inherent distortion of the equirectangular projection (ERP). This distortion causes a problem: spherical adjacent points are separated after being projected to the ERP plane, particularly in the polar regions. To tackle this problem, recent methods calculate the spherical neighbors in the tangent domain. However, as the tangent patch and sph…

    Submitted 19 May, 2024; originally announced May 2024.

  26. arXiv:2405.11198  [pdf, other]

    math.OC cs.AI

    Adaptive Stabilization Based on Machine Learning for Column Generation

    Authors: Yunzhuang Shen, Yuan Sun, Xiaodong Li, Zhiguang Cao, Andrew Eberhard, Guangquan Zhang

    Abstract: Column generation (CG) is a well-established method for solving large-scale linear programs. It involves iteratively optimizing a subproblem containing a subset of columns and using its dual solution to generate new columns with negative reduced costs. This process continues until the dual values converge to the optimal dual solution to the original problem. A natural phenomenon in CG is the heavy…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML'24

  27. arXiv:2405.10853  [pdf, other]

    cs.LG cs.AI cs.DC

    The Future of Large Language Model Pre-training is Federated

    Authors: Lorenzo Sani, Alex Iacob, Zeyu Cao, Bill Marino, Yan Gao, Tomas Paulik, Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane

    Abstract: Generative pre-trained large language models (LLMs) have demonstrated impressive performance over a wide range of tasks, thanks to the unprecedented amount of data they have been trained on. As established scaling laws indicate, LLMs' future performance improvement depends on the amount of computing and data sources we can leverage for pre-training. Federated learning (FL) has the potential to unl…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, pre-print

  28. arXiv:2405.08816  [pdf, other]

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  29. arXiv:2405.08055  [pdf, other]

    cs.CV

    DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation

    Authors: Ziang Cao, Fangzhou Hong, Tong Wu, Liang Pan, Ziwei Liu

    Abstract: Generating diverse and high-quality 3D assets automatically poses a fundamental yet challenging task in 3D computer vision. Despite extensive efforts in 3D generation, existing optimization-based approaches struggle to produce large-scale 3D assets efficiently. Meanwhile, feed-forward methods often focus on generating only a single category or a few categories, limiting their generalizability. The…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2309.07920

  30. arXiv:2405.06808  [pdf, other]

    q-fin.RM cs.AI cs.CL

    Large Language Model in Financial Regulatory Interpretation

    Authors: Zhiyu Cao, Zachary Feinstein

    Abstract: This study explores the innovative use of Large Language Models (LLMs) as analytical tools for interpreting complex financial regulations. The primary objective is to design effective prompts that guide LLMs in distilling verbose and intricate regulatory texts, such as the Basel III capital requirement regulations, into a concise mathematical framework that can be subsequently translated into acti…

    Submitted 10 May, 2024; originally announced May 2024.

  31. arXiv:2405.05984  [pdf, other]

    cs.LG cs.AI

    Few-Shot Class Incremental Learning via Robust Transformer Approach

    Authors: Naeem Paeedeh, Mahardhika Pratama, Sunu Wibirama, Wolfgang Mayer, Zehong Cao, Ryszard Kowalczyk

    Abstract: Few-Shot Class-Incremental Learning presents an extension of the Class Incremental Learning problem where a model is faced with the problem of data scarcity while addressing the catastrophic forgetting problem. This problem remains an open problem because all recent works are built upon the convolutional neural networks performing sub-optimally compared to the transformer approaches. Our paper pre…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Under Review in Information Sciences

  32. arXiv:2405.02676  [pdf, other]

    cs.CV cs.GR

    Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics

    Authors: Haoyu Hu, Xinyu Yi, Zhe Cao, Jun-Hai Yong, Feng Xu

    Abstract: Hand manipulating objects is an important interaction motion in our daily activities. We faithfully reconstruct this motion with a single RGBD camera by a novel deep reinforcement learning method to leverage physics. Firstly, we propose object compensation control which establishes direct object control to make the network training more stable. Meanwhile, by leveraging the compensation force and t…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024 Conference Track

    ACM Class: I.5.4

  33. arXiv:2405.01029  [pdf, other]

    cs.AI cs.LG

    MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts

    Authors: Jianan Zhou, Zhiguang Cao, Yaoxin Wu, Wen Song, Yining Ma, Jie Zhang, Chi Xu

    Abstract: Learning to solve vehicle routing problems (VRPs) has garnered much attention. However, most neural solvers are only structured and trained independently on a specific problem, making them less generic and practical. In this paper, we aim to develop a unified neural solver that can cope with a range of VRP variants simultaneously. Specifically, we propose a multi-task vehicle routing solver with m…

    Submitted 6 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML 2024

  34. arXiv:2405.00351  [pdf, other]

    cs.HC cs.AI cs.CV cs.MM

    Learning High-Quality Navigation and Zooming on Omnidirectional Images in Virtual Reality

    Authors: Zidong Cao, Zhan Wang, Yexin Liu, Yan-Pei Cao, Ying Shan, Wei Zeng, Lin Wang

    Abstract: Viewing omnidirectional images (ODIs) in virtual reality (VR) represents a novel form of media that provides immersive experiences for users to navigate and interact with digital content. Nonetheless, this sense of immersion can be greatly compromised by a blur effect that masks details and hampers the user's ability to engage with objects of interest. In this paper, we present a novel system, cal…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 11 pages

  35. arXiv:2404.19639  [pdf, other]

    cs.CV

    ESP-Zero: Unsupervised enhancement of zero-shot classification for Extremely Sparse Point cloud

    Authors: Jiayi Han, Zidi Cao, Weibo Zheng, Xiangguo Zhou, Xiangjian He, Yuanfang Zhang, Daisen Wei

    Abstract: In recent years, zero-shot learning has attracted the focus of many researchers, due to its flexibility and generality. Many approaches have been proposed to achieve the zero-shot classification of the point clouds for 3D object understanding, following the schema of CLIP. However, in the real world, the point clouds could be extremely sparse, dramatically limiting the effectiveness of the 3D poin…

    Submitted 30 April, 2024; originally announced April 2024.

  36. arXiv:2404.17528  [pdf, other]

    cs.CV

    Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields

    Authors: Tianqi Liu, Xinyi Ye, Min Shi, Zihao Huang, Zhiyu Pan, Zhan Peng, Zhiguo Cao

    Abstract: Generalizable NeRF aims to synthesize novel views for unseen scenes. Common practices involve constructing variance-based cost volumes for geometry reconstruction and encoding 3D descriptors for decoding novel views. However, existing methods show limited generalization ability in challenging conditions due to inaccurate geometry, sub-optimal descriptors, and decoding strategies. We address these…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. Project page: https://gefucvpr24.github.io

  37. arXiv:2404.15174  [pdf, other]

    cs.CV

    Fourier-enhanced Implicit Neural Fusion Network for Multispectral and Hyperspectral Image Fusion

    Authors: Yu-Jie Liang, Zihan Cao, Liang-Jian Deng, Xiao Wu

    Abstract: Recently, implicit neural representations (INR) have made significant strides in various vision-related domains, providing a novel solution for Multispectral and Hyperspectral Image Fusion (MHIF) tasks. However, INR is prone to losing high-frequency information and is confined to the lack of global perceptual capabilities. To address these issues, this paper introduces a Fourier-enhanced Implicit…

    Submitted 23 April, 2024; originally announced April 2024.

  38. arXiv:2404.12887  [pdf, other]

    cs.CV eess.IV

    3D Multi-frame Fusion for Video Stabilization

    Authors: Zhan Peng, Xinyi Ye, Weiyue Zhao, Tianqi Liu, Huiqiang Sun, Baopu Li, Zhiguo Cao

    Abstract: In this paper, we present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering. Departing from conventional methods, we introduce a 3D multi-frame perspective to generate stabilized images, addressing the challenge of full-frame generation while preserving structure. The core of our approach lies in Stabilized Rendering (SR), a volume rend…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  39. arXiv:2404.12804  [pdf, other]

    cs.CV eess.IV

    Linearly-evolved Transformer for Pan-sharpening

    Authors: Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Man Zhou, Danfeng Hong

    Abstract: Vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 10 pages

  40. arXiv:2404.11677  [pdf, other]

    cs.AI

    Cross-Problem Learning for Solving Vehicle Routing Problems

    Authors: Zhuoyi Lin, Yaoxin Wu, Bangjian Zhou, Zhiguang Cao, Wen Song, Yingqian Zhang, Senthilnath Jayavelu

    Abstract: Existing neural heuristics often train a deep architecture from scratch for each specific vehicle routing problem (VRP), ignoring the transferable knowledge across different VRP variants. This paper proposes the cross-problem learning to assist heuristics training for different downstream VRP variants. Particularly, we modularize neural architectures for complex VRPs into 1) the backbone Transform…

    Submitted 18 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI'24

  41. arXiv:2404.11537  [pdf, other]

    cs.CV eess.IV

    SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening

    Authors: Yu Zhong, Xiao Wu, Liang-Jian Deng, Zihan Cao

    Abstract: Pansharpening is a significant image fusion technique that merges the spatial content and spectral characteristics of remote sensing images to generate high-resolution multispectral images. Recently, denoising diffusion probabilistic models have been gradually applied to visual tasks, enhancing controllable image generation through low-rank adaptation (LoRA). In this paper, we introduce a spatial-…

    Submitted 17 April, 2024; originally announced April 2024.

  42. arXiv:2404.11416  [pdf, other]

    cs.CV

    Neural Shrödinger Bridge Matching for Pansharpening

    Authors: Zihan Cao, Xiao Wu, Liang-Jian Deng

    Abstract: Recent diffusion probabilistic models (DPM) in the field of pansharpening have been gradually gaining attention and have achieved state-of-the-art (SOTA) performance. In this paper, we identify shortcomings in directly applying DPMs to the task of pansharpening as an inverse problem: 1) initiating sampling directly from Gaussian noise neglects the low-resolution multispectral image (LRMS) as a pri…

    Submitted 17 April, 2024; originally announced April 2024.

  43. arXiv:2404.09293  [pdf, other]

    cs.CV

    A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion

    Authors: Zihan Cao, Xiao Wu, Liang-Jian Deng, Yu Zhong

    Abstract: In image fusion tasks, images from different sources possess distinct characteristics. This has driven the development of numerous methods to explore better ways of fusing them while preserving their respective characteristics. Mamba, as a state space model, has emerged in the field of natural language processing. Recently, many studies have attempted to extend Mamba to vision tasks. However, due…

    Submitted 14 April, 2024; originally announced April 2024.

  44. arXiv:2404.09001  [pdf, other]

    cs.RO cs.AI cs.CV

    Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households

    Authors: Zhihao Cao, Zidong Wang, Siwen Xie, Anji Liu, Lifeng Fan

    Abstract: Despite the significant demand for assistive technology among vulnerable groups (e.g., the elderly, children, and the disabled) in daily tasks, research into advanced AI-driven assistive solutions that genuinely accommodate their diverse needs remains sparse. Traditional human-machine interaction tasks often require machines to simply help without nuanced consideration of human abilities and feeli…

    Submitted 13 April, 2024; originally announced April 2024.

  45. arXiv:2404.00681  [pdf, other]

    cs.CL

    CoUDA: Coherence Evaluation via Unified Data Augmentation

    Authors: Dawei Zhu, Wenhao Wu, Yifan Song, Fangwei Zhu, Ziqiang Cao, Sujian Li

    Abstract: Coherence evaluation aims to assess the organization and structure of a discourse, which remains challenging even in the era of large language models. Due to the scarcity of annotated data, data augmentation is commonly used for training coherence evaluation models. However, previous augmentations for this task primarily rely on heuristic rules, lacking designing criteria as guidance. In this pape…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: NAACL 2024

  46. arXiv:2403.17708  [pdf, other]

    cs.CV cs.HC cs.MM

    Panonut360: A Head and Eye Tracking Dataset for Panoramic Video

    Authors: Yutong Xu, Junhao Du, Jiahe Wang, Yuwei Ning, Sihan Zhou, Yang Cao

    Abstract: With the rapid development and widespread application of VR/AR technology, maximizing the quality of immersive panoramic video services that match users' personal preferences and habits has become a long-standing challenge. Understanding the saliency region where users focus, based on data collected with HMDs, can promote multimedia encoding, transmission, and quality assessment. At the same time,…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 7 pages, ACM MMSys'24 accepted

  47. arXiv:2403.15789  [pdf, other]

    cs.CV

    In-Context Matting

    Authors: He Guo, Zixuan Ye, Zhiguo Cao, Hao Lu

    Abstract: We introduce in-context matting, a novel task setting of image matting. Given a reference image of a certain foreground and guided priors such as points, scribbles, and masks, in-context matting enables automatic alpha estimation on a batch of target images of the same foreground category, without additional auxiliary input. This setting marries good performance in auxiliary input-based matting an…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Code is available at https://github.com/tiny-smart/in-context-matting

  48. arXiv:2403.15734  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Space Group Informed Transformer for Crystalline Materials Generation

    Authors: Zhendong Cao, Xiaoshan Luo, Jian Lv, Lei Wang

    Abstract: We introduce CrystalFormer, a transformer-based autoregressive model specifically designed for space group-controlled generation of crystalline materials. The space group symmetry significantly simplifies the crystal space, which is crucial for data and compute efficient generative modeling of crystalline materials. Leveraging the prominent discrete and sequential nature of the Wyckoff positions,…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 17 pages, 8 figures

  49. arXiv:2403.14978  [pdf, other]

    cs.IT eess.SP

    Range-Angle Estimation for FDA-MIMO System With Frequency Offset

    Authors: Mengjiang Sun, Peng Chen, Zhenxin Cao

    Abstract: Frequency diverse array multiple-input multiple-output (FDA-MIMO) radar differs from the traditional phased array (PA) radar, and can form range-angle-dependent beampattern and differentiate between closely spaced targets sharing the same angle but occupying distinct range cells. In the FDA-MIMO radar, target range estimation is achieved by employing a subtle frequency variation between adjacent a…

    Submitted 22 March, 2024; originally announced March 2024.

    Journal ref: IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2024

  50. An Analysis on Matching Mechanisms and Token Pruning for Late-interaction Models

    Authors: Qi Liu, Gang Guo, Jiaxin Mao, Zhicheng Dou, Ji-Rong Wen, Hao Jiang, Xinyu Zhang, Zhao Cao

    Abstract: With the development of pre-trained language models, the dense retrieval models have become promising alternatives to the traditional retrieval models that rely on exact match and sparse bag-of-words representations. Different from most dense retrieval models using a bi-encoder to encode each query or document into a dense vector, the recently proposed late-interaction multi-vector models (i.e., C…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by ACM Transactions on Information Systems