Showing 1–50 of 154 results for author: Dai, Q

Searching in archive cs.
  1. arXiv:2406.19236  [pdf, other]

    cs.AI cs.CV cs.RO

    Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

    Authors: Minghan Li, Heng Li, Zhi-Qi Cheng, Yifei Dong, Yuxuan Zhou, Jun-Yan He, Qi Dai, Teruko Mitamura, Alexander G. Hauptmann

    Abstract: Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions. However, current VLN frameworks often rely on static environments and optimal expert supervision, limiting their real-world applicability. To address this, we introduce Human-Aware Vision-and-Language Navigation (HA-VLN), extending traditional VLN by incorporating dynamic human activitie…

    Submitted 4 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 30 pages, 18 figures, Project Page: https://lpercc.github.io/HA3D_simulator/

  2. arXiv:2406.16268  [pdf, other]

    cs.DB

    Efficient Antagonistic k-plex Enumeration in Signed Graphs

    Authors: Lantian Xu, Rong-Hua Li, Dong Wen, Qiangqiang Dai, Guoren Wang, Lu Qin

    Abstract: A signed graph is a graph where each edge receives a sign, positive or negative. The signed graph model has been used in many real applications, such as protein complex discovery and social network analysis. Finding cohesive subgraphs in signed graphs is a fundamental problem. A k-plex is a common model for cohesive subgraphs in which every vertex is adjacent to all but at most k vertices within t…

    Submitted 23 June, 2024; originally announced June 2024.

  3. arXiv:2406.13300  [pdf]

    cs.LG

    LightGBM robust optimization algorithm based on topological data analysis

    Authors: Han Yang, Guangjun Qin, Ziyuan Liu, Yongqing Hu, Qinglong Dai

    Abstract: To enhance the robustness of the Light Gradient Boosting Machine (LightGBM) algorithm for image classification, a topological data analysis (TDA)-based robustness optimization algorithm for LightGBM, TDA-LightGBM, is proposed to address the interference of noise on image classification. Initially, the method partitions the feature engineering process into two streams: pixel feature stream and topo…

    Submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2406.09397  [pdf, other]

    cs.CV cs.AI

    Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

    Authors: Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, Ji Li, Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo

    Abstract: Modern vision models are trained on very large noisy datasets. While these models acquire strong capabilities, they may not follow the user's intent to output the desired results in certain aspects, e.g., visual aesthetic, preferred style, and responsibility. In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system.…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 28 pages, 26 figures, under review

  5. arXiv:2406.06465  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

    Authors: Zhen Xing, Qi Dai, Zejia Weng, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Text-guided video prediction (TVP) involves predicting the motion of future frames from the initial frame according to an instruction, which has wide applications in virtual reality, robotics, and content creation. Previous TVP methods make significant breakthroughs by adapting Stable Diffusion for this task. However, they struggle with frame consistency and temporal stability primarily due to the…

    Submitted 10 June, 2024; originally announced June 2024.

  6. arXiv:2405.20325  [pdf, other]

    cs.CV

    MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion

    Authors: Shuyuan Tu, Qi Dai, Zihao Zhang, Sicheng Xie, Zhi-Qi Cheng, Chong Luo, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Despite impressive advancements in diffusion-based video editing models in altering video attributes, there has been limited exploration into modifying motion information while preserving the original protagonist's appearance and background. In this paper, we propose MotionFollower, a lightweight score-guided diffusion model for video motion editing. To introduce conditional controls to the denois…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 23 pages, 18 figures. Project page at https://francis-rings.github.io/MotionFollower/

    MSC Class: 68T45; 68T10

  7. arXiv:2405.16850  [pdf, other]

    eess.IV cs.CV cs.LG

    UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation

    Authors: Runzhao Yang, Yinda Chen, Zhihong Zhang, Xiaoyu Liu, Zongren Li, Kunlun He, Zhiwei Xiong, Jinli Suo, Qionghai Dai

    Abstract: In the field of medical image compression, Implicit Neural Representation (INR) networks have shown remarkable versatility due to their flexible compression ratios, yet they are constrained by a one-to-one fitting approach that results in lengthy encoding times. Our novel method, "UniCompress", innovatively extends the compression capabilities of INR by being the first to compress multi…

    Submitted 27 May, 2024; originally announced May 2024.

  8. Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture

    Authors: Yitong Jin, Zhiping Qiu, Yi Shi, Shuangpeng Sun, Chongwu Wang, Donghao Pan, Jiachen Zhao, Zhenghao Liang, Yuan Wang, Xiaobing Li, Feng Yu, Tao Yu, Qionghai Dai

    Abstract: In this paper, we touch on the problem of markerless multi-modal human motion capture especially for string performance capture which involves inherently subtle hand-string contacts and intricate movements. To fulfill this goal, we first collect a dataset, named String Performance Dataset (SPD), featuring cello and violin performances. The dataset includes videos captured from up to 23 different v…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH2024

  9. arXiv:2404.14759  [pdf, other]

    cs.CV

    Unified Unsupervised Salient Object Detection via Knowledge Transfer

    Authors: Yao Yuan, Wutao Liu, Pan Gao, Qun Dai, Jie Qin

    Abstract: Recently, unsupervised salient object detection (USOD) has gained increasing attention due to its annotation-free nature. However, current methods mainly focus on specific tasks such as RGB and RGB-D, neglecting the potential for task migration. In this paper, we propose a unified USOD framework for generic USOD tasks. Firstly, we propose a Progressive Curriculum Learning-based Saliency Distilling…

    Submitted 23 April, 2024; originally announced April 2024.

  10. arXiv:2404.13501  [pdf, other]

    cs.AI

    A Survey on the Memory Mechanism of Large Language Model based Agents

    Authors: Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, Ji-Rong Wen

    Abstract: Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. Compared with original LLMs, LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems that need long-term and complex agent-environment interactions. The key component to support agent-environment interactions is the m…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 39 pages, 5 figures, 4 tables

  11. arXiv:2404.11998  [pdf, other]

    cs.CV

    Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation

    Authors: Qiyuan Dai, Sibei Yang

    Abstract: Referring image segmentation (RIS) aims to precisely segment referents in images through corresponding natural language expressions, yet relying on cost-intensive mask annotations. Weakly supervised RIS thus learns from image-text pairs to pixel-level semantics, which is challenging for segmenting fine-grained masks. A natural approach to enhancing segmentation precision is to empower weakly super…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  12. arXiv:2404.07551  [pdf, other]

    eess.IV cs.CV

    Event-Enhanced Snapshot Compressive Videography at 10K FPS

    Authors: Bo Zhang, Jinli Suo, Qionghai Dai

    Abstract: Video snapshot compressive imaging (SCI) encodes the target dynamic scene compactly into a snapshot and reconstructs its high-speed frame sequence afterward, greatly reducing the required data footprint and transmission bandwidth as well as enabling high-speed imaging with a low frame rate intensity camera. In implementation, high-speed dynamics are encoded via temporally varying patterns, and onl…

    Submitted 11 April, 2024; originally announced April 2024.

  13. arXiv:2404.00621  [pdf, other]

    cs.IR cs.MM

    Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey

    Authors: Qijiong Liu, Jieming Zhu, Yanting Yang, Quanyu Dai, Zhaocheng Du, Xiao-Ming Wu, Zhou Zhao, Rui Zhang, Zhenhua Dong

    Abstract: Personalized recommendation serves as a ubiquitous channel for users to discover information tailored to their interests. However, traditional recommendation models primarily rely on unique IDs and categorical features for user-item matching, potentially overlooking the nuanced essence of raw item contents across multiple modalities such as text, image, audio, and video. This underutilization of m…

    Submitted 3 July, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted by KDD 2024. See our tutorial materials at https://mmrec.github.io

  14. arXiv:2403.15853  [pdf]

    eess.IV cs.CV

    An edge detection-based deep learning approach for tear meniscus height measurement

    Authors: Kesheng Wang, Kunhui Xu, Xiaoyu Chen, Chunlei He, Jianfeng Zhang, Dexing Kong, Qi Dai, Shoujun Huang

    Abstract: Automatic measurements of tear meniscus height (TMH) have been achieved by using deep learning techniques; however, annotation is significantly influenced by subjective factors and is both time-consuming and labor-intensive. In this paper, we introduce an automatic TMH measurement technique based on edge detection-assisted annotation within a deep learning framework. This method generates mask lab…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 22 pages, 5 figures

  15. arXiv:2403.11803  [pdf, other]

    cs.CV

    Federated Modality-specific Encoders and Multimodal Anchors for Personalized Brain Tumor Segmentation

    Authors: Qian Dai, Dong Wei, Hong Liu, Jinghan Sun, Liansheng Wang, Yefeng Zheng

    Abstract: Most existing federated learning (FL) methods for medical image analysis only considered intramodal heterogeneity, limiting their applicability to multimodal imaging applications. In practice, it is not uncommon that some FL participants only possess a subset of the complete imaging modalities, posing inter-modal heterogeneity as a challenge to effectively training a global model on all participan…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI 2024

  16. arXiv:2403.04736  [pdf, other]

    cs.IR

    Benchmarking News Recommendation in the Era of Green AI

    Authors: Qijiong Liu, Jieming Zhu, Quanyu Dai, Xiao-Ming Wu

    Abstract: Over recent years, news recommender systems have gained significant attention in both academia and industry, emphasizing the need for a standardized benchmark to evaluate and compare the performance of these systems. Concurrently, Green AI advocates for reducing the energy consumption and environmental impact of machine learning. To address these concerns, we introduce the first Green AI benchmark…

    Submitted 14 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: TheWebConf'24 accepted paper. A revised and condensed version of the previous work titled Only Encode Once: Making Content-based News Recommender Greener. While the core ideas and results remain consistent, the presentation scope has been modified for brevity and clarity. For the full details and extended discussions, please refer to the original long paper at arXiv:2308.14155

  17. arXiv:2402.18092  [pdf, other]

    cs.CV

    Context-aware Talking Face Video Generation

    Authors: Meidai Xuanyuan, Yuwang Wang, Honglei Guo, Qionghai Dai

    Abstract: In this paper, we consider a novel and practical case for talking face video generation. Specifically, we focus on the scenarios involving multi-people interactions, where the talking context, such as audience or surroundings, is present. In these situations, the video generation should take the context into consideration in order to generate video content naturally aligned with driving audios and…

    Submitted 28 February, 2024; originally announced February 2024.

  18. arXiv:2402.03307  [pdf, other]

    cs.CV

    4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes

    Authors: Yuanxing Duan, Fangyin Wei, Qiyu Dai, Yuhang He, Wenzheng Chen, Baoquan Chen

    Abstract: We consider the problem of novel-view synthesis (NVS) for dynamic scenes. Recent neural approaches have accomplished exceptional NVS results for static 3D scenes, but extensions to 4D time-varying scenes remain non-trivial. Prior efforts often encode dynamics by learning a canonical space plus implicit or explicit deformation fields, which struggle in challenging scenarios like sudden movements or…

    Submitted 2 July, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Proc. SIGGRAPH, 2024

  19. arXiv:2401.05641  [pdf, other]

    cs.OS cs.CR cs.LG

    When eBPF Meets Machine Learning: On-the-fly OS Kernel Compartmentalization

    Authors: Zicheng Wang, Tiejin Chen, Qinrun Dai, Yueqi Chen, Hua Wei, Qingkai Zeng

    Abstract: Compartmentalization effectively prevents initial corruption from turning into a successful attack. This paper presents O2C, a pioneering system designed to enforce OS kernel compartmentalization on the fly. It not only provides immediate remediation for sudden threats but also maintains consistent system availability through the enforcement process. O2C is empowered by the newest advancements o…

    Submitted 10 January, 2024; originally announced January 2024.

  20. arXiv:2401.03153  [pdf, other]

    cs.CV

    An Event-Oriented Diffusion-Refinement Method for Sparse Events Completion

    Authors: Bo Zhang, Yuqi Han, Jinli Suo, Qionghai Dai

    Abstract: Event cameras or dynamic vision sensors (DVS) record asynchronous response to brightness changes instead of conventional intensity frames, and feature ultra-high sensitivity at low bandwidth. The new mechanism demonstrates great advantages in challenging scenarios with fast motion and large dynamic range. However, the recorded events might be highly sparse due to either limited hardware bandwidth…

    Submitted 6 January, 2024; originally announced January 2024.

  21. arXiv:2311.18837  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models

    Authors: Zhen Xing, Qi Dai, Zihao Zhang, Hui Zhang, Han Hu, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Diffusion models have achieved significant success in image and video generation. This motivates a growing interest in video editing tasks, where videos are edited according to provided text descriptions. However, most existing approaches only focus on video editing for short clips and rely on time-consuming tuning or inference. We are the first to propose Video Instruction Diffusion (VIDiff), a u…

    Submitted 30 November, 2023; originally announced November 2023.

  22. arXiv:2311.18834  [pdf, other]

    cs.CV

    ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models

    Authors: Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

    Abstract: We present ART$\boldsymbol{\cdot}$V, an efficient framework for auto-regressive video generation with diffusion models. Unlike existing methods that generate entire videos in one-shot, ART$\boldsymbol{\cdot}$V generates a single frame at a time, conditioned on the previous ones. The framework offers three distinct advantages. First, it only learns simple continual motions between adjacent frames,…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 24 pages, 21 figures. Project page at https://warranweng.github.io/art.v

  23. arXiv:2311.18830  [pdf, other]

    cs.CV

    MotionEditor: Editing Video Motion via Content-Aware Diffusion

    Authors: Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Existing diffusion-based video editing models have made gorgeous advances for editing attributes of a source video over time but struggle to manipulate the motion information while preserving the original protagonist's appearance and background. To address this, we propose MotionEditor, a diffusion model for video motion editing. MotionEditor incorporates a novel content-aware motion adapter into…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 18 pages, 15 figures. Project page at https://francis-rings.github.io/MotionEditor/

  24. arXiv:2311.13134  [pdf, other]

    cs.CV eess.IV

    Lightweight High-Speed Photography Built on Coded Exposure and Implicit Neural Representation of Videos

    Authors: Zhihong Zhang, Runzhao Yang, Jinli Suo, Yuxiao Cheng, Qionghai Dai

    Abstract: The compact cameras recording high-speed scenes with high resolution are highly demanded, but the required high bandwidth often leads to bulky, heavy systems, which limits their applications on low-capacity platforms. Adopting a coded exposure setup to encode a frame sequence into a blurry snapshot and retrieve the latent sharp video afterward can serve as a lightweight solution. However, restorin…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 19 pages, 10 figures

  25. arXiv:2311.10290  [pdf, other]

    cs.DS

    Scalable Algorithms for Laplacian Pseudo-inverse Computation

    Authors: Meihao Liao, Rong-Hua Li, Qiangqiang Dai, Hongyang Chen, Guoren Wang

    Abstract: The pseudo-inverse of a graph Laplacian matrix, denoted as $L^\dagger$, finds extensive application in various graph analysis tasks. Notable examples include the calculation of electrical closeness centrality, determination of Kemeny's constant, and evaluation of resistance distance. However, existing algorithms for computing $L^\dagger$ are often computationally expensive when dealing with large…

    Submitted 16 November, 2023; originally announced November 2023.

  26. arXiv:2311.04467  [pdf, other]

    cs.CL cs.AI

    RDGCN: Reinforced Dependency Graph Convolutional Network for Aspect-based Sentiment Analysis

    Authors: Xusheng Zhao, Hao Peng, Qiong Dai, Xu Bai, Huailiang Peng, Yanbing Liu, Qinglang Guo, Philip S. Yu

    Abstract: Aspect-based sentiment analysis (ABSA) is dedicated to forecasting the sentiment polarity of aspect terms within sentences. Employing graph neural networks to capture structural patterns from syntactic dependency parsing has been confirmed as an effective approach for boosting ABSA. In most works, the topology of dependency trees or dependency-based attention coefficients is often loosely regarded…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: The 17th ACM International Conference on Web Search and Data Mining

  27. arXiv:2310.18286  [pdf, other]

    cs.LG stat.AP stat.ML

    Optimal Transport for Treatment Effect Estimation

    Authors: Hao Wang, Zhichao Chen, Jiajun Fan, Haoxuan Li, Tianqiao Liu, Weiming Liu, Quanyu Dai, Yichao Wang, Zhenhua Dong, Ruiming Tang

    Abstract: Estimating conditional average treatment effect from observational data is highly challenging due to the existence of treatment selection bias. Prevalent methods mitigate this issue by aligning distributions of different treatment groups in the latent space. However, there are two critical problems that these methods fail to address: (1) mini-batch sampling effects (MSE), which causes misalignment…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Accepted as NeurIPS 2023 Poster

  28. arXiv:2310.11082  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Multi-omics Sampling-based Graph Transformer for Synthetic Lethality Prediction

    Authors: Xusheng Zhao, Hao Liu, Qiong Dai, Hao Peng, Xu Bai, Huailiang Peng

    Abstract: Synthetic lethality (SL) prediction is used to identify if the co-mutation of two genes results in cell death. The prevalent strategy is to abstract SL prediction as an edge classification task on gene nodes within SL data and achieve it through graph neural networks (GNNs). However, GNNs suffer from limitations in their message passing mechanisms, including over-smoothing and over-squashing issue…

    Submitted 17 October, 2023; originally announced October 2023.

  29. arXiv:2310.10647  [pdf, other]

    cs.CV cs.AI cs.LG

    A Survey on Video Diffusion Models

    Authors: Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang

    Abstract: The recent wave of AI-generated content (AIGC) has witnessed substantial success in computer vision, with the diffusion model playing a crucial role in this achievement. Due to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers, demonstrating exceptional performance not only in image generation and editing, bu…

    Submitted 16 October, 2023; originally announced October 2023.

  30. arXiv:2310.05717  [pdf, other]

    cs.RO cs.AI cs.CV

    STOPNet: Multiview-based 6-DoF Suction Detection for Transparent Objects on Production Lines

    Authors: Yuxuan Kuang, Qin Han, Danshi Li, Qiyu Dai, Lian Ding, Dong Sun, Hanlin Zhao, He Wang

    Abstract: In this work, we present STOPNet, a framework for 6-DoF object suction detection on production lines, with a focus on but not limited to transparent objects, which is an important and challenging problem in robotic systems and modern industry. Current methods requiring depth input fail on transparent objects due to depth cameras' deficiency in sensing their geometry, while we proposed a novel fram…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Under Review. ICRA 2024 submission

  31. arXiv:2310.01861  [pdf, other]

    eess.IV cs.CV cs.GR

    Shifting More Attention to Breast Lesion Segmentation in Ultrasound Videos

    Authors: Junhao Lin, Qian Dai, Lei Zhu, Huazhu Fu, Qiong Wang, Weibin Li, Wenhao Rao, Xiaoyang Huang, Liansheng Wang

    Abstract: Breast lesion segmentation in ultrasound (US) videos is essential for diagnosing and treating axillary lymph node metastasis. However, the lack of a well-established and large-scale ultrasound video dataset with high-quality annotations has posed a persistent challenge for the research community. To overcome this issue, we meticulously curated a US video breast lesion segmentation dataset comprisi…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 10 pages

  32. arXiv:2309.17190  [pdf, other]

    cs.CV cs.AI

    PARF: Primitive-Aware Radiance Fusion for Indoor Scene Novel View Synthesis

    Authors: Haiyang Ying, Baowei Jiang, Jinzhi Zhang, Di Xu, Tao Yu, Qionghai Dai, Lu Fang

    Abstract: This paper proposes a method for fast scene radiance field reconstruction with strong novel view synthesis performance and convenient scene editing functionality. The key idea is to fully utilize semantic parsing and primitive extraction for constraining and accelerating the radiance field reconstruction process. To fulfill this goal, a primitive-aware hybrid rendering strategy was proposed to enj…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023; Project page: https://oceanying.github.io/PARF/

  33. Semi-supervised Domain Adaptation on Graphs with Contrastive Learning and Minimax Entropy

    Authors: Jiaren Xiao, Quanyu Dai, Xiao Shen, Xiaochen Xie, Jing Dai, James Lam, Ka-Wai Kwok

    Abstract: Label scarcity in a graph is frequently encountered in real-world applications due to the high cost of data labeling. To this end, semi-supervised domain adaptation (SSDA) on graphs aims to leverage the knowledge of a labeled source graph to aid in node classification on a target graph with limited labels. SSDA tasks need to overcome the domain gap between the source and target graphs. However, to…

    Submitted 4 April, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Journal ref: Neurocomputing (2024)

  34. arXiv:2309.01374  [pdf, other]

    cs.CV

    ImmersiveNeRF: Hybrid Radiance Fields for Unbounded Immersive Light Field Reconstruction

    Authors: Xiaohang Yu, Haoxiang Wang, Yuqi Han, Lei Yang, Tao Yu, Qionghai Dai

    Abstract: This paper proposes a hybrid radiance field representation for unbounded immersive light field reconstruction which supports high-quality rendering and aggressive view extrapolation. The key idea is to first formally separate the foreground and the background and then adaptively balance learning of them during the training process. To fulfill this goal, we represent the foreground and background a…

    Submitted 4 September, 2023; originally announced September 2023.

  35. arXiv:2308.14155  [pdf, other]

    cs.IR

    Only Encode Once: Making Content-based News Recommender Greener

    Authors: Qijiong Liu, Jieming Zhu, Quanyu Dai, Xiao-Ming Wu

    Abstract: Large pretrained language models (PLM) have become de facto news encoders in modern news recommender systems, due to their strong ability in comprehending textual content. These huge Transformer-based architectures, when finetuned on recommendation tasks, can greatly improve news recommendation performance. However, the PLM-based pretrain-finetune framework incurs high computational cost and energ…

    Submitted 27 August, 2023; originally announced August 2023.

  36. arXiv:2308.09710  [pdf, other]

    cs.CV cs.AI

    SimDA: Simple Diffusion Adapter for Efficient Video Generation

    Authors: Zhen Xing, Qi Dai, Han Hu, Zuxuan Wu, Yu-Gang Jiang

    Abstract: The recent wave of AI-generated content has witnessed the great development and success of Text-to-Image (T2I) technologies. By contrast, Text-to-Video (T2V) still falls short of expectations though attracting increasing interests. Existing works either train from scratch or adapt large T2I model to videos, both of which are computation and resource expensive. In this work, we propose a Simple Dif…

    Submitted 18 August, 2023; originally announced August 2023.

  37. arXiv:2308.05361  [pdf, other]

    cs.CL

    WeaverBird: Empowering Financial Decision-Making with Large Language Model, Knowledge Base, and Search Engine

    Authors: Siqiao Xue, Fan Zhou, Yi Xu, Ming Jin, Qingsong Wen, Hongyan Hao, Qingyang Dai, Caigao Jiang, Hongyu Zhao, Shuo Xie, Jianshan He, James Zhang, Hongyuan Mei

    Abstract: We present WeaverBird, an intelligent dialogue system designed specifically for the finance domain. Our system harnesses a large language model of GPT architecture that has been tuned using extensive corpora of finance-related text. As a result, our system possesses the capability to understand complex financial queries, such as "How should I manage my investments during inflation?", and provide i…

    Submitted 6 April, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: revise abstract

  38. arXiv:2305.10033  [pdf, other]

    cs.LG math.NA

    SHoP: A Deep Learning Framework for Solving High-order Partial Differential Equations

    Authors: Tingxiong Xiao, Runzhao Yang, Yuxiao Cheng, Jinli Suo, Qionghai Dai

    Abstract: Solving partial differential equations (PDEs) has been a fundamental problem in computational science and of wide applications for both scientific and engineering research. Due to its universal approximation property, neural network is widely used to approximate the solutions of PDEs. However, existing works are incapable of solving high-order PDEs due to insufficient calculation accuracy of highe…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: We propose the Taylor expansion of neural networks, and applied it to solving high-order PDEs, named SHoP

  39. arXiv:2304.13518  [pdf, other]

    cs.CV

    Super-NeRF: View-consistent Detail Generation for NeRF super-resolution

    Authors: Yuqi Han, Tao Yu, Xiaohang Yu, Yuwang Wang, Qionghai Dai

    Abstract: The neural radiance field (NeRF) achieved remarkable success in modeling 3D scenes and synthesizing high-fidelity novel views. However, existing NeRF-based methods focus more on making full use of the image resolution to generate novel views, and less on generating details under limited input resolution. In analogy to the extensive usage of image super-resolution, NeRF super-…

    Submitted 26 April, 2023; originally announced April 2023.

  40. arXiv:2304.11093  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Hi Sheldon! Creating Deep Personalized Characters from TV Shows

    Authors: Meidai Xuanyuan, Yuwang Wang, Honglei Guo, Xiao Ma, Yuchen Guo, Tao Yu, Qionghai Dai

    Abstract: Imagine an interesting multimodal interactive scenario that you can see, hear, and chat with an AI-generated digital character, who is capable of behaving like Sheldon from The Big Bang Theory, as a DEEP copy from appearance to personality. Towards this fantastic multimodal chatting scenario, we propose a novel task, named Deep Personalized Character Creation (DPCC): creating multimodal chat perso…

    Submitted 8 April, 2023; originally announced April 2023.

  41. arXiv:2304.10465  [pdf, other]

    cs.CV cs.AI

    Implicit Temporal Modeling with Learnable Alignment for Video Recognition

    Authors: Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, Yu-Gang Jiang

    Abstract: Contrastive language-image pretraining (CLIP) has demonstrated remarkable success in various image tasks. However, how to extend CLIP with effective temporal modeling is still an open and crucial problem. Existing factorized or joint spatial-temporal modeling trades off between the efficiency and performance. While modeling temporal information within straight through tube is widely adopted in lit…

    Submitted 15 August, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: ICCV 2023 oral. 14 pages, 7 figures. Code released at https://github.com/Francis-Rings/ILA

  42. arXiv:2304.02173  [pdf, other]

    cs.CV cs.AI cs.MM

    ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules

    Authors: Zhi-Qi Cheng, Qi Dai, Siyao Li, Jingdong Sun, Teruko Mitamura, Alexander G. Hauptmann

    Abstract: Charts are a powerful tool for visually conveying complex data, but their comprehension poses a challenge due to the diverse chart types and intricate components. Existing chart comprehension methods suffer from either heuristic rules or an over-reliance on OCR systems, resulting in suboptimal performance. To address these issues, we present ChartReader, a unified framework that seamlessly integra…

    Submitted 4 April, 2023; originally announced April 2023.

  43. arXiv:2303.05033  [pdf, other]

    cs.LG

    Out-of-distribution Detection with Implicit Outlier Transformation

    Authors: Qizhou Wang, Junjie Ye, Feng Liu, Quanyu Dai, Marcus Kalander, Tongliang Liu, Jianye Hao, Bo Han

    Abstract: Outlier exposure (OE) is powerful in out-of-distribution (OOD) detection, enhancing detection capability via model fine-tuning with surrogate OOD data. However, surrogate data typically deviate from test OOD data. Thus, the performance of OE, when facing unseen OOD data, can be weakened. To address this issue, we propose a novel OE-based approach that makes the model perform well for unseen OOD si…

    Submitted 8 March, 2023; originally announced March 2023.

  44. arXiv:2303.00168  [pdf, other]

    cs.IR

    REASONER: An Explainable Recommendation Dataset with Multi-aspect Real User Labeled Ground Truths Towards more Measurable Explainable Recommendation

    Authors: Xu Chen, Jingsen Zhang, Lei Wang, Quanyu Dai, Zhenhua Dong, Ruiming Tang, Rui Zhang, Li Chen, Ji-Rong Wen

    Abstract: Explainable recommendation has attracted much attention from the industry and academic communities. It has shown great potential for improving recommendation persuasiveness, informativeness, and user satisfaction. Although many promising explainable recommender models have been proposed in the past few years, the evaluation strategies of these models suffer from several limitations. For exam…

    Submitted 28 February, 2023; originally announced March 2023.

  45. arXiv:2302.10707  [pdf, other]

    cs.CL

    Parallel Sentence-Level Explanation Generation for Real-World Low-Resource Scenarios

    Authors: Yan Liu, Xiaokang Chen, Qi Dai

    Abstract: In order to reveal the rationale behind model predictions, many works have explored providing explanations in various forms. Recently, to further guarantee readability, more and more works have turned to generating sentence-level human language explanations. However, current works pursuing sentence-level explanations rely heavily on annotated training data, which limits the development of interpretability…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  46. arXiv:2302.07458  [pdf, other]

    cs.LG stat.ME

    CUTS: Neural Causal Discovery from Irregular Time-Series Data

    Authors: Yuxiao Cheng, Runzhao Yang, Tingxiong Xiao, Zongren Li, Jinli Suo, Kunlun He, Qionghai Dai

    Abstract: Causal discovery from time-series data has been a central task in machine learning. Recently, Granger causality inference has been gaining momentum due to its good explainability and high compatibility with emerging deep neural networks. However, most existing methods assume structured input data and degrade greatly when encountering data with randomly missing entries or non-uniform sampling frequenc…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: https://openreview.net/forum?id=UG8bQcD3Emv

    Journal ref: The Eleventh International Conference on Learning Representations, Feb. 2023

  47. arXiv:2301.10167  [pdf, other]

    eess.SP cs.LG physics.optics

    EEG Opto-processor: epileptic seizure detection using diffractive photonic computing units

    Authors: Tao Yan, Maoqi Zhang, Sen Wan, Kaifeng Shang, Haiou Zhang, Xun Cao, Xing Lin, Qionghai Dai

    Abstract: Electroencephalography (EEG) analysis extracts critical information from brain signals, which has provided fundamental support for various applications, including brain-disease diagnosis and brain-computer interfaces. However, the real-time processing of large-scale EEG signals at high energy efficiency poses great challenges for electronic processors on edge computing devices. Here, we propos…

    Submitted 9 December, 2022; originally announced January 2023.

    Comments: 22 pages, 5 figures

  48. arXiv:2301.06269  [pdf, other]

    cs.CV

    DarkVision: A Benchmark for Low-light Image/Video Perception

    Authors: Bo Zhang, Yuchen Guo, Runzhao Yang, Zhihong Zhang, Jiayi Xie, Jinli Suo, Qionghai Dai

    Abstract: Imaging and perception in photon-limited scenarios are necessary for various applications, e.g., night surveillance or photography, high-speed photography, and autonomous driving. In these cases, cameras suffer from a low signal-to-noise ratio, which degrades the image quality severely and poses challenges for downstream high-level vision tasks like object detection and recognition. Data-driven metho…

    Submitted 16 January, 2023; originally announced January 2023.

  49. arXiv:2301.02229  [pdf, other]

    cs.CV cs.AI

    All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

    Authors: Jia Ning, Chen Li, Zheng Zhang, Zigang Geng, Qi Dai, Kun He, Han Hu

    Abstract: Unlike language tasks, where the output space is usually limited to a set of tokens, the output space of visual tasks is more complicated, making it difficult to build a unified visual model for various visual tasks. In this paper, we seek to unify the output space of visual tasks, so that we can also build a unified model for visual tasks. To this end, we demonstrate a single unified model that s…

    Submitted 14 February, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  50. arXiv:2212.00776  [pdf, other]

    cs.CV

    ResFormer: Scaling ViTs with Multi-Resolution Training

    Authors: Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang

    Abstract: Vision Transformers (ViTs) have achieved overwhelming success, yet they suffer from vulnerable resolution scalability, i.e., the performance drops drastically when presented with input resolutions that are unseen during training. We introduce ResFormer, a framework that is built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of, mostly unseen, testi…

    Submitted 3 April, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: CVPR 2023