Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 1,697 results for author: Mao, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.20664  [pdf, other

    cs.CV

    3D-GRES: Generalized 3D Referring Expression Segmentation

    Authors: Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji

    Abstract: 3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description. However, current approaches are limited to segmenting a single target, restricting the versatility of the task. To overcome this limitation, we introduce Generalized 3D Referring Expression Segmentation (3D-GRES), which extends the capability to se… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024 (Oral), Code: https://github.com/sosppxo/3D-GRES

  2. StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset

    Authors: Chaofan Huo, Ye Shi, Yuexin Ma, Lan Xu, Jingyi Yu, Jingya Wang

    Abstract: Modeling and capturing the 3D spatial arrangement of the human and the object is the key to perceiving 3D human-object interaction from monocular images. In this work, we propose to use the Human-Object Offset between anchors which are densely sampled from the surface of human mesh and object mesh to represent human-object spatial relation. Compared with previous works which use contact map or imp… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI-23

  3. arXiv:2407.20203  [pdf, other

    cs.RO

    Privileged Reinforcement and Communication Learning for Distributed, Bandwidth-limited Multi-robot Exploration

    Authors: Yixiao Ma, Jingsong Liang, Yuhong Cao, Derek Ming Siang Tan, Guillaume Sartoretti

    Abstract: Communication bandwidth is an important consideration in multi-robot exploration, where information exchange among robots is critical. While existing methods typically aim to reduce communication throughput, they either require significant computation or significantly compromise exploration efficiency. In this work, we propose a deep reinforcement learning framework based on communication and priv… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by DARS2024

  4. arXiv:2407.20042  [pdf, other

    cs.SE

    When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention

    Authors: Lianghong Guo, Yanlin Wang, Ensheng Shi, Wanjun Zhong, Hongyu Zhang, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng

    Abstract: Code generation aims to automatically generate code snippets that meet given natural language requirements and plays an important role in software development. Although Code LLMs have shown excellent performance in this domain, their long generation time poses a signification limitation in practice use. In this paper, we first conduct an in-depth preliminary study with different Code LLMs on code… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: To appear at ISSTA 2024

  5. arXiv:2407.19523  [pdf, other

    cs.LG

    Robust Fast Adaptation from Adversarially Explicit Task Distribution Generation

    Authors: Cheems Wang, Yiqin Lv, Yixiu Mao, Yun Qu, Yi Xu, Xiangyang Ji

    Abstract: Meta-learning is a practical learning paradigm to transfer skills across tasks from a few examples. Nevertheless, the existence of task distribution shifts tends to weaken meta-learners' generalization capability, particularly when the task distribution is naively hand-crafted or based on simple priors that fail to cover typical scenarios sufficiently. Here, we consider explicitly generative model… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: The project is available at https://sites.google.com/view/ar-metalearn

  6. arXiv:2407.19487  [pdf, other

    cs.SE

    RLCoder: Reinforcement Learning for Repository-Level Code Completion

    Authors: Yanlin Wang, Yanli Wang, Daya Guo, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng

    Abstract: Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input sequence length. However, traditional lexical-based retrieval methods like BM25 struggle to capture code semantics, while model-based retrieval methods face challeng… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: To appear at ICSE 2025

    Journal ref: 47th International Conference on Software Engineering (ICSE 2025)

  7. Alleviating Over-Smoothing via Aggregation over Compact Manifolds

    Authors: Dongzhuoran Zhou, Hui Yang, Bo Xiong, Yue Ma, Evgeny Kharlamov

    Abstract: Graph neural networks (GNNs) have achieved significant success in various applications. Most GNNs learn the node features with information aggregation of its neighbors and feature transformation in each layer. However, the node features become indistinguishable after many layers, leading to performance deterioration: a significant limitation known as over-smoothing. Past work adopted various techn… ▽ More

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by PAKDD 2024 as an oral presentation

  8. arXiv:2407.18955  [pdf, other

    cs.CV

    Real Face Video Animation Platform

    Authors: Xiaokai Chen, Xuan Liu, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su

    Abstract: In recent years, facial video generation models have gained popularity. However, these models often lack expressive power when dealing with exaggerated anime-style faces due to the absence of high-quality anime-style face training sets. We propose a facial animation platform that enables real-time conversion from real human faces to cartoon-style faces, supporting multiple models. Built on the Gra… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  9. arXiv:2407.18939  [pdf

    cs.CY cs.AI

    Promoting AI Competencies for Medical Students: A Scoping Review on Frameworks, Programs, and Tools

    Authors: Yingbo Ma, Yukyeong Song, Jeremy A. Balch, Yuanfang Ren, Divya Vellanki, Zhenhong Hu, Meghan Brennan, Suraj Kolla, Ziyuan Guan, Brooke Armfield, Tezcan Ozrazgat-Baslanti, Parisa Rashidi, Tyler J. Loftus, Azra Bihorac, Benjamin Shickel

    Abstract: As more clinical workflows continue to be augmented by artificial intelligence (AI), AI literacy among physicians will become a critical requirement for ensuring safe and ethical AI-enabled patient care. Despite the evolving importance of AI in healthcare, the extent to which it has been adopted into traditional and often-overloaded medical curricula is currently unknown. In a scoping review of 1,… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 25 pages, 2 figures, 3 tables

  10. arXiv:2407.18902  [pdf, other

    cs.RO cs.AI cs.LG

    Lessons from Learning to Spin "Pens"

    Authors: Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, Xiaolong Wang

    Abstract: In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Website: https://penspin.github.io/

  11. arXiv:2407.18697  [pdf, other

    quant-ph cs.ET

    Q-gen: A Parameterized Quantum Circuit Generator

    Authors: Yikai Mao, Shaswot Shresthamali, Masaaki Kondo

    Abstract: Unlike most classical algorithms that take an input and give the solution directly as an output, quantum algorithms produce a quantum circuit that works as an indirect solution to computationally hard problems. In the full quantum computing workflow, most data processing remains in the classical domain except for running the quantum circuit in the quantum processor. This leaves massive opportuniti… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 14 pages, 8 figures

  12. arXiv:2407.18097  [pdf, other

    cs.CV

    SSTD: Stripe-Like Space Target Detection using Single-Point Supervision

    Authors: Zijian Zhu, Ali Zia, Xuesong Li, Bingbing Dan, Yuebo Ma, Enhai Liu, Rujin Zhao

    Abstract: Stripe-like space target detection (SSTD) plays a key role in enhancing space situational awareness and assessing spacecraft behaviour. This domain faces three challenges: the lack of publicly available datasets, interference from stray light and stars, and the variability of stripe-like targets, which complicates pixel-level annotation. In response, we introduces `AstroStripeSet', a pioneering da… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  13. arXiv:2407.17869  [pdf, other

    cs.LG

    EllipBench: A Large-scale Benchmark for Machine-learning based Ellipsometry Modeling

    Authors: Yiming Ma, Xinjie Li, Xin Sun, Zhiyong Wang, Lionel Z. Wang

    Abstract: Ellipsometry is used to indirectly measure the optical properties and thickness of thin films. However, solving the inverse problem of ellipsometry is time-consuming since it involves human expertise to apply the data fitting techniques. Many studies use traditional machine learning-based methods to model the complex mathematical fitting process. In our work, we approach this problem from a deep l… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  14. arXiv:2407.16697  [pdf, other

    cs.CV

    AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

    Authors: Wenxuan Li, Chongyu Qu, Xiaoxi Chen, Pedro R. A. S. Bassi, Yijia Shi, Yuxiang Lai, Qian Yu, Huimin Xue, Yixiong Chen, Xiaorui Lin, Yutong Tang, Yining Cao, Haoqi Han, Zheyuan Zhang, Jiawei Liu, Tiezheng Zhang, Yujiu Ma, Jincheng Wang, Guang Zhang, Alan Yuille, Zongwei Zhou

    Abstract: We introduce the largest abdominal CT dataset (termed AbdomenAtlas) of 20,460 three-dimensional CT volumes sourced from 112 hospitals across diverse populations, geographies, and facilities. AbdomenAtlas provides 673K high-quality masks of anatomical structures in the abdominal region annotated by a team of 10 radiologists with the help of AI algorithms. We start by having expert radiologists manu… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Published in Medical Image Analysis

  15. arXiv:2407.16198  [pdf, other

    cs.CV cs.AI

    INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model

    Authors: Yiwei Ma, Zhibin Wang, Xiaoshuai Sun, Weihuang Lin, Qiang Zhou, Jiayi Ji, Rongrong Ji

    Abstract: With advancements in data availability and computing resources, Multimodal Large Language Models (MLLMs) have showcased capabilities across various fields. However, the quadratic complexity of the vision encoder in MLLMs constrains the resolution of input images. Most current approaches mitigate this issue by cropping high-resolution images into smaller sub-images, which are then processed indepen… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  16. arXiv:2407.16197  [pdf, other

    cs.CV cs.RO

    LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera

    Authors: Yukai Ma, Jianbiao Mei, Xuemeng Yang, Licheng Wen, Weihua Xu, Jiangning Zhang, Botian Shi, Yong Liu, Xingxing Zuo

    Abstract: Semantic Scene Completion (SSC) is pivotal in autonomous driving perception, frequently confronted with the complexities of weather and illumination changes. The long-term strategy involves fusing multi-modal information to bolster the system's robustness. Radar, increasingly utilized for 3D target detection, is gradually replacing LiDAR in autonomous driving applications, offering a robust sensin… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  17. arXiv:2407.14890  [pdf, ps, other

    cs.IT

    Rate Splitting Multiple Access: Optimal Beamforming Structure and Efficient Optimization Algorithms

    Authors: Tianyu Fang, Yijie Mao

    Abstract: Joint optimization for common rate allocation and beamforming design have been widely studied in rate splitting multiple access (RSMA) empowered multiuser multi-antenna transmission networks. Due to the highly coupled optimization variables and non-convexity of the joint optimization problems, emerging algorithms such as weighted minimum mean square error (WMMSE) and successive convex approximatio… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 15 pages, 10 figures

  18. arXiv:2407.13887  [pdf, other

    cs.CL

    Learning Goal-Conditioned Representations for Language Reward Models

    Authors: Vaskar Nath, Dylan Slack, Jeff Da, Yuntao Ma, Hugh Zhang, Spencer Whitehead, Sean Hendryx

    Abstract: Techniques that learn improved representations via offline data or self-supervised objectives have shown impressive results in traditional reinforcement learning (RL). Nevertheless, it is unclear how improved representation learning can benefit reinforcement learning from human feedback (RLHF) on language models (LMs). In this work, we propose training reward models (RMs) in a contrastive,… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  19. arXiv:2407.13703  [pdf, other

    cs.IT cs.LG eess.SP

    Energy-Efficient Channel Decoding for Wireless Federated Learning: Convergence Analysis and Adaptive Design

    Authors: Linping Qu, Yuyi Mao, Shenghui Song, Chi-Ying Tsui

    Abstract: One of the most critical challenges for deploying distributed learning solutions, such as federated learning (FL), in wireless networks is the limited battery capacity of mobile clients. While it is a common belief that the major energy consumption of mobile clients comes from the uplink data transmission, this paper presents a novel finding, namely the channel decoding operation also contributes… ▽ More

    Submitted 19 July, 2024; v1 submitted 26 June, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE TWC for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  20. arXiv:2407.13647  [pdf, other

    cs.CL cs.AI

    Weak-to-Strong Reasoning

    Authors: Yuqing Yang, Yan Ma, Pengfei Liu

    Abstract: When large language models (LLMs) exceed human-level capabilities, it becomes increasingly challenging to provide full-scale and accurate supervisions for these models. Weak-to-strong learning, which leverages a less capable model to unlock the latent abilities of a stronger model, proves valuable in this context. Yet, the efficacy of this approach for complex reasoning tasks is still untested. Fu… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  21. arXiv:2407.13168  [pdf, other

    cs.AI cs.CL

    SciCode: A Research Coding Benchmark Curated by Scientists

    Authors: Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, Shengyan Liu, Di Luo, Yutao Ma, Hao Tong, Kha Trinh, Chenyu Tian, Zihan Wang, Bohao Wu, Yanyu Xiong, Shengzhu Yin, Minhui Zhu, Kilian Lieret, Yanxin Lu, Genglin Liu, Yufeng Du , et al. (5 additional authors not shown)

    Abstract: Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields,… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 25 pages, 9 figures, 7 tables

  22. arXiv:2407.12504  [pdf, other

    cs.CL

    Case2Code: Learning Inductive Reasoning with Synthetic Data

    Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

    Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples o… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  23. arXiv:2407.12274  [pdf, other

    cs.CV

    MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics

    Authors: Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu, Yongwei Li

    Abstract: Deception detection has garnered increasing attention in recent years due to the significant growth of digital media and heightened ethical and security concerns. It has been extensively studied using multimodal methods, including video, audio, and text. In addition, individual differences in deception production and detection are believed to play a crucial role.Although some studies have utilized… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Code and data are available; Submitted to NeurIPS 2024 Datasets and Benchmarks Track

  24. arXiv:2407.12074  [pdf, other

    cs.LG cs.AI

    Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach

    Authors: Yuzhu Mao, Siqi Ping, Zihao Zhao, Yang Liu, Wenbo Ding

    Abstract: Large pre-trained models, such as large language models (LLMs), present significant resource challenges for fine-tuning due to their extensive parameter sizes, especially for applications in mobile systems. To address this, Low-Rank Adaptation (LoRA) has been developed to reduce resource consumption while maintaining satisfactory fine-tuning results. Despite its effectiveness, the original LoRA me… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  25. arXiv:2407.11712  [pdf, other

    cs.IR

    Harnessing Large Language Models for Multimodal Product Bundling

    Authors: Xiaohao Liu, Jie Wu, Zhulin Tao, Yunshan Ma, Yinwei Wei, Tat-seng Chua

    Abstract: Product bundling provides clients with a strategic combination of individual items. And it has gained significant attention in recent years as a fundamental prerequisite for online services. Recent methods utilize multimodal information through sophisticated extractors for bundling, but remain limited by inferior semantic understanding, the restricted scope of knowledge, and an inability to handle… ▽ More

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: under review

  26. arXiv:2407.11638  [pdf, other

    cs.CL cs.IR

    A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting

    Authors: He Chang, Chenchen Ye, Zhulin Tao, Jie Wu, Zhengmao Yang, Yunshan Ma, Xianglin Huang, Tat-Seng Chua

    Abstract: Recently, Large Language Models (LLMs) have demonstrated great potential in various data mining tasks, such as knowledge question answering, mathematical reasoning, and commonsense reasoning. However, the reasoning capability of LLMs on temporal event forecasting has been under-explored. To systematically investigate their abilities in temporal event forecasting, we conduct a comprehensive evaluat… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  27. arXiv:2407.11046  [pdf, other

    cs.LG cs.AI cs.CL

    A Survey on LoRA of Large Language Models

    Authors: Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao

    Abstract: Low-Rank Adaptation~(LoRA), which updates the dense neural network layers with pluggable low-rank matrices, is one of the best performed parameter efficient fine-tuning paradigms. Furthermore, it has significant advantages in cross-task generalization and privacy-preserving. Hence, LoRA has gained much attention recently, and the number of related literature demonstrates exponential growth. It is… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  28. arXiv:2407.10956  [pdf, other

    cs.AI cs.CL

    Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

    Authors: Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu

    Abstract: Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivit… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 34 pages, 14 figures, 10 tables

  29. arXiv:2407.10084  [pdf, other

    cs.CV

    Part2Object: Hierarchical Unsupervised 3D Instance Segmentation

    Authors: Cheng Shi, Yulin Zhang, Bin Yang, Jiajin Tang, Yuexin Ma, Sibei Yang

    Abstract: Unsupervised 3D instance segmentation aims to segment objects from a 3D point cloud without any annotations. Existing methods face the challenge of either too loose or too tight clustering, leading to under-segmentation or over-segmentation. To address this issue, we propose Part2Object, hierarchical clustering with object guidance. Part2Object employs multi-layer clustering from points to object… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accept to ECCV2024

  30. arXiv:2407.09833  [pdf, other

    cs.CV

    LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment

    Authors: Yiming Ren, Xiao Han, Yichen Yao, Xiaoxiao Long, Yujing Sun, Yuexin Ma

    Abstract: LiDAR-based human motion capture has garnered significant interest in recent years for its practicability in large-scale and unconstrained environments. However, most methods rely on cleanly segmented human point clouds as input, the accuracy and smoothness of their motion results are compromised when faced with noisy data, rendering them unsuitable for practical applications. To address these lim… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  31. arXiv:2407.09751  [pdf, other

    cs.CV

    TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

    Authors: Xiaopei Wu, Yuenan Hou, Xiaoshui Huang, Binbin Lin, Tong He, Xinge Zhu, Yuexin Ma, Boxi Wu, Haifeng Liu, Deng Cai, Wanli Ouyang

    Abstract: Training deep models for LiDAR semantic segmentation is challenging due to the inherent sparsity of point clouds. Utilizing temporal data is a natural remedy against the sparsity problem as it makes the input signal denser. However, previous multi-frame fusion algorithms fall short in utilizing sufficient temporal information due to the memory constraint, and they also ignore the informative tempo… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by CVPR 2024

  32. arXiv:2407.09367  [pdf, other

    cs.CV

    Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation

    Authors: Zhilin Zhu, Xiaopeng Hong, Zhiheng Ma, Weijun Zhuang, Yaohui Ma, Yong Dai, Yaowei Wang

    Abstract: Continual Test-Time Adaptation (CTTA) involves adapting a pre-trained source model to continually changing unsupervised target domains. In this paper, we systematically analyze the challenges of this task: online environment, unsupervised nature, and the risks of error accumulation and catastrophic forgetting under continual domain shifts. To address these challenges, we reshape the online data bu… ▽ More

    Submitted 18 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: This is the preprint version of our paper and supplemental material to appear in ECCV 2024

  33. arXiv:2407.08949  [pdf, other

    cs.CV

    One-Shot Pose-Driving Face Animation Platform

    Authors: He Feng, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su

    Abstract: The objective of face animation is to generate dynamic and expressive talking head videos from a single reference face, utilizing driving conditions derived from either video or audio inputs. Current approaches often require fine-tuning for specific identities and frequently fail to produce expressive videos due to the limited effectiveness of Wav2Pose modules. To facilitate the generation of one-… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  34. arXiv:2407.08948  [pdf, other

    eess.IV cs.CV

    Symmetry Awareness Encoded Deep Learning Framework for Brain Imaging Analysis

    Authors: Yang Ma, Dongang Wang, Peilin Liu, Lynette Masters, Michael Barnett, Weidong Cai, Chenyu Wang

    Abstract: The heterogeneity of neurological conditions, ranging from structural anomalies to functional impairments, presents a significant challenge in medical imaging analysis tasks. Moreover, the limited availability of well-annotated datasets constrains the development of robust analysis models. Against this backdrop, this study introduces a novel approach leveraging the inherent anatomical symmetrical… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

    ACM Class: I.2.10; I.4.10

  35. arXiv:2407.08923  [pdf, other

    cs.IT cs.NI

    A Bistatic ISAC Framework for LEO Satellite Systems: A Rate-Splitting Approach

    Authors: Juha Park, Jaehyup Seong, Jaehak Ryu, Yijie Mao, Wonjae Shin

    Abstract: Aiming to achieve ubiquitous global connectivity and target detection on the same platform with improved spectral/energy efficiency and reduced onboard hardware cost, low Earth orbit (LEO) satellite systems capable of simultaneously performing communications and radar have attracted significant attention. Designing such a joint system should address not only the challenges of integrating two funct… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 33 pages, 8 figures, 2 tables

  36. arXiv:2407.08126  [pdf, other

    cs.AI cs.CV cs.MM

    Label-anticipated Event Disentanglement for Audio-Visual Video Parsing

    Authors: Jinxing Zhou, Dan Guo, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang

    Abstract: Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate events within audio and visual modalities. Multiple events can overlap in the timeline, making identification challenging. While traditional methods usually focus on improving the early audio-visual encoders to embed more effective features, the decoding phase -- crucial for final event classification, often receives less… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  37. arXiv:2407.07436  [pdf, other

    cs.IT math.OC

    Alternating Subspace Approximate Message Passing

    Authors: Xu Zhu, Yufei Ma, Xiaoguang Li, Tiejun Li

    Abstract: Numerous renowned algorithms for tackling the compressed sensing problem employ an alternating strategy, which typically involves data matching in one module and denoising in another. Based on an in-depth analysis of the connection between the message passing and operator splitting, we present a novel approach, the Alternating Subspace Method (ASM), which intuitively combines the principles of the… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 19 pages, 6 figures

    MSC Class: 94A12; 65F10; 90C06

  38. arXiv:2407.07053  [pdf, other

    cs.CV

    Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

    Authors: Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

    Abstract: Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite rudimentary. They often struggle with simple daily tasks, such as reading time from a clock, understanding a flowchart, or planning a route using a road map. In lig… ▽ More

    Submitted 23 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: code: https://github.com/zwq2018/Multi-modal-Self-instruct dataset: https://huggingface.co/datasets/zwq2018/Multi-modal-Self-instruct Leaderboard: https://multi-modal-self-instruct.github.io/

  39. arXiv:2407.06135  [pdf, other

    cs.CL cs.AI cs.CV

    ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

    Authors: Ethan Chern, Jiadi Su, Yan Ma, Pengfei Liu

    Abstract: Previous open-source large multimodal models (LMMs) have faced several limitations: (1) they often lack native integration, requiring adapters to align visual representations with pre-trained large language models (LLMs); (2) many are restricted to single-modal generation; (3) while some support multimodal generation, they rely on separate diffusion models for visual modeling and generation. To mi… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  40. arXiv:2407.05553  [pdf, other

    cs.CV

    A Color Image Analysis Tool to Help Users Choose a Makeup Foundation Color

    Authors: Yafei Mao, Christopher Merkle, Jan P. Allebach

    Abstract: This paper presents an approach to predict the color of skin-with-foundation based on a no makeup selfie image and a foundation shade image. Our approach first calibrates the image with the help of the color checker target, and then trains a supervised-learning model to predict the skin color. In the calibration stage, We propose to use three different transformation matrices to map the device dep… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Color Imaging: Displaying, Processing, Hardcopy, and Applications, 373:1-373:6

  41. arXiv:2407.05363  [pdf, other

    cs.CV

    Multi-branch Collaborative Learning Network for 3D Visual Grounding

    Authors: Zhipeng Qian, Yiwei Ma, Zhekai Lin, Jiayi Ji, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji

    Abstract: 3D referring expression comprehension (3DREC) and segmentation (3DRES) have overlapping objectives, indicating their potential for collaboration. However, existing collaborative approaches predominantly depend on the results of one task to make predictions for the other, limiting effective collaboration. We argue that employing separate branches for 3DREC and 3DRES tasks enhances the model's capac… ▽ More

    Submitted 10 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  42. arXiv:2407.05352  [pdf, other

    cs.CV cs.MM

    Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model

    Authors: Danni Yang, Ruohan Dong, Jiayi Ji, Yiwei Ma, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

    Abstract: Recently, diffusion models have increasingly demonstrated their capabilities in vision understanding. By leveraging prompt-based learning to construct sentences, these models have shown proficiency in classification and visual grounding tasks. However, existing approaches primarily showcase their ability to perform sentence-level localization, leaving the potential for leveraging contextual inform… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  43. arXiv:2407.05155  [pdf, other

    cs.IT eess.SP

    Wi-Fi Beyond Communications: Experimental Evaluation of Respiration Monitoring and Motion Detection Using COTS Devices

    Authors: Jiuyu Liu, Yi Ma, Rahim Tafazolli

    Abstract: Wi-Fi sensing has become an attractive option for non-invasive monitoring of human activities and vital signs. This paper explores the feasibility of using state-of-the-art commercial off-the-shelf (COTS) devices for Wi-Fi sensing applications, particularly respiration monitoring and motion detection. We utilize the Intel AX210 network interface card (NIC) to transmit Wi-Fi signals in both 2.4 GHz… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: This work has been accepted by IEEE ICCC Workshop 2024. Copyright may be transferred without notice, after which this version may no longer be accessible

  44. arXiv:2407.04923  [pdf, other

    cs.CV cs.CL

    OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding

    Authors: Tiancheng Zhao, Qianqian Zhang, Kyusong Lee, Peng Liu, Lu Zhang, Chunxin Fang, Jiajia Liao, Kelei Jiang, Yibo Ma, Ruochen Xu

    Abstract: We introduce OmChat, a model designed to excel in handling long contexts and video understanding tasks. OmChat's new architecture standardizes how different visual inputs are processed, making it more efficient and adaptable. It uses a dynamic vision encoding process to effectively handle images of various resolutions, capturing fine details across a range of image qualities. OmChat utilizes an ac… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 14 pages

  45. arXiv:2407.04418  [pdf, other

    cs.HC cs.AI cs.LG

    Enabling On-Device LLMs Personalization with Smartphone Sensing

    Authors: Shiquan Zhang, Ying Ma, Le Fang, Hong Jia, Simon D'Alfonso, Vassilis Kostakos

    Abstract: This demo presents a novel end-to-end framework that combines on-device large language models (LLMs) with smartphone sensing technologies to achieve context-aware and personalized services. The framework addresses critical limitations of current personalization solutions via cloud LLMs, such as privacy concerns, latency and cost, and limited personal information. To achieve this, we innovatively p… ▽ More

    Submitted 23 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages, 3 figures, conference demo paper

  46. arXiv:2407.03061  [pdf, other

    cs.CL

    ALTER: Augmentation for Large-Table-Based Reasoning

    Authors: Han Zhang, Yuheng Ma, Hanfang Yang

    Abstract: While extensive research has explored the use of large language models (LLMs) for table-based reasoning, most approaches struggle with scalability when applied to large tables. To maintain the superior comprehension abilities of LLMs in these scenarios, we introduce ALTER(Augmentation for Large-Table-Based Reasoning)-a framework designed to harness the latent augmentation potential in both free-fo… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  47. arXiv:2407.02034  [pdf, other

    cs.CV

    TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation

    Authors: Chaofan Luo, Donglin Di, Yongjia Ma, Zhou Xue, Chen Wei, Xun Yang, Yebin Liu

    Abstract: Despite significant strides in the field of 3D scene editing, current methods encounter substantial challenge, particularly in preserving 3D consistency in multi-view editing process. To tackle this challenge, we propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS) with a dual-branch editing mechanism. Specifically, TAS facilitates a… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  48. arXiv:2407.01523  [pdf, other

    cs.CV cs.CL

    MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

    Authors: Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

    Abstract: Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark co… ▽ More

    Submitted 10 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  49. arXiv:2407.00737  [pdf, other

    cs.CV

    LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

    Authors: Mushui Liu, Yuhang Ma, Xinfeng Zhang, Yang Zhen, Zeng Zhao, Zhipeng Hu, Bai Liu, Changjie Fan

    Abstract: Diffusion Models have exhibited substantial success in text-to-image generation. However, they often encounter challenges when dealing with complex and dense prompts that involve multiple objects, attribute binding, and long descriptions. This paper proposes a framework called \textbf{LLM4GEN}, which enhances the semantic understanding ability of text-to-image diffusion models by leveraging the se… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 11 pages, 13 figures

  50. arXiv:2407.00610  [pdf, other

    cs.LG

    Diff-BBO: Diffusion-Based Inverse Modeling for Black-Box Optimization

    Authors: Dongxia Wu, Nikki Lijing Kuang, Ruijia Niu, Yi-An Ma, Rose Yu

    Abstract: Black-box optimization (BBO) aims to optimize an objective function by iteratively querying a black-box oracle. This process demands sample-efficient optimization due to the high computational cost of function evaluations. While prior studies focus on forward approaches to learn surrogates for the unknown objective function, they struggle with high-dimensional inputs where valid inputs form a smal… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.