Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 535 results for author: Tang, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2409.03545  [pdf, ps, other

    cs.LG cs.DS

    The Power of Second Chance: Personalized Submodular Maximization with Two Candidates

    Authors: Jing Yuan, Shaojie Tang

    Abstract: Most of existing studies on submodular maximization focus on selecting a subset of items that maximizes a \emph{single} submodular function. However, in many real-world scenarios, we might have multiple user-specific functions, each of which models the utility of a particular type of user. In these settings, our goal would be to choose a set of items that performs well across all the user-specific… ▽ More

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2408.12468  [pdf, other

    cs.DM cs.DS

    A Constant-Approximation Algorithm for Budgeted Sweep Coverage with Mobile Sensors

    Authors: Wei Liang, Shaojie Tang, Zhao Zhang

    Abstract: In this paper, we present the first constant-approximation algorithm for {\em budgeted sweep coverage problem} (BSC). The BSC involves designing routes for a number of mobile sensors (a.k.a. robots) to periodically collect information as much as possible from points of interest (PoIs). To approach this problem, we propose to first examine the {\em multi-orienteering problem} (MOP). The MOP aims to… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  3. arXiv:2408.11381  [pdf, other

    cs.CL

    RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

    Authors: Xuanwang Zhang, Yunze Song, Yidong Wang, Shuyun Tang, Xinfeng Li, Zhengran Zeng, Zhen Wu, Wei Ye, Wenyuan Xu, Yue Zhang, Xinyu Dai, Shikun Zhang, Qingsong Wen

    Abstract: Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. However, even the most advanced LLMs face challenges such as hallucinations and real-time updating of their knowledge. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval Augmented Generation (RAG). However, two key issu… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 6 pages, 3 figures

  4. arXiv:2408.10600  [pdf

    cs.CV cs.AI

    Breast tumor classification based on self-supervised contrastive learning from ultrasound videos

    Authors: Yunxin Tang, Siyuan Tang, Jian Zhang, Hao Chen

    Abstract: Background: Breast ultrasound is prominently used in diagnosing breast tumors. At present, many automatic systems based on deep learning have been developed to help radiologists in diagnosis. However, training such systems remains challenging because they are usually data-hungry and demand amounts of labeled data, which need professional knowledge and are expensive. Methods: We adopted a triplet n… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  5. arXiv:2408.10592  [pdf, other

    cs.AI cs.CG cs.LO

    Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams

    Authors: Litian Huang, Xinguo Yu, Feng Xiong, Bin He, Shengbing Tang, Jiawen Fu

    Abstract: Solving Algebra Problems with Geometry Diagrams (APGDs) is still a challenging problem because diagram processing is not studied as intensively as language processing. To work against this challenge, this paper proposes a hologram reasoning scheme and develops a high-performance method for solving APGDs by using this scheme. To reach this goal, it first defines a hologram, being a kind of graph, a… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  6. arXiv:2408.09856  [pdf, other

    cs.CL cs.AI

    TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition

    Authors: Tianwei Lin, Jiang Liu, Wenqiao Zhang, Zhaocheng Li, Yang Dai, Haoyuan Li, Zhelun Yu, Wanggui He, Juncheng Li, Hao Jiang, Siliang Tang, Yueting Zhuang

    Abstract: While Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multidimensional task scenarios. To address this issue, one straightforward solution is to introduce task-specific LoRA modules as domain experts, leveraging the modeling of multiple experts' capabilities and thus en… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  7. arXiv:2408.09350  [pdf, other

    cs.LG cs.AI

    E-CGL: An Efficient Continual Graph Learner

    Authors: Jianhao Guo, Zixuan Ni, Yun Zhu, Siliang Tang

    Abstract: Continual learning has emerged as a crucial paradigm for learning from sequential data while preserving previous knowledge. In the realm of continual graph learning, where graphs continuously evolve based on streaming graph data, continual graph learning presents unique challenges that require adaptive and efficient graph learning methods in addition to the problem of catastrophic forgetting. The… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  8. arXiv:2408.08921  [pdf, other

    cs.AI cs.CL cs.IR

    Graph Retrieval-Augmented Generation: A Survey

    Authors: Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, Siliang Tang

    Abstract: Recently, Retrieval-Augmented Generation (RAG) has achieved remarkable success in addressing the challenges of Large Language Models (LLMs) without necessitating retraining. By referencing an external knowledge base, RAG refines LLM outputs, effectively mitigating issues such as ``hallucination'', lack of domain-specific knowledge, and outdated information. However, the complex structure of relati… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Ongoing work

  9. arXiv:2408.07009  [pdf, other

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google, :, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh , et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  10. AdapMTL: Adaptive Pruning Framework for Multitask Learning Model

    Authors: Mingcan Xiang, Steven Jiaxun Tang, Qizheng Yang, Hui Guan, Tongping Liu

    Abstract: In the domain of multimedia and multimodal processing, the efficient handling of diverse data streams such as images, video, and sensor data is paramount. Model compression and multitask learning (MTL) are crucial in this field, offering the potential to address the resource-intensive demands of processing and interpreting multiple forms of media simultaneously. However, effectively compressing a… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures, Published at ACM Multimedia (ACM MM) 2024

  11. arXiv:2407.20229  [pdf, other

    cs.CV

    Improving 2D Feature Representations by 3D-Aware Fine-Tuning

    Authors: Yuanwen Yue, Anurag Das, Francis Engelmann, Siyu Tang, Jan Eric Lenssen

    Abstract: Current visual foundation models are trained purely on unstructured 2D data, limiting their understanding of 3D structure of objects and scenes. In this work, we show that fine-tuning on 3D-aware data improves the quality of emerging semantic features. We design a method to lift semantic 2D features into an efficient 3D Gaussian representation, which allows us to re-render them for arbitrary views… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project page: https://ywyue.github.io/FiT3D

  12. arXiv:2407.19542  [pdf, other

    cs.CV

    UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation

    Authors: Shuang Wu, Songlin Tang, Guangming Lu, Jianzhuang Liu, Wenjie Pei

    Abstract: Typical inverse rendering methods focus on learning implicit neural scene representations by modeling the geometry, materials and illumination separately, which entails significant computations for optimization. In this work we design a Unified Voxelization framework for explicit learning of scene representations, dubbed UniVoxel, which allows for efficient modeling of the geometry, materials and… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  13. arXiv:2407.19405  [pdf, other

    cs.AI

    Logic Distillation: Learning from Code Function by Function for Planning and Decision-making

    Authors: Dong Chen, Shilin Zhang, Fei Gao, Yueting Zhuang, Siliang Tang, Qidong Liu, Mingliang Xu

    Abstract: Large language models (LLMs) have garnered increasing attention owing to their powerful logical reasoning capabilities. Generally, larger LLMs (L-LLMs) that require paid interfaces exhibit significantly superior performance compared to smaller LLMs (S-LLMs) that can be deployed on a variety of devices. Knowledge distillation (KD) aims to empower S-LLMs with the capabilities of L-LLMs, while S-LLMs… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures

  14. DualFed: Enjoying both Generalization and Personalization in Federated Learning via Hierachical Representations

    Authors: Guogang Zhu, Xuefeng Liu, Jianwei Niu, Shaojie Tang, Xinghao Wu, Jiayuan Zhang

    Abstract: In personalized federated learning (PFL), it is widely recognized that achieving both high model generalization and effective personalization poses a significant challenge due to their conflicting nature. As a result, existing PFL methods can only manage a trade-off between these two objectives. This raises an interesting question: Is it feasible to develop a model capable of achieving both object… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MutltiMedia 2024

  15. arXiv:2407.16139  [pdf, other

    cs.LG

    Tackling Feature-Classifier Mismatch in Federated Learning via Prompt-Driven Feature Transformation

    Authors: Xinghao Wu, Jianwei Niu, Xuefeng Liu, Mingjia Shi, Guogang Zhu, Shaojie Tang

    Abstract: In traditional Federated Learning approaches like FedAvg, the global model underperforms when faced with data heterogeneity. Personalized Federated Learning (PFL) enables clients to train personalized models to fit their local data distribution better. However, we surprisingly find that the feature extractor in FedAvg is superior to those in most PFL methods. More interestingly, by applying a line… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 23 pages, 9 figures

  16. arXiv:2407.15464  [pdf, other

    cs.LG cs.DC

    The Diversity Bonus: Learning from Dissimilar Distributed Clients in Personalized Federated Learning

    Authors: Xinghao Wu, Xuefeng Liu, Jianwei Niu, Guogang Zhu, Shaojie Tang, Xiaotian Li, Jiannong Cao

    Abstract: Personalized Federated Learning (PFL) is a commonly used framework that allows clients to collaboratively train their personalized models. PFL is particularly useful for handling situations where data from different clients are not independent and identically distributed (non-IID). Previous research in PFL implicitly assumes that clients can gain more benefits from those with similar data distribu… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 14 pages, 9 figures

  17. arXiv:2407.10486  [pdf, other

    cs.AI cs.CL

    IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization

    Authors: Jie Cao, Dian Jiao, Qiang Yan, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

    Abstract: Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. With the advent of large language models (LLMs), shows their impressive capability of textual understanding through large-scale pretraining, which implies the great potential of extractive snippet generation. In this paper, we systematically i… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  18. arXiv:2407.09679  [pdf, other

    cs.CV cs.GR cs.LG

    Physics-Informed Learning of Characteristic Trajectories for Smoke Reconstruction

    Authors: Yiming Wang, Siyu Tang, Mengyu Chu

    Abstract: We delve into the physics-informed neural reconstruction of smoke and obstacles through sparse-view RGB videos, tackling challenges arising from limited observation of complex dynamics. Existing physics-informed neural networks often emphasize short-term physics constraints, leaving the proper preservation of long-term conservation less explored. We introduce Neural Characteristic Trajectory Field… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH 2024 (conference track), Project Website: \url{https://19reborn.github.io/PICT_Smoke.github.io/}

  19. arXiv:2407.08554  [pdf, other

    cs.AI cs.HC

    Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

    Authors: Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

    Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of cl… ▽ More

    Submitted 28 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 24 pages

  20. arXiv:2407.07174  [pdf, other

    cs.CV

    CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model

    Authors: Xiaoding Yuan, Shitao Tang, Kejie Li, Alan Yuille, Peng Wang

    Abstract: This paper introduces Camera-free Diffusion (CamFreeDiff) model for 360-degree image outpainting from a single camera-free image and text description. This method distinguishes itself from existing strategies, such as MVDiffusion, by eliminating the requirement for predefined camera poses. Instead, our model incorporates a mechanism for predicting homography directly within the multi-view diffusio… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  21. arXiv:2407.06800  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    Learn and Don't Forget: Adding a New Language to ASR Foundation Models

    Authors: Mengjie Qian, Siyuan Tang, Rao Ma, Kate M. Knill, Mark J. F. Gales

    Abstract: Foundation ASR models often support many languages, e.g. 100 languages in Whisper. However, there has been limited work on integrating an additional, typically low-resource, language, while maintaining performance on the original language set. Fine-tuning, while simple, may degrade the accuracy of the original set. We compare three approaches that exploit adaptation parameters: soft language code… ▽ More

    Submitted 19 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  22. Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference

    Authors: Kai Shen, Lingfei Wu, Siliang Tang, Fangli Xu, Bo Long, Yueting Zhuang, Jian Pei

    Abstract: The visual question generation (VQG) task aims to generate human-like questions from an image and potentially other side information (e.g. answer type). Previous works on VQG fall in two aspects: i) They suffer from one image to many questions mapping problem, which leads to the failure of generating referential and meaningful questions from an image. ii) They fail to model complex implicit relati… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 2024

  23. arXiv:2407.04699  [pdf, other

    cs.CV cs.AI

    LaRa: Efficient Large-Baseline Radiance Fields

    Authors: Anpei Chen, Haofei Xu, Stefano Esposito, Siyu Tang, Andreas Geiger

    Abstract: Radiance field methods have achieved photorealistic novel view synthesis and geometry reconstruction. But they are mostly applied in per-scene optimization or small-baseline settings. While several recent works investigate feed-forward reconstruction with large baselines by utilizing transformers, they all operate with a standard global attention mechanism and hence ignore the local nature of 3D r… ▽ More

    Submitted 15 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: Project Page: https://apchenstu.github.io/LaRa/

  24. arXiv:2407.04490  [pdf, other

    cs.CV

    Micro-gesture Online Recognition using Learnable Query Points

    Authors: Pengyu Liu, Fei Wang, Kun Li, Guoliang Chen, Yanyan Wei, Shengeng Tang, Zhiliang Wu, Dan Guo

    Abstract: In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the Micro-gesture Online Recognition track in the MiGA challenge at IJCAI 2024. The Micro-gesture Online Recognition task involves identifying the category and locating the start and end times of micro-gestures in video clips. Compared to the typical Temporal Action Detection task, the Micro-gesture Online Recogn… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Technical Report of HFUT-VUT for the MiGA challenge at IJCAI 2024

  25. arXiv:2407.02662  [pdf, other

    cs.SI cs.CL cs.CY

    Supporters and Skeptics: LLM-based Analysis of Engagement with Mental Health (Mis)Information Content on Video-sharing Platforms

    Authors: Viet Cuong Nguyen, Mini Jain, Abhijat Chauhan, Heather Jaime Soled, Santiago Alvarez Lesmes, Zihang Li, Michael L. Birnbaum, Sunny X. Tang, Srijan Kumar, Munmun De Choudhury

    Abstract: Over one in five adults in the US lives with a mental illness. In the face of a shortage of mental health professionals and offline resources, online short-form video content has grown to serve as a crucial conduit for disseminating mental health help and resources. However, the ease of content creation and access also contributes to the spread of misinformation, posing risks to accurate diagnosis… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 12 pages, in submission to ICWSM

  26. arXiv:2407.01967  [pdf, other

    cs.CV

    Unleash the Power of Local Representations for Few-Shot Classification

    Authors: Shi Tang, Guiming Luo, Xinchen Ye, Zhiyi Xia

    Abstract: Generalizing to novel classes unseen during training is a key challenge of few-shot classification. Recent metric-based methods try to address this by local representations. However, they are unable to take full advantage of them due to (i) improper supervision for pretraining the feature extractor, and (ii) lack of adaptability in the metric for handling various possible compositions of local fea… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  27. arXiv:2407.01130  [pdf, other

    cs.CL

    Cross-Lingual Transfer Learning for Speech Translation

    Authors: Rao Ma, Yassir Fathullah, Mengjie Qian, Siyuan Tang, Mark Gales, Kate Knill

    Abstract: There has been increasing interest in building multilingual foundation models for NLP and speech research. Zero-shot cross-lingual transfer has been demonstrated on a range of NLP tasks where a model fine-tuned on task-specific data in one language yields performance gains in other languages. Here, we explore whether speech-based models exhibit the same transfer capability. Using Whisper as an exa… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  28. arXiv:2406.19931  [pdf, other

    cs.LG cs.AI

    Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-Rank Decomposition

    Authors: Xinghao Wu, Xuefeng Liu, Jianwei Niu, Haolin Wang, Shaojie Tang, Guogang Zhu, Hao Su

    Abstract: To address data heterogeneity, the key strategy of Personalized Federated Learning (PFL) is to decouple general knowledge (shared among clients) and client-specific knowledge, as the latter can have a negative impact on collaboration if not removed. Existing PFL methods primarily adopt a parameter partitioning approach, where the parameters of a model are designated as one of two types: parameters… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 12 pages, 8 figures

  29. arXiv:2406.18846  [pdf, other

    cs.CE

    AFBench: A Large-scale Benchmark for Airfoil Design

    Authors: Jian Liu, Jianyu Wu, Hairun Xie, Guoqing Zhang, Jing Wang, Wei Liu, Wanli Ouyang, Junjun Jiang, Xianming Liu, Shixiang Tang, Miao Zhang

    Abstract: Data-driven generative models have emerged as promising approaches towards achieving efficient mechanical inverse design. However, due to prohibitively high cost in time and money, there is still lack of open-source and large-scale benchmarks in this field. It is mainly the case for airfoil inverse design, which requires to generate and edit diverse geometric-qualified and aerodynamic-qualified ai… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Submitted to NeurIPS 2024 Dataset & Benchmark Track

  30. arXiv:2406.18074  [pdf, other

    cs.CV cs.AI

    Few-Shot Medical Image Segmentation with High-Fidelity Prototypes

    Authors: Song Tang, Shaxu Yan, Xiaozhi Qi, Jianxin Gao, Mao Ye, Jianwei Zhang, Xiatian Zhu

    Abstract: Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as few as a single labelled training sample per class. Despite the prototype based approaches have achieved substantial success, existing models are limited to the imaging scenarios with considerably distinct objects and not highly complex background, e.g., natural images. This makes such models suboptimal fo… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  31. arXiv:2406.15471  [pdf, other

    cs.CL cs.AI cs.LG

    Improving Large Models with Small models: Lower Costs and Better Performance

    Authors: Dong Chen, Shuo Zhang, Yueting Zhuang, Siliang Tang, Qidong Liu, Hua Wang, Mingliang Xu

    Abstract: Pretrained large models (PLMs), such as ChatGPT, have demonstrated remarkable performance across diverse tasks. However, the significant computational requirements of PLMs have discouraged most product teams from running or fine-tuning them. In such cases, to harness the exceptional performance of PLMs, one must rely on expensive APIs, thereby exacerbating the economic burden. Despite the overall… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 11 pages

  32. arXiv:2406.15122  [pdf, ps, other

    cs.IT

    Convolutional dynamical sampling and some new results

    Authors: Longxiu Huang, A. Martina Neuman, Sui Tang, Yuying Xie

    Abstract: In this work, we explore the dynamical sampling problem on $\ell^2(\mathbb{Z})$ driven by a convolution operator defined by a convolution kernel. This problem is inspired by the need to recover a bandlimited heat diffusion field from space-time samples and its discrete analogue. In this book chapter, we review recent results in the finite-dimensional case and extend these findings to the infinite-… ▽ More

    Submitted 4 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  33. arXiv:2406.13586  [pdf, ps, other

    cs.GT cs.AI

    Submodular Participatory Budgeting

    Authors: Jing Yuan, Shaojie Tang

    Abstract: Participatory budgeting refers to the practice of allocating public resources by collecting and aggregating individual preferences. Most existing studies in this field often assume an additive utility function, where each individual holds a private utility for each candidate project, and the total utility of a set of funded projects is simply the sum of the utilities of all projects. We argue that… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  34. arXiv:2406.12608  [pdf, other

    cs.CL cs.AI

    Bridging Local Details and Global Context in Text-Attributed Graphs

    Authors: Yaoke Wang, Yun Zhu, Wenqiao Zhang, Yueting Zhuang, Yunfei Li, Siliang Tang

    Abstract: Representation learning on text-attributed graphs (TAGs) is vital for real-world applications, as they combine semantic textual and contextual structural information. Research in this field generally consist of two main perspectives: local-level encoding and global-level aggregating, respectively refer to textual node information unification (e.g., using Language Models) and structure-augmented mo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  35. arXiv:2406.11253  [pdf, other

    cs.CV

    Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space

    Authors: Yuan Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, Dan Xu

    Abstract: In this paper, we introduce a novel path to $\textit{general}$ human motion generation by focusing on 2D space. Traditional methods have primarily generated human motions in 3D, which, while detailed and realistic, are often limited by the scope of available 3D motion data in terms of both the size and the diversity. To address these limitations, we exploit extensive availability of 2D motion data… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 22 pages, 11figures, 17 tables

  36. Contextual Distillation Model for Diversified Recommendation

    Authors: Fan Li, Xu Si, Shisong Tang, Dingmin Wang, Kunyan Han, Bing Han, Guorui Zhou, Yang Song, Hechang Chen

    Abstract: The diversity of recommendation is equally crucial as accuracy in improving user experience. Existing studies, e.g., Determinantal Point Process (DPP) and Maximal Marginal Relevance (MMR), employ a greedy paradigm to iteratively select items that optimize both accuracy and diversity. However, prior methods typically exhibit quadratic complexity, limiting their applications to the re-ranking stage… ▽ More

    Submitted 14 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: accepted by KDD 2024 v2

  37. arXiv:2406.07843  [pdf, other

    cs.CV q-bio.NC

    Incremental Learning and Self-Attention Mechanisms Improve Neural System Identification

    Authors: Isaac Lin, Tianye Wang, Shang Gao, Shiming Tang, Tai Sing Lee

    Abstract: Convolutional neural networks (CNNs) have been shown to be the state-of-the-art approach for modeling the transfer functions of visual cortical neurons. Cortical neurons in the primary visual cortex are are sensitive to contextual information mediated by extensive horizontal and feedback connections. Standard CNNs can integrate global spatial image information to model such contextual modulation v… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Preprint NeurIPS 2024

  38. arXiv:2406.07119  [pdf, other

    cs.CV cs.AI

    T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text

    Authors: Aoxiong Yin, Haoyuan Li, Kai Shen, Siliang Tang, Yueting Zhuang

    Abstract: In this work, we propose a two-stage sign language production (SLP) paradigm that first encodes sign language sequences into discrete codes and then autoregressively generates sign language from text based on the learned codebook. However, existing vector quantization (VQ) methods are fixed-length encodings, overlooking the uneven information density in sign language, which leads to under-encoding… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024

  39. arXiv:2406.05768  [pdf, other

    cs.CV cs.AI

    MLCM: Multistep Consistency Distillation of Latent Diffusion Model

    Authors: Qingsong Xie, Zhenyi Liao, Chen chen, Zhijie Deng, Shixiang Tang, Haonan Lu

    Abstract: Distilling large latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest. However, the majority of existing methods face a dilemma where they either (i) depend on multiple individual distilled models for different sampling budgets, or (ii) sacrifice generation quality with limited (e.g., 2-4) and/or moderate (e.g., 5-8) sampling steps. To addre… ▽ More

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  40. arXiv:2406.05513  [pdf, ps, other

    cs.CV

    A Two-Stage Adverse Weather Semantic Segmentation Method for WeatherProof Challenge CVPR 2024 Workshop UG2+

    Authors: Jianzhao Wang, Yanyan Wei, Dehua Hu, Yilin Zhang, Shengeng Tang, Kun Li, Zhao Zhang

    Abstract: This technical report presents our team's solution for the WeatherProof Dataset Challenge: Semantic Segmentation in Adverse Weather at CVPR'24 UG2+. We propose a two-stage deep learning framework for this task. In the first stage, we preprocess the provided dataset by concatenating images into video sequences. Subsequently, we leverage a low-rank video deraining method to generate high-fidelity ps… ▽ More

    Submitted 10 July, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  41. arXiv:2406.05232  [pdf, other

    cs.CL cs.LG

    Improving Logits-based Detector without Logits from Black-box LLMs

    Authors: Cong Zeng, Shengkun Tang, Xianjun Yang, Yuanzhou Chen, Yiyou Sun, zhiqiang xu, Yao Li, Haifeng Chen, Wei Cheng, Dongkuan Xu

    Abstract: The advent of Large Language Models (LLMs) has revolutionized text generation, producing outputs that closely mimic human writing. This blurring of lines between machine- and human-written text presents new challenges in distinguishing one from the other a task further complicated by the frequent updates and closed nature of leading proprietary LLMs. Traditional logits-based detection methods leve… ▽ More

    Submitted 18 August, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  42. arXiv:2406.03625  [pdf, other

    cs.CV cs.AI

    Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories

    Authors: Yan Zhang, Sergey Prokudin, Marko Mihajlovic, Qianli Ma, Siyu Tang

    Abstract: Understanding the dynamics of generic 3D scenes is fundamentally challenging in computer vision, essential in enhancing applications related to scene reconstruction, motion tracking, and avatar creation. In this work, we address the task as the problem of inferring dense, long-range motion of 3D points. By observing a set of point trajectories, we aim to learn an implicit motion field parameterize… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: cvpr24 post camera ready

  43. arXiv:2406.03474  [pdf, other

    cs.CV

    AD-H: Autonomous Driving with Hierarchical Agents

    Authors: Zaibin Zhang, Shiyu Tang, Yuanhang Zhang, Talas Fu, Yifan Wang, Yang Liu, Dong Wang, Jing Shao, Lijun Wang, Huchuan Lu

    Abstract: Due to the impressive capabilities of multimodal large language models (MLLMs), recent works have focused on employing MLLM-based agents for autonomous driving in large-scale and dynamic environments. However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  44. arXiv:2406.01658  [pdf, other

    cs.CV

    Proxy Denoising for Source-Free Domain Adaptation

    Authors: Song Tang, Wenxin Su, Mao Ye, Jianwei Zhang, Xiatian Zhu

    Abstract: Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to an unlabeled target domain with no access to the source data. Inspired by the success of pre-trained large vision-language (ViL) models in many other applications, the latest SFDA methods have also validated the benefit of ViL models by leveraging their predictions as pseudo supervision. However, we observe that ViL's… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  45. arXiv:2405.20272  [pdf, other

    cs.LG cs.CR

    Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable

    Authors: Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, Zhiwei Steven Wu

    Abstract: Machine unlearning is motivated by desire for data autonomy: a person can request to have their data's influence removed from deployed models, and those models should be updated as if they were retrained without the person's data. We show that, counter-intuitively, these updates expose individuals to high-accuracy reconstruction attacks which allow the attacker to recover their data in its entiret… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  46. arXiv:2405.19789  [pdf, other

    cs.LG cs.DC

    Estimating before Debiasing: A Bayesian Approach to Detaching Prior Bias in Federated Semi-Supervised Learning

    Authors: Guogang Zhu, Xuefeng Liu, Xinghao Wu, Shaojie Tang, Chao Tang, Jianwei Niu, Hao Su

    Abstract: Federated Semi-Supervised Learning (FSSL) leverages both labeled and unlabeled data on clients to collaboratively train a model.In FSSL, the heterogeneous data can introduce prediction bias into the model, causing the model's prediction to skew towards some certain classes. Existing FSSL methods primarily tackle this issue by enhancing consistency in model parameters or outputs. However, as the mo… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  47. arXiv:2405.18999  [pdf, other

    stat.AP cs.AI cs.RO

    Continuously Optimizing Radar Placement with Model Predictive Path Integrals

    Authors: Michael Potter, Shuo Tang, Paul Ghanem, Milica Stojanovic, Pau Closas, Murat Akcakaya, Ben Wright, Marius Necsoiu, Deniz Erdogmus, Michael Everett, Tales Imbiriba

    Abstract: Continuously optimizing sensor placement is essential for precise target localization in various military and civilian applications. While information theory has shown promise in optimizing sensor placement, many studies oversimplify sensor measurement models or neglect dynamic constraints of mobile sensors. To address these challenges, we employ a range measurement model that incorporates radar p… ▽ More

    Submitted 29 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Submitted to IEEE Aerospace and Electronic Systems

  48. arXiv:2405.17790  [pdf, other

    cs.CV

    Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

    Authors: Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

    Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520

  49. arXiv:2405.13002  [pdf, other

    cs.CL cs.AI

    DuetRAG: Collaborative Retrieval-Augmented Generation

    Authors: Dian Jiao, Li Cai, Jingsheng Huang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

    Abstract: Retrieval-Augmented Generation (RAG) methods augment the input of Large Language Models (LLMs) with relevant retrieved passages, reducing factual errors in knowledge-intensive tasks. However, contemporary RAG approaches suffer from irrelevant knowledge retrieval issues in complex domain questions (e.g., HotPot QA) due to the lack of corresponding domain knowledge, leading to low-quality generation… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 5 pages

  50. arXiv:2405.12796  [pdf, other

    cs.CV

    DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control

    Authors: Hong Chen, Xin Wang, Yipeng Zhang, Yuwei Zhou, Zeyang Zhang, Siao Tang, Wenwu Zhu

    Abstract: Generating customized content in videos has received increasing attention recently. However, existing works primarily focus on customized text-to-video generation for single subject, suffering from subject-missing and attribute-binding problems when the video is expected to contain multiple subjects. Furthermore, existing models struggle to assign the desired actions to the corresponding subjects… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.