
Showing 1–50 of 157 results for author: Song, R

Searching in archive cs.
  1. arXiv:2406.18085  [pdf, other]

    cs.CL

    Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints

    Authors: Ran Song, Shizhu He, Shengxiang Gao, Li Cai, Kang Liu, Zhengtao Yu, Jun Zhao

    Abstract: Multilingual Knowledge Graph Completion (mKGC) aims at solving queries like (h, r, ?) in different languages by reasoning a tail entity t thus improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 11 pages, ACL 2023

  2. arXiv:2406.01928  [pdf, other]

    cs.RO

    History-Aware Planning for Risk-free Autonomous Navigation on Unknown Uneven Terrain

    Authors: Yinchuan Wang, Nianfei Du, Yongsen Qin, Xiang Zhang, Rui Song, Chaoqun Wang

    Abstract: It is challenging for the mobile robot to achieve autonomous and mapless navigation in the unknown environment with uneven terrain. In this study, we present a layered and systematic pipeline. At the local level, we maintain a tree structure that is dynamically extended with the navigation. This structure unifies the planning with the terrain identification. Besides, it contributes to explicitly i…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

  3. arXiv:2405.08301  [pdf, ps, other]

    cs.IT

    Coded Downlink Massive Random Access and a Finite de Finetti Theorem

    Authors: Ryan Song, Kareem M. Attiah, Wei Yu

    Abstract: This paper considers a massive connectivity setting in which a base-station (BS) aims to communicate sources $(X_1,\cdots,X_k)$ to a randomly activated subset of $k$ users, among a large pool of $n$ users, via a common downlink message. Although the identities of the $k$ active users are assumed to be known at the BS, each active user only knows whether itself is active and does not know the ident…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 14 Pages, submitted to IEEE Transactions on Information Theory

  4. arXiv:2403.10828  [pdf, other]

    cs.CR

    Data Availability and Decentralization: New Techniques for zk-Rollups in Layer 2 Blockchain Networks

    Authors: Chengpeng Huang, Rui Song, Shang Gao, Yu Guo, Bin Xiao

    Abstract: The scalability limitations of public blockchains have hindered their widespread adoption in real-world applications. While the Ethereum community is pushing forward in zk-rollup (zero-knowledge rollup) solutions, such as introducing the "blob transaction" in EIP-4844, Layer 2 networks encounter a data availability problem: storing transactions completely off-chain poses a risk of data loss, par…

    Submitted 16 March, 2024; originally announced March 2024.

  5. arXiv:2403.08251  [pdf, other]

    cs.MA cs.AI cs.CY

    Emergence of Social Norms in Generative Agent Societies: Principles and Architecture

    Authors: Siyue Ren, Zhiyao Cui, Ruiqi Song, Zhen Wang, Shuyue Hu

    Abstract: Social norms play a crucial role in guiding agents towards understanding and adhering to standards of behavior, thus reducing social conflicts within multi-agent systems (MASs). However, current LLM-based (or generative) MASs lack the capability to be normative. In this paper, we propose a novel architecture, named CRSEC, to empower the emergence of social norms within generative MASs. Our archite…

    Submitted 20 May, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at IJCAI 2024

  6. arXiv:2403.07312  [pdf, other]

    cs.RO

    Multi-task Manipulation Policy Modeling with Visuomotor Latent Diffusion

    Authors: Wenhui Tan, Bei Liu, Junbo Zhang, Ruihua Song, Jianlong Fu

    Abstract: Modeling a generalized visuomotor policy has been a longstanding challenge for both computer vision and robotics communities. Existing approaches often fail to efficiently leverage cross-dataset resources or rely on heavy Vision-Language models, which require substantial computational resources, thereby limiting their multi-task performance and application potential. In this paper, we introduce a…

    Submitted 1 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  7. arXiv:2403.03452  [pdf, other]

    cs.CV

    D4C Glove-train: Solving the RPM and Bongard-logo Problem by Circumscribing and Building Distribution for Concepts

    Authors: Ruizhuo Song, Beiming Yuan

    Abstract: This paper achieves noteworthy progress in the realm of abstract reasoning, particularly in addressing Raven's Progressive Matrices (RPM) and Bongard-Logo challenges. Initially, we introduce Lico-Net, a novel baseline model that resolves RPM problems with remarkable accuracy. Leveraging this foundation, we advance with the D3C approach, which advocates representing the underlying concepts in abstr…

    Submitted 5 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 15 pages, 15 figures, 6 tables

  8. arXiv:2403.03190  [pdf, other]

    cs.CV

    Triple-CFN: Restructuring Concept and Feature Spaces for Enhancing Abstract Reasoning Process

    Authors: Ruizhuo Song, Beiming Yuan

    Abstract: Visual abstract reasoning poses challenges to AI algorithms, requiring cognitive abilities beyond perception. For methodology, this study emphasizes the need to separately extract concepts and features from visual abstract reasoning problems, employing the responses of features to concepts as elements in the reasoning process. It also advocates for clear concept and feature spaces to tackle visual…

    Submitted 21 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 13 pages, 16 figures, 7 tables

  9. arXiv:2403.03173  [pdf, other]

    cs.CV

    Solving the Clustering Reasoning Problems by Modeling a Deep-Learning-Based Probabilistic Model

    Authors: Ruizhuo Song, Beiming Yuan

    Abstract: Visual abstract reasoning problems pose significant challenges to the perception and cognition abilities of artificial intelligence algorithms, demanding deeper pattern recognition and inductive reasoning beyond mere identification of explicit image features. Research advancements in this field often provide insights and technical support for other similar domains. In this study, we introduce PMoC…

    Submitted 13 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 14 pages, 17 figures, 4 tables

  10. arXiv:2403.01316  [pdf, other]

    cs.CV

    TUMTraf V2X Cooperative Perception Dataset

    Authors: Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan, Xingcheng Zhou, Rui Song, Alois C. Knoll

    Abstract: Cooperative perception offers several benefits for enhancing the capabilities of autonomous vehicles and improving road safety. Using roadside sensors in addition to onboard sensors increases reliability and extends the sensor range. External sensors offer higher situational awareness for automated vehicles and prevent occlusions. We propose CoopDet3D, a cooperative multi-modal fusion model, and T…

    Submitted 2 March, 2024; originally announced March 2024.

  11. arXiv:2402.11502  [pdf, other]

    cs.CV

    GenAD: Generative End-to-End Autonomous Driving

    Authors: Wenzhao Zheng, Ruiqi Song, Xianda Guo, Chenming Zhang, Long Chen

    Abstract: Directly producing planning results from raw sensors has been a long-desired solution for autonomous driving and has attracted increasing attention recently. Most existing end-to-end autonomous driving methods factorize this problem into perception, motion prediction, and planning. However, we argue that the conventional progressive pipeline still cannot comprehensively model the entire traffic ev…

    Submitted 6 April, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Code is available at: https://github.com/wzzheng/GenAD

  12. arXiv:2402.07635  [pdf, other]

    cs.CV

    Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles

    Authors: Rui Song, Chenwei Liang, Hu Cao, Zhiran Yan, Walter Zimmer, Markus Gross, Andreas Festag, Alois Knoll

    Abstract: Collaborative perception in automated vehicles leverages the exchange of information between agents, aiming to elevate perception results. Previous camera-based collaborative 3D perception methods typically employ 3D bounding boxes or bird's eye views as representations of the environment. However, these approaches fall short in offering a comprehensive 3D environmental prediction. To bridge this…

    Submitted 25 April, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR2024. Website link: https://rruisong.github.io/publications/CoHFF

  13. arXiv:2401.18045  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition

    Authors: Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song

    Abstract: Recent advancements in language models have significantly enhanced performance in multiple speech-related tasks. Existing speech language models typically utilize task-dependent prompt tokens to unify various speech tasks in a single model. However, this design omits the intrinsic connections between different speech tasks, which can potentially boost the performance of each task. In this work, we…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 11 pages, 2 figures

  14. arXiv:2401.15656  [pdf, other]

    cs.CL

    LLsM: Generative Linguistic Steganography with Large Language Model

    Authors: Yihao Wang, Ruiqi Song, Ru Zhang, Jianyi Liu, Lingxiao Li

    Abstract: Linguistic Steganography (LS) tasks aim to generate steganographic text (stego) based on secret information. Only authorized recipients can perceive the existence of the stegos and extract secrets, thereby preserving privacy. However, existing LS methods do not consider the controllable generation of stegos containing specific discourses such as style, genre, and theme. And they are difficult to s…

    Submitted 7 April, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: 13 pages

  15. arXiv:2401.13098  [pdf, other]

    cs.LG cs.AI cs.SI stat.AP

    Gravity-Informed Deep Learning Framework for Predicting Ship Traffic Flow and Invasion Risk of Non-Indigenous Species via Ballast Water Discharge

    Authors: Ruixin Song, Gabriel Spadon, Ronald Pelot, Stan Matwin, Amilcar Soares

    Abstract: Invasive species in water bodies pose a major threat to the environment and biodiversity globally. Due to increased transportation and trade, non-native species have been introduced to new environments, causing damage to ecosystems and leading to economic losses in agriculture, forestry, and fisheries. Therefore, there is a pressing need for risk assessment and management techniques to mitigate th…

    Submitted 29 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: 26 pages, 7 figures, under review

  16. arXiv:2401.00139  [pdf, other]

    cs.AI cs.CL cs.LG stat.ME

    Is Knowledge All Large Language Models Needed for Causal Reasoning?

    Authors: Hengrui Cai, Shengjie Liu, Rui Song

    Abstract: This paper explores the causal reasoning of large language models (LLMs) to enhance their interpretability and reliability in advancing artificial intelligence. Despite the proficiency of LLMs in a range of tasks, their potential for understanding causality requires further exploration. We propose a novel causal attribution model that utilizes "do-operators" for constructing counterfactual scenar…

    Submitted 5 June, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

    Comments: A Python implementation of our proposed method is available at https://github.com/ncsulsj/Causal_LLM

  17. arXiv:2312.17263  [pdf, other]

    cs.CL

    TACIT: A Target-Agnostic Feature Disentanglement Framework for Cross-Domain Text Classification

    Authors: Rui Song, Fausto Giunchiglia, Yingji Li, Mingjie Tian, Hao Xu

    Abstract: Cross-domain text classification aims to transfer models from label-rich source domains to label-poor target domains, giving it a wide range of practical applications. Many approaches promote cross-domain generalization by capturing domain-invariant features. However, these methods rely on unlabeled samples provided by the target domains, which renders the model ineffective when the target domain…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI-2024

  18. arXiv:2312.17122  [pdf, other]

    cs.CL cs.AI stat.ML

    Large Language Model for Causal Decision Making

    Authors: Haitao Jiang, Lin Ge, Yuhe Gao, Jianian Wang, Rui Song

    Abstract: Large Language Models (LLMs) have shown their success in language understanding and reasoning on general topics. However, their capability to perform inference based on user-specified structured data and knowledge in corpus-rare concepts, such as causal decision-making is still limited. In this work, we explore the possibility of fine-tuning an open-sourced LLM into LLM4Causal, which can identify…

    Submitted 11 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  19. arXiv:2312.15595  [pdf, other]

    stat.ML cs.LG econ.EM

    Zero-Inflated Bandits

    Authors: Haoyu Wei, Runzhe Wan, Lei Shi, Rui Song

    Abstract: Many real applications of bandits have sparse non-zero rewards, leading to slow learning rates. A careful distribution modeling that utilizes problem-specific structures is known as critical to estimation efficiency in the statistics literature, yet is under-explored in bandits. To fill the gap, we initiate the study of zero-inflated bandits, where the reward is modeled as a classic semi-parametri…

    Submitted 24 December, 2023; originally announced December 2023.

  20. arXiv:2312.12871  [pdf, other]

    cs.LG stat.ML

    Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches

    Authors: Yu Liu, Runzhe Wan, James McQueen, Doug Hains, Jinxiang Gu, Rui Song

    Abstract: The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency. Traditionally, experimenters determine AES based on domain knowledge. However, this method becomes impractical for online experimentation services managing numerous experiments, and a more automated approach is hence in great demand. We initiate the study of da…

    Submitted 17 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  21. arXiv:2312.10099  [pdf, other]

    cs.CV

    ADA-YOLO: Dynamic Fusion of YOLOv8 and Adaptive Heads for Precise Image Detection and Diagnosis

    Authors: Shun Liu, Jianan Zhang, Ruocheng Song, Teik Toe Teoh

    Abstract: Object detection and localization are crucial tasks for biomedical image analysis, particularly in the field of hematology where the detection and recognition of blood cells are essential for diagnosis and treatment decisions. While attention-based methods have shown significant progress in object detection in various domains, their application in medical object detection has been limited due to t…

    Submitted 14 December, 2023; originally announced December 2023.

  22. arXiv:2312.06677  [pdf, other]

    cs.LG cs.AI cs.CL

    Intelligent Virtual Assistants with LLM-based Process Automation

    Authors: Yanchu Guan, Dong Wang, Zhixuan Chu, Shiyu Wang, Feiyue Ni, Ruihua Song, Longfei Li, Jinjie Gu, Chenyi Zhuang

    Abstract: While intelligent virtual assistants like Siri, Alexa, and Google Assistant have become ubiquitous in modern life, they still face limitations in their ability to follow multi-step instructions and accomplish complex goals articulated in natural language. However, recent breakthroughs in large language models (LLMs) show promise for overcoming existing barriers by enhancing natural language proces…

    Submitted 4 December, 2023; originally announced December 2023.

  23. arXiv:2311.09053  [pdf, other]

    cs.CL cs.AI

    Assessing Knowledge Editing in Language Models via Relation Perspective

    Authors: Yifan Wei, Xiaoyan Yu, Huanhuan Ma, Fangyu Lei, Yixuan Weng, Ran Song, Kang Liu

    Abstract: Knowledge Editing (KE) for modifying factual knowledge in Large Language Models (LLMs) has been receiving increasing attention. However, existing knowledge editing methods are entity-centric, and it is unclear whether this approach is suitable for a relation-centric perspective. To address this gap, this paper constructs a new benchmark named RaKE, which focuses on Relation based Knowledge Editing…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Work in progress

  24. arXiv:2311.01775  [pdf, other]

    cs.CL

    UP4LS: User Profile Constructed by Multiple Attributes for Enhancing Linguistic Steganalysis

    Authors: Yihao Wang, Ruiqi Song, Lingxiao Li, Yifan Tang, Ru Zhang, Jianyi Liu

    Abstract: Linguistic steganalysis (LS) tasks aim to detect whether a text contains secret information. Existing LS methods focus on the deep-learning model design and they achieve excellent results in ideal data. However, they overlook the unique user characteristics, leading to weak performance in social networks. And a few stegos here that further complicate detection. We propose the UP4LS, a framework wi…

    Submitted 3 June, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: 15 pages, 7 figures, 14 tables

  25. arXiv:2311.01487  [pdf, other]

    cs.CV cs.CL

    What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning

    Authors: Yifan Du, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Jinpeng Wang, Chuyuan Wang, Mingchen Cai, Ruihua Song, Ji-Rong Wen

    Abstract: Visual instruction tuning is an essential approach to improving the zero-shot generalization capability of Multi-modal Large Language Models (MLLMs). A surge of visual instruction datasets with various focuses and characteristics have been proposed recently, enabling MLLMs to achieve surprising results on evaluation benchmarks. To develop more capable MLLMs, in this paper, we aim to investigate a…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Work in progress

  26. arXiv:2310.07301  [pdf, other]

    cs.CL

    Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models

    Authors: Yuchong Sun, Che Liu, Kun Zhou, Jinwen Huang, Ruihua Song, Wayne Xin Zhao, Fuzheng Zhang, Di Zhang, Kun Gai

    Abstract: Humans often interact with large language models (LLMs) in multi-turn interaction to obtain desired answers or more information. However, most existing studies overlook the multi-turn instruction following ability of LLMs, in terms of training dataset, training method, and evaluation benchmark. In this paper, we introduce Parrot, a solution aiming to enhance multi-turn instruction following for LL…

    Submitted 23 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  27. arXiv:2309.06006  [pdf, ps, other]

    cs.CV cs.AI

    SoccerNet 2023 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, et al. (77 additional authors not shown)

    Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, fo…

    Submitted 12 September, 2023; originally announced September 2023.

  28. arXiv:2308.11171  [pdf, other]

    cs.CV cs.CL

    ViCo: Engaging Video Comment Generation with Human Preference Rewards

    Authors: Yuchong Sun, Bei Liu, Xu Chen, Ruihua Song, Jianlong Fu

    Abstract: Engaging video comments play an important role in video social media, as they are the carrier of feelings, thoughts, or humor of the audience. Preliminary works have made initial exploration for video comment generation by adopting caption-style encoder-decoder models. However, comment generation presents some unique challenges distinct from caption generation, which makes these methods somewhat l…

    Submitted 22 August, 2023; originally announced August 2023.

  29. arXiv:2308.10149  [pdf, other]

    cs.CL cs.AI

    A Survey on Fairness in Large Language Models

    Authors: Yingji Li, Mengnan Du, Rui Song, Xin Wang, Ying Wang

    Abstract: Large Language Models (LLMs) have shown powerful performance and development prospects and are widely deployed in the real world. However, LLMs can capture social biases from unprocessed training data and propagate the biases to downstream tasks. Unfair LLM systems have undesirable social impacts and potential harms. In this paper, we provide a comprehensive review of related research on fairness…

    Submitted 21 February, 2024; v1 submitted 19 August, 2023; originally announced August 2023.

    Comments: 28 pages, 5 figures, 2 tables, 175 references

  30. arXiv:2308.10016  [pdf, other]

    cs.CV

    Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation

    Authors: Yang Hai, Rui Song, Jiaojiao Li, David Ferstl, Yinlin Hu

    Abstract: Most self-supervised 6D object pose estimation methods can only work with additional depth information or rely on the accurate annotation of 2D segmentation masks, limiting their application range. In this paper, we propose a 6D object pose estimation method that can be trained with pure RGB images without any auxiliary information. We first obtain a rough pose initialization from networks trained…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  31. arXiv:2308.08930  [pdf, other]

    cs.CV

    Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection

    Authors: Runmin Cong, Hongyu Liu, Chen Zhang, Wei Zhang, Feng Zheng, Ran Song, Sam Kwong

    Abstract: By integrating complementary information from RGB image and depth map, the ability of salient object detection (SOD) for complex and challenging scenes can be improved. In recent years, the important role of Convolutional Neural Networks (CNNs) in feature extraction and cross-modality interaction has been fully explored, but it is still insufficient in modeling global long-range dependencies of se…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  32. arXiv:2307.14679  [pdf, ps, other]

    cs.CR

    LinkDID: A Privacy-Preserving, Sybil-Resistant and Key-Recoverable Decentralized Identity Scheme

    Authors: Rui Song

    Abstract: Decentralized identity mechanisms endeavor to endow users with complete sovereignty over their digital assets within the Web3 ecosystem. Unfortunately, this benefit frequently comes at the expense of users' credential and identity privacy. Additionally, existing schemes fail to resist Sybil attacks that have long plagued Web3, and lack reasonable key recovery mechanisms to regain control of digita…

    Submitted 2 January, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: 20 pages

  33. arXiv:2307.08500  [pdf, other]

    cs.CV

    Cumulative Spatial Knowledge Distillation for Vision Transformers

    Authors: Borui Zhao, Renjie Song, Jiajun Liang

    Abstract: Distilling knowledge from convolutional neural networks (CNNs) is a double-edged sword for vision transformers (ViTs). It boosts the performance since the image-friendly local-inductive bias of CNN helps ViT learn faster and better, but leading to two problems: (1) Network designs of CNN and ViT are completely different, which leads to different semantic levels of intermediate features, making spa…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023

  34. arXiv:2307.08436  [pdf, other]

    cs.CV

    DOT: A Distillation-Oriented Trainer

    Authors: Borui Zhao, Quan Cui, Renjie Song, Jiajun Liang

    Abstract: Knowledge distillation transfers knowledge from a large model to a small one via task and distillation losses. In this paper, we observe a trade-off between task and distillation losses, i.e., introducing distillation loss limits the convergence of task loss. We believe that the trade-off results from the insufficient optimization of distillation loss. The reason is: The teacher has a lower task l…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023

  35. arXiv:2307.01214  [pdf, other]

    cs.CL cs.AI

    Automatic Counterfactual Augmentation for Robust Text Classification Based on Word-Group Search

    Authors: Rui Song, Fausto Giunchiglia, Yingji Li, Hao Xu

    Abstract: Although large-scale pre-trained language models have achieved striking results for text classification, recent work has raised concerns about the challenge of shortcut learning. In general, a keyword is regarded as a shortcut if it creates a superficial association with the label, resulting in a false prediction. Conversely, shortcut learning can be mitigated if the model relies on robust causal fe…

    Submitted 30 June, 2023; originally announced July 2023.

    Comments: 13 pages, 7 figures

  36. arXiv:2306.13266  [pdf, other]

    cs.CV

    Shape-Constraint Recurrent Flow for 6D Object Pose Estimation

    Authors: Yang Hai, Rui Song, Jiaojiao Li, Yinlin Hu

    Abstract: Most recent 6D object pose methods use 2D optical flow to refine their results. However, the general optical flow methods typically do not consider the target's 3D shape information during matching, making them less effective in 6D object pose estimation. In this work, we propose a shape-constraint recurrent matching framework for 6D object pose estimation. We first compute a pose-induced flow bas…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: CVPR 2023

  37. arXiv:2306.05716  [pdf, other]

    cs.RO cs.AI

    Transferring Foundation Models for Generalizable Robotic Manipulation

    Authors: Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, Limin Wang

    Abstract: Improving the generalization capabilities of general-purpose robotic manipulation agents in the real world has long been a significant challenge. Existing approaches often rely on collecting large-scale robotic data which is costly and time-consuming, such as the RT-1 dataset. However, due to insufficient diversity of data, these approaches typically suffer from limiting their capability in open-d…

    Submitted 18 March, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 9 pages, 5 figures

  38. arXiv:2306.02552  [pdf, other]

    cs.IR cs.AI

    User Behavior Simulation with Large Language Model based Agents

    Authors: Lei Wang, Jingsen Zhang, Hao Yang, Zhiyuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen

    Abstract: Simulating high quality user behavior data has always been a fundamental problem in human-centered applications, where the major difficulty originates from the intricate mechanism of human decision process. Recently, substantial evidence has suggested that by learning huge amounts of web knowledge, large language models (LLMs) can achieve human-like intelligence. We believe these models can prov…

    Submitted 15 February, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

    Comments: 28 pages, 9 figures

  39. arXiv:2305.18898  [pdf, other]

    cs.RO cs.AI

    AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

    Authors: Chuhao Jin, Wenhui Tan, Jiange Yang, Bei Liu, Ruihua Song, Limin Wang, Jianlong Fu

    Abstract: We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks, such as making a smiley face using building blocks. These tasks often involve complex multi-step reasoning, presenting significant challenges due to the limited paired data connecting human instructions (e.g., making a smiley face) and robot actions (e.g., end-effector movement). Existing appro…

    Submitted 30 May, 2023; originally announced May 2023.

  40. arXiv:2305.12954  [pdf, other]

    cs.CV

    Is Synthetic Data From Diffusion Models Ready for Knowledge Distillation?

    Authors: Zheng Li, Yuxuan Li, Penghai Zhao, Renjie Song, Xiang Li, Jian Yang

    Abstract: Diffusion models have recently achieved astonishing performance in generating high-fidelity photo-realistic images. Given their huge success, it is still unclear whether synthetic images are applicable for knowledge distillation when real images are unavailable. In this paper, we extensively study whether and how synthetic images produced from state-of-the-art diffusion models can be used for know…

    Submitted 22 May, 2023; originally announced May 2023.

  41. arXiv:2305.12200  [pdf, other]

    cs.SD cs.AI eess.AS

    ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios

    Authors: Yuyue Wang, Huan Xiao, Yihan Wu, Ruihua Song

    Abstract: Text to Speech (TTS) models can generate natural and high-quality speech, but it is not expressive enough when synthesizing speech with dramatic expressiveness, such as stand-up comedies. Considering comedians have diverse personal speech styles, including personal prosody, rhythm, and fillers, it requires real-world datasets and strong speech style modeling capabilities, which brings challenges.…

    Submitted 20 May, 2023; originally announced May 2023.

    Comments: 5 pages, 4 tables, 2 figures

  42. arXiv:2305.11654  [pdf, other]

    cs.LG

    V2X-Boosted Federated Learning for Cooperative Intelligent Transportation Systems with Contextual Client Selection

    Authors: Rui Song, Lingjuan Lyu, Wei Jiang, Andreas Festag, Alois Knoll

    Abstract: Machine learning (ML) has revolutionized transportation systems, enabling autonomous driving and smart traffic services. Federated learning (FL) overcomes privacy constraints by training ML models in distributed systems, exchanging model parameters instead of raw data. However, the dynamic states of connected vehicles affect the network connection quality and influence the FL performance. To tackl…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted at ICRA 2023 Workshop on Collaborative Perception and Learning

  43. arXiv:2304.09453  [pdf, other]

    cs.CV

    Network Pruning Spaces

    Authors: Xuanyu He, Yu-I Yang, Ran Song, Jiachen Pu, Conggang Hu, Feijun Jiang, Wei Zhang, Huanghao Ding

    Abstract: Network pruning techniques, including weight pruning and filter pruning, reveal that most state-of-the-art neural networks can be accelerated without a significant performance drop. This work focuses on filter pruning, which enables accelerated inference with any off-the-shelf deep learning library and hardware. We propose the concept of network pruning spaces that parametrize populations of…

    Submitted 19 April, 2023; originally announced April 2023.

  44. FedBEVT: Federated Learning Bird's Eye View Perception Transformer in Road Traffic Systems

    Authors: Rui Song, Runsheng Xu, Andreas Festag, Jiaqi Ma, Alois Knoll

    Abstract: Bird's eye view (BEV) perception is becoming increasingly important in the field of autonomous driving. It uses multi-view camera data to learn a transformer model that directly projects the perception of the road environment onto the BEV perspective. However, training a transformer model often requires a large amount of data, and as camera data for road traffic are often private, they are typical…

    Submitted 8 September, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Accepted by IEEE T-IV. Code: https://github.com/rruisong/FedBEVT

  45. arXiv:2304.00420  [pdf, other]

    cs.LG

    Experimentation Platforms Meet Reinforcement Learning: Bayesian Sequential Decision-Making for Continuous Monitoring

    Authors: Runzhe Wan, Yu Liu, James McQueen, Doug Hains, Rui Song

    Abstract: With the growing need for online A/B testing to support innovation in industry, the opportunity cost of running an experiment becomes non-negligible. Therefore, there is an increasing demand for an efficient continuous monitoring service that allows early stopping when appropriate. Classic statistical methods focus on hypothesis testing and are mostly developed for traditional high-stakes probl…

    Submitted 1 April, 2023; originally announced April 2023.

  46. arXiv:2303.12992  [pdf, other]

    cs.LG cs.AI cs.CV

    A Survey of Historical Learning: Learning Models with Learning History

    Authors: Xiang Li, Ge Wu, Lingfeng Yang, Wenhai Wang, Renjie Song, Jian Yang

    Abstract: New knowledge originates from the old. The various types of elements deposited in the training history are a rich resource for improving the learning of deep models. In this survey, we comprehensively review and summarize the topic "Historical Learning: Learning Models with Learning History", which learns better neural models with the help of their learning history during their optimization,…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: Xiang Li and Ge Wu contributed equally

  47. arXiv:2303.12396  [pdf, other]

    cs.CV

    Rigidity-Aware Detection for 6D Object Pose Estimation

    Authors: Yang Hai, Rui Song, Jiaojiao Li, Mathieu Salzmann, Yinlin Hu

    Abstract: Most recent 6D object pose estimation methods first use object detection to obtain 2D bounding boxes before actually regressing the pose. However, the general object detection methods they use are ill-suited to handle cluttered scenes, thus producing poor initializations for the subsequent pose network. To address this, we propose a rigidity-aware detection method exploiting the fact that, in 6D pos…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  48. arXiv:2303.11066  [pdf, other]

    cs.CV

    Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data

    Authors: Yuhao Chen, Xin Tan, Borui Zhao, Zhaowei Chen, Renjie Song, Jiajun Liang, Xuequan Lu

    Abstract: Semi-supervised learning (SSL) has attracted enormous attention due to its vast potential of mitigating the dependence on large labeled datasets. The latest methods (e.g., FixMatch) use a combination of consistency regularization and pseudo-labeling to achieve remarkable successes. However, these methods all suffer from the waste of complicated examples since all pseudo-labels have to be selected…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  49. arXiv:2303.07601  [pdf, other]

    cs.CV

    V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception

    Authors: Runsheng Xu, Xin Xia, Jinlong Li, Hanzhao Li, Shuo Zhang, Zhengzhong Tu, Zonglin Meng, Hao Xiang, Xiaoyu Dong, Rui Song, Hongkai Yu, Bolei Zhou, Jiaqi Ma

    Abstract: Modern perception systems of autonomous vehicles are known to be sensitive to occlusions and to lack long-range perception capability. This has been one of the key bottlenecks preventing Level 5 autonomy. Recent research has demonstrated that Vehicle-to-Vehicle (V2V) cooperative perception systems have great potential to revolutionize the autonomous driving industry. However, the lack of a…

    Submitted 19 March, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023. Website link: https://research.seas.ucla.edu/mobility-lab/v2v4real

  50. arXiv:2302.01543  [pdf, other]

    cs.LG

    Multiplier Bootstrap-based Exploration

    Authors: Runzhe Wan, Haoyu Wei, Branislav Kveton, Rui Song

    Abstract: Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty. In this paper, we propose Multiplier Bootstrap-based Exploration (MBE), a novel exploration strategy that is applicable to any reward model amenable to weighted loss minimization. We prove both instance-dependent a…

    Submitted 2 February, 2023; originally announced February 2023.