Showing 1–50 of 412 results for author: Zhao, R

Searching in archive cs. Search in all archives.
  1. arXiv:2409.06078  [pdf, other]

    cs.RO eess.SY

    PEERNet: An End-to-End Profiling Tool for Real-Time Networked Robotic Systems

    Authors: Aditya Narayanan, Pranav Kasibhatla, Minkyu Choi, Po-han Li, Ruihan Zhao, Sandeep Chinchali

    Abstract: Networked robotic systems balance compute, power, and latency constraints in applications such as self-driving vehicles, drone swarms, and teleoperated surgery. A core problem in this domain is deciding when to offload a computationally expensive task to the cloud, a remote server, at the cost of communication latency. Task offloading algorithms often rely on precise knowledge of system-specific p…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: Accepted at IROS 2024

  2. arXiv:2409.04390  [pdf, other]

    cs.CV

    Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences

    Authors: Rui Yu, Runkai Zhao, Cong Nie, Heng Wang, HuaiCheng Yan, Meng Wang

    Abstract: Accurate and robust LiDAR 3D object detection is essential for comprehensive scene understanding in autonomous driving. Despite its importance, LiDAR detection performance is limited by inherent constraints of point cloud data, particularly under conditions of extended distances and occlusions. Recently, temporal aggregation has been proven to significantly enhance detection accuracy by fusing mul…

    Submitted 6 September, 2024; originally announced September 2024.

  3. arXiv:2409.03553  [pdf]

    cs.CV

    Organized Grouped Discrete Representation for Object-Centric Learning

    Authors: Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen

    Abstract: Object-Centric Learning (OCL) represents dense image or video pixels as sparse object features. Representative methods utilize discrete representation composed of Variational Autoencoder (VAE) template features to suppress pixel-level information redundancy and guide object-level feature aggregation. The most recent advancement, Grouped Discrete Representation (GDR), further decomposes these templ…

    Submitted 10 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

  4. arXiv:2409.02418  [pdf, other]

    cs.CV

    MOSMOS: Multi-organ segmentation facilitated by medical report supervision

    Authors: Weiwei Tian, Xinyu Huang, Junlin Hou, Caiyue Ren, Longquan Jiang, Rui-Wei Zhao, Gang Jin, Yuejie Zhang, Daoying Geng

    Abstract: Owing to a large amount of multi-modal data in modern medical systems, such as medical images and reports, Medical Vision-Language Pre-training (Med-VLP) has demonstrated incredible achievements in coarse-grained downstream tasks (i.e., medical classification, retrieval, and visual question answering). However, the problem of transferring knowledge learned from Med-VLP to fine-grained multi-organ…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 14 pages, 7 figures

  5. arXiv:2408.17437  [pdf, other]

    cs.CL

    SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists

    Authors: Raoyuan Zhao, Abdullatif Köksal, Yihong Liu, Leonie Weissweiler, Anna Korhonen, Hinrich Schütze

    Abstract: Traditional benchmarking in NLP typically involves using static held-out test sets. However, this approach often results in an overestimation of performance and lacks the ability to offer comprehensive, interpretable, and dynamic assessments of NLP models. Recently, works like DynaBench (Kiela et al., 2021) and CheckList (Ribeiro et al., 2020) have addressed these limitations through behavioral te…

    Submitted 30 August, 2024; originally announced August 2024.

  6. arXiv:2408.14764  [pdf, other]

    cs.CV cs.MM

    SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding

    Authors: Chuanghao Ding, Xuejing Liu, Wei Tang, Juan Li, Xiaoliang Wang, Rui Zhao, Cam-Tu Nguyen, Fei Tan

    Abstract: This paper introduces SynthDoc, a novel synthetic document generation pipeline designed to enhance Visual Document Understanding (VDU) by generating high-quality, diverse datasets that include text, images, tables, and charts. Addressing the challenges of data acquisition and the limitations of existing datasets, SynthDoc leverages publicly available corpora and advanced rendering tools to create…

    Submitted 26 August, 2024; originally announced August 2024.

  7. arXiv:2408.13376  [pdf, other]

    cs.AI cs.LG eess.SY math.CT

    Reduce, Reuse, Recycle: Categories for Compositional Reinforcement Learning

    Authors: Georgios Bakirtzis, Michail Savvas, Ruihan Zhao, Sandeep Chinchali, Ufuk Topcu

    Abstract: In reinforcement learning, conducting task composition by forming cohesive, executable sequences from multiple tasks remains challenging. However, the ability to (de)compose tasks is a linchpin in developing robotic systems capable of learning complex behaviors. Yet, compositional reinforcement learning is beset with difficulties, including the high dimensionality of the problem space, scarcity of…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: ECAI 2024

  8. arXiv:2408.10504  [pdf, other]

    cs.AI

    QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

    Authors: Yilun Kong, Hangyu Mao, Qi Zhao, Bin Zhang, Jingqing Ruan, Li Shen, Yongzhe Chang, Xueqian Wang, Rui Zhao, Dacheng Tao

    Abstract: Prompt engineering has demonstrated remarkable success in enhancing the performance of large language models (LLMs) across diverse tasks. However, most existing prompt optimization methods only focus on the task-level performance, overlooking the importance of query-preferred prompts, which leads to suboptimal performances. Additionally, these methods rely heavily on frequent interactions with LLM…

    Submitted 19 August, 2024; originally announced August 2024.

  9. arXiv:2408.09385  [pdf, other]

    cs.CL cs.AI

    Offline RLHF Methods Need More Accurate Supervision Signals

    Authors: Shiqi Wang, Zhengze Zhang, Rui Zhao, Fei Tan, Cam Tu Nguyen

    Abstract: With the rapid advances in Large Language Models (LLMs), aligning LLMs with human preferences becomes increasingly important. Although Reinforcement Learning with Human Feedback (RLHF) proves effective, it is complicated and highly resource-intensive. As such, offline RLHF has been introduced as an alternative solution, which directly optimizes LLMs with ranking losses on a fixed preference dataset…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: under review

  10. arXiv:2408.09071  [pdf, other]

    cs.HC cs.CY

    Me want cookie! Towards automated and transparent data governance on the Web

    Authors: Jesse Wright, Beatriz Esteves, Rui Zhao

    Abstract: This paper presents a sociotechnical vision for managing personal data, including cookies, within Web browsers. We first present our vision for a future of semi-automated data governance on the Web, using policy languages to describe data terms of use, and having browsers act on behalf of users to enact policy-based controls. Then, we present an overview of the technical research required to prov…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Submitted to "NeXt-generation Data Governance workshop 2024"; available on OpenReview at https://openreview.net/forum?id=Bhia6mPaCF

  11. arXiv:2408.05029  [pdf, other]

    cs.CV

    Collaborative Static-Dynamic Teaching: A Semi-Supervised Framework for Stripe-Like Space Target Detection

    Authors: Zijian Zhu, Ali Zia, Xuesong Li, Bingbing Dan, Yuebo Ma, Hongfeng Long, Kaili Lu, Enhai Liu, Rujin Zhao

    Abstract: Stripe-like space target detection (SSTD) is crucial for space situational awareness. Traditional unsupervised methods often fail in low signal-to-noise ratio and variable stripe-like space targets scenarios, leading to weak generalization. Although fully supervised learning methods improve model generalization, they require extensive pixel-level labels for training. In the SSTD task, manually cre…

    Submitted 9 August, 2024; originally announced August 2024.

  12. arXiv:2408.02978  [pdf, other]

    cs.MM cs.AI cs.CV

    ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval

    Authors: Ruixiang Zhao, Jian Jia, Yan Li, Xuehan Bai, Quan Chen, Han Li, Peng Jiang, Xirong Li

    Abstract: E-commerce is increasingly multimedia-enriched, with products exhibited in a broad-domain manner as images, short videos, or live stream promotions. A unified and vectorized cross-domain production representation is essential. Due to large intra-product variance and high inter-product similarity in the broad-domain scenario, a visual-only representation is inadequate. While Automatic Speech Recogn…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  13. arXiv:2408.01090  [pdf, other]

    cs.CL cs.AR cs.NE

    General-purpose Dataflow Model with Neuromorphic Primitives

    Authors: Weihao Zhang, Yu Du, Hongyi Li, Songchen Ma, Rong Zhao

    Abstract: Neuromorphic computing exhibits great potential to provide high-performance benefits in various applications beyond neural networks. However, a general-purpose program execution model that aligns with the features of neuromorphic computing is required to bridge the gap between program versatility and neuromorphic hardware efficiency. The dataflow model offers a potential solution, but it faces hig…

    Submitted 2 August, 2024; originally announced August 2024.

  14. arXiv:2407.19248  [pdf]

    cs.AI

    Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint

    Authors: Song Zhang, Yuqing Duan, Daoliang Li, Ran Zhao

    Abstract: In underwater image enhancement (UIE), convolutional neural networks (CNN) have inherent limitations in modeling long-range dependencies and are less effective in recovering global features. While Transformers excel at modeling long-range dependencies, their quadratic computational complexity with increasing image resolution presents significant efficiency challenges. Additionally, most supervised…

    Submitted 31 July, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

  15. arXiv:2407.18097  [pdf, other]

    cs.CV

    SSTD: Stripe-Like Space Target Detection Using Single-Point Weak Supervision

    Authors: Zijian Zhu, Ali Zia, Xuesong Li, Bingbing Dan, Yuebo Ma, Enhai Liu, Rujin Zhao

    Abstract: Stripe-like space target detection (SSTD) plays a key role in enhancing space situational awareness and assessing spacecraft behaviour. This domain faces three challenges: the lack of publicly available datasets, interference from stray light and stars, and the variability of stripe-like targets, which makes manual labeling both inaccurate and labor-intensive. In response, we introduce `AstroStri…

    Submitted 16 September, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  16. arXiv:2407.17467  [pdf, other]

    cs.CL cs.LG

    CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models

    Authors: Jiawei Gu, Zacc Yang, Chuanghao Ding, Rui Zhao, Fei Tan

    Abstract: Large Language Models (LLMs) excel in diverse tasks but often underperform in specialized fields due to limited domain-specific or proprietary corpus. Continual pre-training (CPT) enhances LLM capabilities by imbuing new domain-specific or proprietary knowledge while replaying general corpus to prevent catastrophic forgetting. The data mixture ratio of general corpus and domain-specific corpus, ho…

    Submitted 24 July, 2024; originally announced July 2024.

  17. arXiv:2407.17150  [pdf, other]

    cs.CL cs.SE

    SimCT: A Simple Consistency Test Protocol in LLMs Development Lifecycle

    Authors: Fufangchen Zhao, Guoqiang Jin, Rui Zhao, Jiangheng Huang, Fei Tan

    Abstract: In this work, we report our efforts to advance the standard operation procedure of developing Large Language Models (LLMs) or LLMs-based systems or services in industry. We introduce the concept of Large Language Model Development Lifecycle (LDLC) and then highlight the importance of consistency test in ensuring the delivery quality. The principled solution of consistency test, however, is usually…

    Submitted 8 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  18. arXiv:2407.14541  [pdf]

    physics.soc-ph cs.CY cs.LG

    Mitigating biases in big mobility data: a case study of monitoring large-scale transit systems

    Authors: Feilong Wang, Xuegang Ban, Peng Chen, Chenxi Liu, Rong Zhao

    Abstract: Big mobility datasets (BMD) have shown many advantages in studying human mobility and evaluating the performance of transportation systems. However, the quality of BMD remains poorly understood. This study evaluates biases in BMD and develops mitigation methods. Using Google and Apple mobility data as examples, this study compares them with benchmark data from governmental agencies. Spatio-tempora…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 10 figures. Transportation Letters. August 2024

  19. arXiv:2407.10811  [pdf, other]

    cs.MA cs.AI cs.LG

    GuideLight: "Industrial Solution" Guidance for More Practical Traffic Signal Control Agents

    Authors: Haoyuan Jiang, Xuantang Xiong, Ziyue Li, Hangyu Mao, Guanghu Sui, Jingqing Ruan, Yuheng Cheng, Hua Wei, Wolfgang Ketter, Rui Zhao

    Abstract: Currently, traffic signal control (TSC) methods based on reinforcement learning (RL) have proven superior to traditional methods. However, most RL methods face difficulties when applied in the real world due to three factors: input, output, and the cycle-flow relation. The industry's observable input is much more limited than simulation-based RL methods. For real-world solutions, only flow can be…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under Review of IEEE Transactions on Intelligent Transportation Systems

  20. arXiv:2407.09899  [pdf, other]

    cs.RO

    DexGrasp-Diffusion: Diffusion-based Unified Functional Grasp Synthesis Pipeline for Multi-Dexterous Robotic Hands

    Authors: Zhengshen Zhang, Lei Zhou, Chenchen Liu, Zhiyang Liu, Chengran Yuan, Sheng Guo, Ruiteng Zhao, Marcelo H. Ang Jr., Francis EH Tay

    Abstract: The versatility and adaptability of human grasping catalyze advancing dexterous robotic manipulation. While significant strides have been made in dexterous grasp generation, current research endeavors pivot towards optimizing object manipulation while ensuring functional integrity, emphasizing the synthesis of functional grasps following desired affordance instructions. This paper addresses the ch…

    Submitted 13 July, 2024; originally announced July 2024.

  21. arXiv:2407.09842  [pdf, other]

    cs.CV

    Eliminating Feature Ambiguity for Few-Shot Segmentation

    Authors: Qianxiong Xu, Guosheng Lin, Chen Change Loy, Cheng Long, Ziyue Li, Rui Zhao

    Abstract: Recent advancements in few-shot segmentation (FSS) have exploited pixel-by-pixel matching between query and support features, typically based on cross attention, which selectively activate query foreground (FG) features that correspond to the same-class support FG features. However, due to the large receptive fields in deep layers of the backbone, the extracted query and support FG features are in…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: This paper is accepted by ECCV'24

  22. arXiv:2407.07972  [pdf, other]

    cs.LG cs.AI

    Deconstructing What Makes a Good Optimizer for Language Models

    Authors: Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham Kakade

    Abstract: Training language models becomes increasingly expensive with scale, prompting numerous attempts to improve optimization efficiency. Despite these efforts, the Adam optimizer remains the most widely used, due to a prevailing view that it is the most effective approach. We aim to compare several optimization algorithms, including SGD, Adafactor, Adam, and Lion, in the context of autoregressive langu…

    Submitted 10 July, 2024; originally announced July 2024.

  23. arXiv:2407.05415  [pdf, other]

    cs.CV

    DIVESPOT: Depth Integrated Volume Estimation of Pile of Things Based on Point Cloud

    Authors: Yiran Ling, Rongqiang Zhao, Yixuan Shen, Dongbo Li, Jing Jin, Jie Liu

    Abstract: Non-contact volume estimation of pile-type objects has considerable potential in industrial scenarios, including grain, coal, mining, and stone materials. However, using existing methods for these scenarios is challenged by unstable measurement poses, significant light interference, the difficulty of training data collection, and the computational burden brought by large piles. To address the above…

    Submitted 7 July, 2024; originally announced July 2024.

  24. arXiv:2407.01726  [pdf]

    cs.CV

    Grouped Discrete Representation Guides Object-Centric Learning

    Authors: Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen

    Abstract: Similar to humans perceiving visual scenes as objects, Object-Centric Learning (OCL) can abstract dense images or videos into sparse object-level features. Transformer-based OCL handles complex textures well due to the decoding guidance of discrete representation, obtained by discretizing noisy features in image or video feature maps using template features from a codebook. However, treating featu…

    Submitted 1 July, 2024; originally announced July 2024.

    ACM Class: I.4.6

  25. arXiv:2406.19065  [pdf, other]

    cs.CL

    STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis

    Authors: Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, Jingping Bi

    Abstract: The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address thi…

    Submitted 27 June, 2024; originally announced June 2024.

  26. arXiv:2406.17580  [pdf, other]

    cs.DC

    Experimental Evaluation of Distributed k-Core Decomposition

    Authors: Bin Guo, Runze Zhao

    Abstract: Given an undirected graph, the $k$-core is a subgraph in which each node has at least $k$ connections, which is widely used in graph analytics to identify core subgraphs within a larger graph. The sequential $k$-core decomposition algorithm faces limitations due to memory constraints and data graphs can be inherently distributed. A distributed approach is proposed to overcome limitations by allowi…

    Submitted 25 June, 2024; originally announced June 2024.

  27. arXiv:2406.16605  [pdf, other]

    cs.CL cs.AI cs.LG stat.ME

    CLEAR: Can Language Models Really Understand Causal Graphs?

    Authors: Sirui Chen, Mengying Xu, Kun Wang, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Chaochao Lu

    Abstract: Causal reasoning is a cornerstone of how humans interpret the world. To model and reason about causality, causal graphs offer a concise yet effective solution. Given the impressive advancements in language models, a crucial question arises: can they really understand causal graphs? To this end, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we devel…

    Submitted 24 June, 2024; originally announced June 2024.

  28. arXiv:2406.16006  [pdf, other]

    cs.LG cs.AI

    Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning

    Authors: Erin J. Talvitie, Zilei Shao, Huiying Li, Jinghan Hu, Jacob Boerma, Rory Zhao, Xintong Wang

    Abstract: In model-based reinforcement learning, simulated experiences from the learned model are often treated as equivalent to experience from the real environment. However, when the model is inaccurate, it can catastrophically interfere with policy learning. Alternatively, the agent might learn about the model's accuracy and selectively use it only when it can provide reliable predictions. We empirically…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: To appear: Reinforcement Learning Conference (RLC), 2024

  29. arXiv:2406.10174  [pdf, other]

    cs.CL

    Let the Poem Hit the Rhythm: Using a Byte-Based Transformer for Beat-Aligned Poetry Generation

    Authors: Mohamad Elzohbi, Richard Zhao

    Abstract: The intersection between poetry and music provides an interesting case for computational creativity, yet remains relatively unexplored. This paper explores the integration of poetry and music through the lens of beat patterns, investigating whether a byte-based language model can generate words that fit specific beat patterns within the context of poetry. Drawing on earlier studies, we developed a…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, accepted for the 15th International Conference on Computational Creativity, ICCC'24

  30. arXiv:2406.01638  [pdf, other]

    cs.LG cs.AI cs.CL

    TimeCMA: Towards LLM-Empowered Time Series Forecasting via Cross-Modality Alignment

    Authors: Chenxi Liu, Qianxiong Xu, Hao Miao, Sun Yang, Lingzheng Zhang, Cheng Long, Ziyue Li, Rui Zhao

    Abstract: The widespread adoption of scalable mobile sensing has led to large amounts of time series data for real-world applications. A fundamental application is multivariate time series forecasting (MTSF), which aims to predict future time series values based on historical observations. Existing MTSF methods suffer from limited parameterization and small-scale training data. Recently, Large language mode…

    Submitted 13 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  31. arXiv:2406.00023  [pdf, other]

    cs.CL

    Expert-Token Resonance: Redefining MoE Routing through Affinity-Driven Active Selection

    Authors: Jing Li, Zhijie Sun, Dachao Lin, Xuan He, Yi Lin, Binfan Zheng, Li Zeng, Rongqian Zhao, Xin Chen

    Abstract: Mixture-of-Experts (MoE) architectures have emerged as a paradigm-shifting approach for large language models (LLMs), offering unprecedented computational efficiency. However, these architectures grapple with challenges of token distribution imbalance and expert homogenization, impeding optimal semantic generalization. We introduce a novel framework that redefines MoE routing through affinity-driv…

    Submitted 30 August, 2024; v1 submitted 23 May, 2024; originally announced June 2024.

  32. arXiv:2405.20693  [pdf, other]

    eess.IV cs.CV

    R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction

    Authors: Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, Hongdong Li

    Abstract: 3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R2-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a p…

    Submitted 31 May, 2024; originally announced May 2024.

  33. arXiv:2405.20267  [pdf, other]

    cs.CL

    Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions

    Authors: Ruochen Zhao, Wenxuan Zhang, Yew Ken Chia, Deli Zhao, Lidong Bing

    Abstract: As LLMs evolve on a daily basis, there is an urgent need for a trustworthy evaluation method that can provide robust evaluation results in a timely fashion. Currently, as static benchmarks are prone to contamination concerns, users tend to trust human voting platforms, such as Chatbot Arena. However, human annotations require extensive manual efforts. To provide an automatic, robust, and trustwort…

    Submitted 12 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  34. arXiv:2405.18688  [pdf, other]

    cs.LG cs.AI cs.CL

    Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation

    Authors: Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han

    Abstract: Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems from the learning loop, which entails accurate reward learning compounded with value/policy learning, necessitating a considerable number of samples. To boost the…

    Submitted 28 May, 2024; originally announced May 2024.

  35. A Vlogger-augmented Graph Neural Network Model for Micro-video Recommendation

    Authors: Weijiang Lai, Beihong Jin, Beibei Li, Yiyuan Zheng, Rui Zhao

    Abstract: Existing micro-video recommendation models exploit the interactions between users and micro-videos and/or multi-modal information of micro-videos to predict the next micro-video a user will watch, ignoring the information related to vloggers, i.e., the producers of micro-videos. However, in micro-video scenarios, vloggers play a significant role in user-video interactions, since vloggers generally…

    Submitted 28 May, 2024; originally announced May 2024.

    Journal ref: (2023) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track (pp. 684-699). Cham: Springer Nature Switzerland

  36. arXiv:2405.17152  [pdf, other]

    cs.MA cs.AI

    CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control

    Authors: Jingqing Ruan, Ziyue Li, Hua Wei, Haoyuan Jiang, Jiaming Lu, Xuantang Xiong, Hangyu Mao, Rui Zhao

    Abstract: Effective multi-intersection collaboration is pivotal for reinforcement-learning-based traffic signal control to alleviate congestion. Existing work mainly chooses neighboring intersections as collaborators. However, quite an amount of congestion, even some wide-range congestion, is caused by non-neighbors failing to collaborate. To address these issues, we propose to separate the collaborator sel…

    Submitted 19 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  37. arXiv:2405.13532  [pdf, other]

    cs.CV

    What Makes Good Few-shot Examples for Vision-Language Models?

    Authors: Zhaojun Guo, Jinghui Lu, Xuejing Liu, Rui Zhao, ZhenXing Qian, Fei Tan

    Abstract: Despite the notable advancements achieved by leveraging pre-trained vision-language (VL) models through few-shot tuning for downstream tasks, our detailed empirical study highlights a significant dependence of few-shot learning outcomes on the careful selection of training examples - a facet that has been previously overlooked in research. In this study, we delve into devising more effective strat…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 8 pages, 4 figures

  38. arXiv:2405.09593  [pdf, other]

    cs.DB cs.AI

    SQL-to-Schema Enhances Schema Linking in Text-to-SQL

    Authors: Sun Yang, Qiong Su, Zhishuai Li, Ziyue Li, Hangyu Mao, Chenxi Liu, Rui Zhao

    Abstract: Sophisticated existing Text-to-SQL methods exhibit errors in various proportions, including schema-linking errors (incorrect columns, tables, or extra columns), join errors, nested errors, and group-by errors. Consequently, there is a critical need to filter out unnecessary tables and columns, directing the language model's attention to relevant tables and columns with schema-linking, to reduce…

    Submitted 15 May, 2024; originally announced May 2024.

  39. arXiv:2405.09582  [pdf]

    cs.CV eess.IV

    AD-Aligning: Emulating Human-like Generalization for Cognitive Domain Adaptation in Deep Learning

    Authors: Zhuoying Li, Bohua Wan, Cong Mu, Ruzhang Zhao, Shushan Qiu, Chao Yan

    Abstract: Domain adaptation is pivotal for enabling deep learning models to generalize across diverse domains, a task complicated by variations in presentation and cognitive nuances. In this paper, we introduce AD-Aligning, a novel approach that combines adversarial training with source-target domain alignment to enhance generalization capabilities. By pretraining with Coral loss and standard loss, AD-Align…

    Submitted 21 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by 2024 5th International Conference on Electronic Communication and Artificial Intelligence

  40. arXiv:2405.05714  [pdf, other]

    cs.CV cs.LG

    Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning

    Authors: Rui Zhao, Bin Shi, Jianfei Ruan, Tianze Pan, Bo Dong

    Abstract: In noisy label learning, estimating noisy class posteriors plays a fundamental role for developing consistent classifiers, as it forms the basis for estimating clean class posteriors and the transition matrix. Existing methods typically learn noisy class posteriors by training a classification model with noisy labels. However, when labels are incorrect, these models may be misled to overemphasize…

    Submitted 2 July, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  41. arXiv:2405.02686  [pdf, other]

    cs.CV cs.AI

    Boosting 3D Neuron Segmentation with 2D Vision Transformer Pre-trained on Natural Images

    Authors: Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Weidong Cai

    Abstract: Neuron reconstruction, one of the fundamental tasks in neuroscience, rebuilds neuronal morphology from 3D light microscope imaging data. It plays a critical role in analyzing the structure-function relationship of neurons in the nervous system. However, due to the scarcity of neuron datasets and high-quality SWC annotations, it is still challenging to develop robust segmentation methods for single…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 3 pages

  42. arXiv:2405.01439  [pdf, other]

    cs.CV

    Improving Domain Generalization on Gaze Estimation via Branch-out Auxiliary Regularization

    Authors: Ruijie Zhao, Pinyan Tang, Sihui Luo

    Abstract: Despite remarkable advancements, mainstream gaze estimation techniques, particularly appearance-based methods, often suffer from performance degradation in uncontrolled environments due to variations in illumination and individual facial attributes. Existing domain adaptation strategies, limited by their need for target domain samples, may fall short in real-world applications. This letter introdu…

    Submitted 2 May, 2024; originally announced May 2024.

  43. arXiv:2405.00622  [pdf, other]

    cs.CL cs.AI cs.LG

    Causal Evaluation of Language Models

    Authors: Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu

    Abstract: Causal reasoning is viewed as crucial for achieving human-level machine intelligence. Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning. In this work, we introduce Causal evaluation of Language Models (CaLM), which, to the best of our knowledge, is the first comprehensive ben…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 315 pages, 230 figures, 21 tables. Project website: https://opencausalab.github.io/CaLM

  44. arXiv:2404.17662  [pdf, other]

    cs.CL

    PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games

    Authors: Qinglin Zhu, Runcong Zhao, Jinhua Du, Lin Gui, Yulan He

    Abstract: We propose PLAYER*, a novel framework that addresses the limitations of existing agent-based approaches built on Large Language Models (LLMs) in handling complex questions and understanding interpersonal relationships in dynamic environments. PLAYER* enhances path planning in Murder Mystery Games (MMGs) using an anytime sampling-based planner and a questioning-driven search framework. By equipping…

    Submitted 17 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  45. arXiv:2404.17378  [pdf]

    quant-ph cs.AI

    Quantum Adjoint Convolutional Layers for Effective Data Representation

    Authors: Ren-Xin Zhao, Shi Wang, Yaonan Wang

    Abstract: Quantum Convolutional Layer (QCL) is considered as one of the core of Quantum Convolutional Neural Networks (QCNNs) due to its efficient data feature extraction capability. However, the current principle of QCL is not as mathematically understandable as Classical Convolutional Layer (CCL) due to its black-box structure. Moreover, classical data mapping in many QCLs is inefficient. To this end, fir…

    Submitted 26 April, 2024; originally announced April 2024.

  46. arXiv:2404.16280  [pdf, ps, other]

    cs.NE cs.AI cs.LG

    An Efficient Reconstructed Differential Evolution Variant by Some of the Current State-of-the-art Strategies for Solving Single Objective Bound Constrained Problems

    Authors: Sichen Tao, Ruihan Zhao, Kaiyu Wang, Shangce Gao

    Abstract: Complex single-objective bounded problems are often difficult to solve. In evolutionary computation methods, since the proposal of differential evolution algorithm in 1997, it has been widely studied and developed due to its simplicity and efficiency. These developments include various adaptive strategies, operator improvements, and the introduction of other search methods. After 2014, research ba…

    Submitted 24 April, 2024; originally announced April 2024.

  47. arXiv:2404.13767  [pdf, other]

    cs.RO cs.AI cs.CV

    Autonomous Robot for Disaster Mapping and Victim Localization

    Authors: Michael Potter, Rahil Bhowal, Richard Zhao, Anuj Patel, Jingming Cheng

    Abstract: In response to the critical need for effective reconnaissance in disaster scenarios, this research article presents the design and implementation of a complete autonomous robot system using the Turtlebot3 with Robotic Operating System (ROS) Noetic. Upon deployment in closed, initially unknown environments, the system aims to generate a comprehensive map and identify any present 'victims' using Apr…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Class final project for Northeastern University EECE 5550 Mobile Robotics Course

  48. arXiv:2404.12090  [pdf, other]

    cs.AI

    X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner

    Authors: Haoyuan Jiang, Ziyue Li, Hua Wei, Xuantang Xiong, Jingqing Ruan, Jiaming Lu, Hangyu Mao, Rui Zhao

    Abstract: The effectiveness of traffic light control has been significantly improved by current reinforcement learning-based approaches via better cooperation among multiple traffic lights. However, a persisting issue remains: how to obtain a multi-agent traffic signal control algorithm with remarkable transferability across diverse cities? In this paper, we propose a Transformer on Transformer (TonT) model…

    Submitted 17 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  49. arXiv:2404.11945  [pdf, other]

    cs.RO

    Terrain-Aware Stride-Level Trajectory Forecasting for a Powered Hip Exoskeleton via Vision and Kinematics Fusion

    Authors: Ruoqi Zhao, Xingbang Yan, Yubo Fan

    Abstract: Powered hip exoskeletons have shown the ability for locomotion assistance during treadmill walking. However, providing suitable assistance in real-world walking scenarios which involve changing terrain remains challenging. Recent research suggests that forecasting the lower limb joint's angles could provide target trajectories for exoskeletons and prostheses, and the performance could be improved…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 6 pages, submitted to IEEE RA-L, under review. This work has been submitted to the IEEE Robotics and Automation Letters (RA-L) for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  50. arXiv:2404.11895  [pdf, other]

    cs.CV

    FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models

    Authors: Wei Wu, Qingnan Fan, Shuai Qin, Hong Gu, Ruoyu Zhao, Antoni B. Chan

    Abstract: Precise image editing with text-to-image models has attracted increasing interest due to their remarkable generative capabilities and user-friendly nature. However, such attempts face the pivotal challenge of misalignment between the intended precise editing target regions and the broader area impacted by the guidance in practice. Despite excellent methods leveraging attention mechanisms that have…

    Submitted 13 August, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by ECCV-2024