Showing 1–50 of 1,037 results for author: Zhao, D

  1. arXiv:2407.02345  [pdf, other]

    cs.CL

    MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space

    Authors: Yihong Tang, Bo Wang, Dongming Zhao, Xiaojia Jin, Jijun Zhang, Ruifang He, Yuexian Hou

    Abstract: Personalized Dialogue Generation (PDG) aims to create coherent responses according to roles or personas. Traditional PDG relies on external role data, which can be scarce and raise privacy concerns. Approaches address these issues by extracting role information from dialogue history, which often fail to generically model roles in continuous space. To overcome these limitations, we introduce a nove…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2406.19645  [pdf, other]

    cs.NE

    Directly Training Temporal Spiking Neural Network with Sparse Surrogate Gradient

    Authors: Yang Li, Feifei Zhao, Dongcheng Zhao, Yi Zeng

    Abstract: Brain-inspired Spiking Neural Networks (SNNs) have attracted much attention due to their event-based computing and energy-efficient features. However, the spiking all-or-none nature has prevented direct training of SNNs for various applications. The surrogate gradient (SG) algorithm has recently enabled spiking neural networks to shine in neuromorphic hardware. However, introducing surrogate gradi…

    Submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2406.16338  [pdf, other]

    cs.CV

    VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

    Authors: Yuxuan Wang, Yueqian Wang, Dongyan Zhao, Cihang Xie, Zilong Zheng

    Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have extended their capabilities to video understanding. Yet, these models are often plagued by "hallucinations", where irrelevant or nonsensical content is generated, deviating from the actual video context. This work introduces VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language model…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.15743  [pdf, other]

    cs.SE

    CasModaTest: A Cascaded and Model-agnostic Self-directed Framework for Unit Test Generation

    Authors: Chao Ni, Xiaoya Wang, Liushan Chen, Dehai Zhao, Zhengong Cai, Shaohua Wang, Xiaohu Yang

    Abstract: Though many machine learning (ML)-based unit testing generation approaches have been proposed and indeed achieved remarkable performance, they still have several limitations in effectiveness and practical usage. More precisely, existing ML-based approaches (1) generate partial content of a unit test, mainly focusing on test oracle generation; (2) mismatch the test prefix with the test oracle seman…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages, 7 figures

  5. arXiv:2406.14833  [pdf, other]

    cs.CL

    Efficient Continual Pre-training by Mitigating the Stability Gap

    Authors: Yiduo Guo, Jie Fu, Huishuai Zhang, Dongyan Zhao, Yikang Shen

    Abstract: Continual pre-training has increasingly become the predominant approach for adapting Large Language Models (LLMs) to new domains. This process involves updating the pre-trained LLM with a corpus from a new domain, resulting in a shift in the training distribution. To study the behavior of LLMs during this shift, we measured the model's performance throughout the continual pre-training process. We…

    Submitted 27 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.14045  [pdf, other]

    cs.LG cs.AI

    Understanding Different Design Choices in Training Large Time Series Models

    Authors: Yu-Neng Chuang, Songchen Li, Jiayi Yuan, Guanchu Wang, Kwei-Herng Lai, Leisheng Yu, Sirui Ding, Chia-Yuan Chang, Qiaoyu Tan, Daochen Zha, Xia Hu

    Abstract: Inspired by Large Language Models (LLMs), Time Series Forecasting (TSF), a long-standing task in time series analysis, is undergoing a transition towards Large Time Series Models (LTSMs), aiming to train universal transformer-based models for TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datase…

    Submitted 20 June, 2024; originally announced June 2024.

  7. arXiv:2406.13876  [pdf, other]

    stat.ME

    An Empirical Bayes Jackknife Regression Framework for Covariance Matrix Estimation

    Authors: Huqin Xin, Sihai Dave Zhao

    Abstract: Covariance matrix estimation, a classical statistical topic, poses significant challenges when the sample size is comparable to or smaller than the number of features. In this paper, we frame covariance matrix estimation as a compound decision problem and apply an optimal decision rule to estimate covariance parameters. To approximate this rule, we introduce an algorithm that integrates jackknife…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages, 3 figures

    MSC Class: 62C25

  8. arXiv:2406.11945  [pdf, other]

    cs.LG cs.AI cs.IR

    GAugLLM: Improving Graph Contrastive Learning for Text-Attributed Graphs with Large Language Models

    Authors: Yi Fang, Dongzhe Fan, Daochen Zha, Qiaoyu Tan

    Abstract: This work studies self-supervised graph learning for text-attributed graphs (TAGs) where nodes are represented by textual attributes. Unlike traditional graph contrastive methods that perturb the numerical feature space and alter the graph's topological structure, we aim to improve view generation through language supervision. This is driven by the prevalence of textual attributes in real applicat…

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.10950  [pdf, other]

    cs.CL

    E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models

    Authors: Zhenyu Zhang, Bingguang Hao, Jinpeng Li, Zekai Zhang, Dongyan Zhao

    Abstract: Most large language models (LLMs) are sensitive to prompts, and another synonymous expression or a typo may lead to unexpected results for the model. Composing an optimal prompt for a specific demand lacks theoretical support and relies entirely on human experimentation, which poses a considerable obstacle to popularizing generative artificial intelligence. However, there is no systematic analysis…

    Submitted 16 June, 2024; originally announced June 2024.

  10. arXiv:2406.10928  [pdf, other]

    cs.CR cs.AI cs.NI

    Make Your Home Safe: Time-aware Unsupervised User Behavior Anomaly Detection in Smart Homes via Loss-guided Mask

    Authors: Jingyu Xiao, Zhiyao Xu, Qingsong Zou, Qing Li, Dan Zhao, Dong Fang, Ruoyu Li, Wenxin Tang, Kang Li, Xudong Zuo, Penghui Hu, Yong Jiang, Zixuan Weng, Michael R. Lyv

    Abstract: Smart homes, powered by the Internet of Things, offer great convenience but also pose security concerns due to abnormal behaviors, such as improper operations of users and potential attacks from malicious attackers. Several behavior modeling methods have been proposed to identify abnormal behaviors and mitigate potential risks. However, their performance often falls short because they do not effec…

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  11. arXiv:2406.08310  [pdf, other]

    cs.LG

    GraphFM: A Comprehensive Benchmark for Graph Foundation Model

    Authors: Yuhao Xu, Xinqi Liu, Keyu Duan, Yi Fang, Yu-Neng Chuang, Daochen Zha, Qiaoyu Tan

    Abstract: Foundation Models (FMs) serve as a general class for the development of artificial intelligence systems, offering broad potential for generalization across a spectrum of downstream tasks. Despite extensive research into self-supervised learning as the cornerstone of FMs, several outstanding issues persist in Graph Foundation Models that rely on graph self-supervised learning, namely: 1) Homogeniza…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  12. arXiv:2406.07476  [pdf, other]

    cs.CV cs.CL

    VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

    Authors: Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing

    Abstract: In this paper, we present VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video and audio-oriented tasks. Building upon its predecessor, VideoLLaMA 2 incorporates a tailor-made Spatial-Temporal Convolution (STC) connector, which effectively captures the intricate spatial and temporal dynamics of video data…

    Submitted 17 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: ZC, SL, HZ, YX, and XL contributed equally to this project

  13. arXiv:2406.06813  [pdf, other]

    cs.CV

    Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation

    Authors: Dong Zhao, Shuang Wang, Qi Zang, Licheng Jiao, Nicu Sebe, Zhun Zhong

    Abstract: We study source-free unsupervised domain adaptation (SFUDA) for semantic segmentation, which aims to adapt a source-trained model to the target domain without accessing the source data. Many works have been proposed to address this challenging problem, among which uncertainty-based self-training is a predominant approach. However, without comprehensive denoising mechanisms, they still largely fall…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 2024 Conference on Computer Vision and Pattern Recognition

    Journal ref: (2024 Conference on Computer Vision and Pattern Recognition)

  14. arXiv:2406.06407  [pdf, other]

    cs.LG cs.CY

    A Taxonomy of Challenges to Curating Fair Datasets

    Authors: Dora Zhao, Morgan Klaus Scheuerman, Pooja Chitre, Jerone T. A. Andrews, Georgia Panagiotidou, Shawn Walker, Kathleen H. Pine, Alice Xiang

    Abstract: Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fa…

    Submitted 10 June, 2024; originally announced June 2024.

  15. arXiv:2406.04627  [pdf, other]

    cs.LG cs.AI

    Denoising-Aware Contrastive Learning for Noisy Time Series

    Authors: Shuang Zhou, Daochen Zha, Xiao Shen, Xiao Huang, Rui Zhang, Fu-Lai Chung

    Abstract: Time series self-supervised learning (SSL) aims to exploit unlabeled data for pre-training to mitigate the reliance on labels. Despite the great success in recent years, there is limited discussion on the potential noise in the time series, which can severely impair the performance of existing SSL methods. To mitigate the noise, the de facto strategy is to apply conventional denoising methods befo…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

  16. arXiv:2406.03559  [pdf, ps, other]

    cs.CR cs.DB

    Stateless and Non-Interactive Order-Preserving Encryption for Outsourced Databases through Subtractive Homomorphism

    Authors: Dongfang Zhao

    Abstract: Order-preserving encryption (OPE) has been extensively studied for more than two decades in the context of outsourced databases because OPE is a key enabling technique to allow the outsourced database servers to sort encrypted tuples in order to build indexes, complete range queries, and so forth. The state-of-the-art OPE schemes require (i) a stateful client -- implying that the client manages th…

    Submitted 5 June, 2024; originally announced June 2024.

  17. arXiv:2406.03075  [pdf, other]

    cs.CL

    Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

    Authors: Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, Rui Yan

    Abstract: The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem disassembly while neglecting the crucial val…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 18 pages, 3 figures

  18. arXiv:2406.02880  [pdf, other]

    cs.CV cs.AI

    Controllable Talking Face Generation by Implicit Facial Keypoints Editing

    Authors: Dong Zhao, Jiaying Shi, Wenjun Li, Shudong Wang, Shenghui Xu, Zhaoming Pan

    Abstract: Audio-driven talking face generation has garnered significant interest within the domain of digital human research. Existing methods are encumbered by intricate model architectures that are intricately dependent on each other, complicating the process of re-editing image or video inputs. In this work, we present ControlTalk, a talking face generation method to control face expression deformation b…

    Submitted 4 June, 2024; originally announced June 2024.

  19. arXiv:2406.02266  [pdf]

    cs.CL

    Enhancing Retrieval-Augmented LMs with a Two-stage Consistency Learning Compressor

    Authors: Chuankai Xu, Dongming Zhao, Bo Wang, Hanwen Xing

    Abstract: Despite the prevalence of retrieval-augmented language models (RALMs), the seamless integration of these models with retrieval mechanisms to enhance performance in document-based tasks remains challenging. While some post-retrieval processing Retrieval-Augmented Generation (RAG) methods have achieved success, most still lack the ability to distinguish pertinent from extraneous information, leading…

    Submitted 4 June, 2024; originally announced June 2024.

  20. arXiv:2406.01587  [pdf, other]

    cs.RO

    PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning

    Authors: Yupeng Zheng, Zebin Xing, Qichao Zhang, Bu Jin, Pengfei Li, Yuhang Zheng, Zhongpu Xia, Kun Zhan, Xianpeng Lang, Yaran Chen, Dongbin Zhao

    Abstract: Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfactorily in common scenarios but struggle to generalize to long-tailed situations. Meanwhile, learning-based methods have yet to achieve superior performance over rule-based approaches in large-scale closed-loop scenarios. To address these issues, we…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  21. arXiv:2406.01284  [pdf]

    physics.med-ph cs.HC

    Extraction of Weak Surface Diaphragmatic Electromyogram Using Modified Progressive FastICA Peel-Off

    Authors: Yao Li, Dongsheng Zhao, Haowen Zhao, Xu Zhang, Min Shao

    Abstract: Diaphragmatic electromyogram (EMGdi) contains crucial information about human respiration and can therefore be used to monitor respiratory condition. Although it is practical to record EMGdi noninvasively and conveniently by placing surface electrodes over chest skin, extraction of such weak surface EMGdi (sEMGdi) from a very noisy environment is a challenging task, limiting its clinical use compared w…

    Submitted 28 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  22. arXiv:2406.00420  [pdf]

    physics.optics physics.app-ph

    Realization of type-II double-zero-index photonic crystals

    Authors: Zebin Zhu, Dong Zhao, Ziyao Wang, Xucheng Yang, Liyong Jiang, Zhen Gao

    Abstract: Some photonic crystals (PCs) with Dirac-like conical dispersions exhibit the property of double zero refractive index (that is, both epsilon and mu near zero (EMNZ)), wherein the electromagnetic waves have an infinite effective wavelength and do not experience any spatial phase change. The Dirac-like cones that support EMNZ were previously thought to be present only at the center of the Brillouin zone…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 38 pages, 13 figures

  23. arXiv:2405.20267  [pdf, other]

    cs.CL

    Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions

    Authors: Ruochen Zhao, Wenxuan Zhang, Yew Ken Chia, Deli Zhao, Lidong Bing

    Abstract: As LLMs evolve on a daily basis, there is an urgent need for a trustworthy evaluation method that can provide robust evaluation results in a timely fashion. Currently, as static benchmarks are prone to contamination concerns, users tend to trust human voting platforms, such as Chatbot Arena. However, human annotations require extensive manual efforts. To provide an automatic, robust, and trustwort…

    Submitted 12 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  24. arXiv:2405.19769  [pdf, other]

    cs.CV

    All-In-One Medical Image Restoration via Task-Adaptive Routing

    Authors: Zhiwen Yang, Haowei Chen, Ziniu Qian, Yang Yi, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu

    Abstract: Although single-task medical image restoration (MedIR) has witnessed remarkable success, the limited generalizability of these methods poses a substantial obstacle to wider application. In this paper, we focus on the task of all-in-one medical image restoration, aiming to address multiple distinct MedIR tasks with a single universal model. Nonetheless, due to significant differences between differ…

    Submitted 28 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: This article has been early accepted by MICCAI 2024

  25. arXiv:2405.18959  [pdf, other]

    cs.CV cs.MM

    Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval

    Authors: Rui Yang, Shuang Wang, Yingping Han, Yuanheng Li, Dong Zhao, Dou Quan, Yanhe Guo, Licheng Jiao

    Abstract: Remote Sensing Image-Text Retrieval (RSITR) is pivotal for knowledge services and data mining in the remote sensing (RS) domain. Considering the multi-scale representations in image content and text vocabulary can enable the models to learn richer representations and enhance retrieval. Current multi-scale RSITR approaches typically align multi-scale fused image features with text features, but ove…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 16 pages, 9 figures

  26. arXiv:2405.18880  [pdf, other]

    cs.CV

    EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision

    Authors: Yiting Dong, Xiang He, Guobin Shen, Dongcheng Zhao, Yang Li, Yi Zeng

    Abstract: Event data captured by Dynamic Vision Sensors (DVS) offers a unique approach to visual processing that differs from traditional video capture, showcasing its efficiency in dynamic and real-time scenarios. Despite advantages such as high temporal resolution and low energy consumption, the application of event data faces challenges due to limited dataset size and diversity. To address this, we devel…

    Submitted 29 May, 2024; originally announced May 2024.

  27. arXiv:2405.17337  [pdf, other]

    cs.CL cs.AI

    Cost-efficient Knowledge-based Question Answering with Large Language Models

    Authors: Junnan Dong, Qinggang Zhang, Chuang Zhou, Hao Chen, Daochen Zha, Xiao Huang

    Abstract: Knowledge-based question answering (KBQA) is widely used in many scenarios that necessitate domain knowledge. Large language models (LLMs) bring opportunities to KBQA, but they cost significantly more and lack domain-specific knowledge from pre-training. We are motivated to combine LLMs and prior small models on knowledge graphs (KGMs) for both inferential accuracy and cost savin…

    Submitted 27 May, 2024; originally announced May 2024.

  28. arXiv:2405.15925  [pdf]

    eess.IV cs.CV cs.LG

    MUCM-Net: A Mamba Powered UCM-Net for Skin Lesion Segmentation

    Authors: Chunyu Yuan, Dongfang Zhao, Sos S. Agaian

    Abstract: Skin lesion segmentation is key for early skin cancer detection. Challenges in automatic segmentation from dermoscopic images include variations in color, texture, and artifacts of indistinct lesion boundaries. Deep learning methods like CNNs and U-Net have shown promise in addressing these issues. To further aid early diagnosis, especially on mobile devices with limited computing power, we presen…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 11 pages, 8 figures, journal paper (under review)

  29. arXiv:2405.14488  [pdf, other]

    cs.CL

    MoGU: A Framework for Enhancing Safety of Open-Sourced LLMs While Preserving Their Usability

    Authors: Yanrui Du, Sendong Zhao, Danyang Zhao, Ming Ma, Yuhan Chen, Liangyu Huo, Qing Yang, Dongliang Xu, Bing Qin

    Abstract: Large Language Models (LLMs) are increasingly deployed in various applications. As their usage grows, concerns regarding their safety are rising, especially in maintaining harmless responses when faced with malicious instructions. Many defense strategies have been developed to enhance the safety of LLMs. However, our research finds that existing defense strategies lead LLMs to predominantly adopt…

    Submitted 23 May, 2024; originally announced May 2024.

  30. arXiv:2405.14474  [pdf, other]

    cs.NE

    Time Cell Inspired Temporal Codebook in Spiking Neural Networks for Enhanced Image Generation

    Authors: Linghao Feng, Dongcheng Zhao, Sicheng Shen, Yiting Dong, Guobin Shen, Yi Zeng

    Abstract: This paper presents a novel approach leveraging Spiking Neural Networks (SNNs) to construct a Variational Quantized Autoencoder (VQ-VAE) with a temporal codebook inspired by hippocampal time cells. This design captures and utilizes temporal dependencies, significantly enhancing the generative capabilities of SNNs. Neuroscientific research has identified hippocampal "time cells" that fire sequentia…

    Submitted 23 May, 2024; originally announced May 2024.

  31. arXiv:2405.14401  [pdf, ps, other]

    math.FA

    Roots and Logarithms of Multipliers

    Authors: Jingbo Xia, Congquan Yan, Danjun Zhao, Jingming Zhu

    Abstract: By now it is a well-known fact that if $f$ is a multiplier for the Drury-Arveson space $H^2_n$, and if there is a $c>0$ such that $|f(z)|\geq c$ for every $z\in B$, then the reciprocal function $1/f$ is also a multiplier for $H^2_n$. We show that for such an $f$ and for every $t\in \mathbb{R}$, $f^t$ is also a multiplier for $H^2_n$. We do so by deriving a differentiation formula for $R^m(f^th)$. Mor…

    Submitted 23 May, 2024; originally announced May 2024.

  32. arXiv:2405.14293  [pdf, other]

    cs.GT

    Sybil-Proof Mechanism for Information Propagation with Budgets

    Authors: Junjie Zheng, Xu Ge, Bin Li, Dengji Zhao

    Abstract: This paper examines the problem of distributing rewards on social networks to improve the efficiency of crowdsourcing tasks for sponsors. To complete the tasks efficiently, we aim to design reward mechanisms that incentivize early-joining agents to invite more participants to the tasks. Nonetheless, participants could potentially engage in strategic behaviors, e.g., not inviting others to the task…

    Submitted 23 May, 2024; originally announced May 2024.

  33. arXiv:2405.13810  [pdf, other]

    cs.LG cs.AI

    Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

    Authors: Xin Cheng, Xiuying Chen, Shuqi Li, Di Luo, Xun Wang, Dongyan Zhao, Rui Yan

    Abstract: Time series prediction is crucial for understanding and forecasting complex dynamics in various domains, ranging from finance and economics to climate and healthcare. Based on Transformer architecture, one approach involves encoding multiple variables from the same timestamp into a single temporal token to model global dependencies. In contrast, another approach embeds the time points of individua…

    Submitted 22 May, 2024; originally announced May 2024.

  34. arXiv:2405.13792  [pdf, other]

    cs.CL cs.AI cs.IR

    xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

    Authors: Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao

    Abstract: This paper introduces xRAG, an innovative context compression method tailored for retrieval-augmented generation. xRAG reinterprets document embeddings in dense retrieval--traditionally used solely for retrieval--as features from the retrieval modality. By employing a modality fusion methodology, xRAG seamlessly integrates these embeddings into the language model representation space, effectively…

    Submitted 22 May, 2024; originally announced May 2024.

  35. arXiv:2405.11740  [pdf, other]

    cs.LG cs.AI

    Learning Future Representation with Synthetic Observations for Sample-efficient Reinforcement Learning

    Authors: Xin Liu, Yaran Chen, Dongbin Zhao

    Abstract: In visual Reinforcement Learning (RL), upstream representation learning largely determines the effect of downstream policy learning. Employing auxiliary tasks allows the agent to enhance visual representation in a targeted manner, thereby improving the sample efficiency and performance of downstream RL. Prior advanced auxiliary tasks all focus on how to extract as much information as possible from…

    Submitted 19 May, 2024; originally announced May 2024.

  36. arXiv:2405.11718  [pdf, other]

    cs.LG

    Feasibility Consistent Representation Learning for Safe Reinforcement Learning

    Authors: Zhepeng Cen, Yihang Yao, Zuxin Liu, Ding Zhao

    Abstract: In the field of safe reinforcement learning (RL), finding a balance between satisfying safety constraints and optimizing reward performance presents a significant challenge. A key obstacle in this endeavor is the estimation of safety constraints, which is typically more difficult than estimating a reward metric due to the sparse nature of the constraint signals. To address this issue, we introduce…

    Submitted 13 June, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  37. arXiv:2405.11265  [pdf, other]

    cs.CL cs.AI

    EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models

    Authors: Yu Huang, Liang Guo, Wanqian Guo, Zhe Tao, Yang Lv, Zhihao Sun, Dongfang Zhao

    Abstract: In the field of environmental science, it is crucial to have robust evaluation metrics for large language models to ensure their efficacy and accuracy. We propose EnviroExam, a comprehensive evaluation method designed to assess the knowledge of large language models in the field of environmental science. EnviroExam is based on the curricula of top international universities, covering undergraduate…

    Submitted 18 May, 2024; originally announced May 2024.

  38. arXiv:2405.10576  [pdf, other]

    cs.RO

    An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems

    Authors: Jiyue Tao, Yunsong Zhang, Sunil Kumar Rajendran, Feitian Zhang, Dexin Zhao, Tongsheng Shen

    Abstract: Robotic systems driven by artificial muscles present unique challenges due to the nonlinear dynamics of actuators and the complex designs of mechanical structures. Traditional model-based controllers often struggle to achieve desired control performance in such systems. Deep reinforcement learning (DRL), a trending machine learning technique widely adopted in robot control, offers a promising alte…

    Submitted 7 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  39. arXiv:2405.10467  [pdf, other]

    cs.AI cs.SE

    Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents

    Authors: Yue Liu, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, Jon Whittle

    Abstract: Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to take a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking…

    Submitted 24 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  40. arXiv:2405.06624  [pdf, other]

    cs.AI

    Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

    Authors: David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum

    Abstract: Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these appro…

    Submitted 17 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

  41. arXiv:2405.04390  [pdf, other]

    cs.CV

    DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving

    Authors: Chen Min, Dawei Zhao, Liang Xiao, Jian Zhao, Xinli Xu, Zheng Zhu, Lei Jin, Jianshu Li, Yulan Guo, Junliang Xing, Liping Jing, Yiming Nie, Bin Dai

    Abstract: Vision-centric autonomous driving has recently raised wide attention due to its lower cost. Pre-training is essential for extracting a universal representation. However, current vision-centric pre-training typically relies on either 2D or 3D pre-text tasks, overlooking the temporal characteristics of autonomous driving as a 4D scene understanding task. In this paper, we address this challenge by i…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  42. arXiv:2405.03534  [pdf, other]

    cs.RO cs.AI cs.LG cs.NE

    Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer

    Authors: Xingyu Liu, Deepak Pathak, Ding Zhao

    Abstract: We investigate the problem of transferring an expert policy from a source robot to multiple different robots. To solve this problem, we propose a method named $Meta$-$Evolve$ that uses continuous robot evolution to efficiently transfer the policy to each target robot through a set of tree-structured evolutionary robot sequences. The robot evolution tree allows the robot evolution paths to be share…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: ICLR 2024

  43. arXiv:2405.03520  [pdf, other]

    cs.CV

    Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

    Authors: Zheng Zhu, Xiaofeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang

    Abstract: General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems. Recently, the emergence of the Sora model has attracted significant attention due to its remarkable simulation capabilities, which exhibit an incipient comprehension of physical law…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: This survey will be regularly updated at: https://github.com/GigaAI-research/General-World-Models-Survey

  44. arXiv:2405.03401  [pdf, other]

    cs.LG cs.AI

    E2GNN: Efficient Graph Neural Network Ensembles for Semi-Supervised Classification

    Authors: Xin Zhang, Daochen Zha, Qiaoyu Tan

    Abstract: This work studies ensemble learning for graph neural networks (GNNs) under the popular semi-supervised setting. Ensemble learning has shown superiority in improving the accuracy and robustness of traditional machine learning by combining the outputs of multiple weak learners. However, adopting a similar idea to integrate different GNN models is challenging for two reasons. First, GNN is not…

    Submitted 6 May, 2024; originally announced May 2024.

  45. arXiv:2404.19438  [pdf, other]

    cs.NE

    Neuro-Vision to Language: Enhancing Visual Reconstruction and Language Interaction through Brain Recordings

    Authors: Guobin Shen, Dongcheng Zhao, Xiang He, Linghao Feng, Yiting Dong, Jihang Wang, Qian Zhang, Yi Zeng

    Abstract: Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition but faces challenges due to individual differences and complex neural signal representations. Traditional methods often require customized models and extensive trials, lacking interpretability in visual reconstruction tasks. Our framework integrates 3D brain structures with visual semantics using a…

    Submitted 22 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  46. arXiv:2404.17823  [pdf, other]

    eess.SP

    Performance Analysis for Downlink Transmission in Multi-Connectivity Cellular V2X Networks

    Authors: Luofang Jiao, Jiwei Zhao, Yunting Xu, Tianqi Zhang, Haibo Zhou, Dongmei Zhao

    Abstract: With the ever-increasing number of connected vehicles in the fifth-generation mobile communication networks (5G) and beyond 5G (B5G), ensuring the reliability and high-speed demand of cellular vehicle-to-everything (C-V2X) communication in scenarios where vehicles are moving at high speeds poses a significant challenge. Recently, multi-connectivity technology has become a promising network access p…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 13 pages, 14 figures. IEEE Internet of Things Journal, 2023

  47. arXiv:2404.17684  [pdf, other]

    cs.RO cs.LG

    Generalize by Touching: Tactile Ensemble Skill Transfer for Robotic Furniture Assembly

    Authors: Haohong Lin, Radu Corcodel, Ding Zhao

    Abstract: Furniture assembly remains an unsolved problem in robotic manipulation due to its long task horizon and nongeneralizable operations plan. This paper presents the Tactile Ensemble Skill Transfer (TEST) framework, a pioneering offline reinforcement learning (RL) approach that incorporates tactile feedback in the control loop. TEST's core design is to learn a skill transition model for high-level pla…

    Submitted 26 April, 2024; originally announced April 2024.

  48. arXiv:2404.16425  [pdf, other]

    astro-ph.HE

    Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

    Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X.-F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J.-W. Hu, A. Li, C.-K. Li, J.-D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja, et al. (170 additional authors not shown)

    Abstract: Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5–4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a,…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 41 pages, 8 figures, 7 tables

  49. arXiv:2404.16299  [pdf, ps, other]

    gr-qc

    Conformal transformation of f(Q) gravity and its cosmological perturbations

    Authors: Dehao Zhao

    Abstract: Symmetric teleparallel gravity (STG) is a gravity theory that uses the non-metricity tensor to describe gravitational effects. In the STG framework, we study the conformally equivalent scalar-tensor theory of the f(Q) model and calculate the cosmological linear perturbations of the conformally transformed action. We confirm the result already present in references that f(Q) gravity shows different degrees of free…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 14 pages, no figure

  50. arXiv:2404.14934  [pdf, other]

    cs.MM cs.CV cs.HC

    G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition

    Authors: Kaikai Deng, Dong Zhao, Wenxin Zheng, Yue Ling, Kangwen Yin, Huadong Ma

    Abstract: Millimeter wave radar has recently gained traction as a promising modality for enabling pervasive and privacy-preserving gesture recognition. However, the lack of rich and fine-grained radar datasets hinders progress in developing generalized deep learning models for gesture recognition across various user postures (e.g., standing, sitting), positions, and scenes. To remedy this, we resort to desi…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 18 pages, 29 figures