Showing 1–50 of 1,193 results for author: He, X

Searching in archive cs.
  1. arXiv:2407.02061  [pdf, other]

    cs.RO

    LiDAR-based HD Map Localization using Semantic Generalized ICP with Road Marking Detection

    Authors: Yansong Gong, Xinglian Zhang, Jingyi Feng, Xiao He, Dan Zhang

    Abstract: In GPS-denied scenarios, a robust environmental perception and localization system becomes crucial for autonomous driving. In this paper, a LiDAR-based online localization system is developed, incorporating road marking detection and registration on a high-definition (HD) map. Within our system, a road marking detection approach is proposed with real-time performance, in which an adaptive segmenta…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.02037  [pdf, other]

    cs.NI

    Saving Private WAN: Using Internet Paths to Offload WAN Traffic in Conferencing Services

    Authors: Bhaskar Kataria, Palak LNU, Rahul Bothra, Rohan Gandhi, Debopam Bhattacherjee, Venkata N. Padmanabhan, Irena Atov, Sriraam Ramakrishnan, Somesh Chaturmohta, Chakri Kotipalli, Rui Liang, Ken Sueda, Xin He, Kevin Hinton

    Abstract: Large-scale video conferencing services incur significant network cost while serving surging global demands. Our work systematically explores the opportunity to offload a fraction of this traffic to the Internet, a cheaper routing option offered already by cloud providers, from WAN without drop in application performance. First, with a large-scale latency measurement study with 3.5 million data po…

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2407.00337  [pdf, other]

    cs.CE cs.LG math.NA

    WgLaSDI: Weak-Form Greedy Latent Space Dynamics Identification

    Authors: Xiaolong He, April Tran, David M. Bortz, Youngsoo Choi

    Abstract: The parametric greedy latent space dynamics identification (gLaSDI) framework has demonstrated promising potential for accurate and efficient modeling of high-dimensional nonlinear physical systems. However, it remains challenging to handle noisy data. To enhance robustness against noise, we incorporate the weak-form estimation of nonlinear dynamics (WENDy) into gLaSDI. In the proposed weak-form g…

    Submitted 29 June, 2024; originally announced July 2024.

  4. arXiv:2406.19055  [pdf, other]

    cs.CV

    SimpleFusion: A Simple Fusion Framework for Infrared and Visible Images

    Authors: Ming Chen, Yuxuan Cheng, Xinwei He, Xinyue Wang, Yan Aze, Jinhai Xiang

    Abstract: Integrating visible and infrared images into one high-quality image, also known as visible and infrared image fusion, is a challenging yet critical task for many downstream vision tasks. Most existing works utilize pretrained deep neural networks or design sophisticated frameworks with strong priors for this task, which may be unsuitable or lack flexibility. This paper presents SimpleFusion, a sim…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/hxwxss/SimpleFusion-A-Simple-Fusion-Framework-for-Infrared-and-Visible-Images

  5. arXiv:2406.17005  [pdf, other]

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

    Abstract: The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, the Complex Video Object Segmentation track based on the MOSE dataset and the Motion Expression guided Video Segmentation track based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  6. arXiv:2406.16374  [pdf, other]

    cs.CL

    KEHRL: Learning Knowledge-Enhanced Language Representations with Hierarchical Reinforcement Learning

    Authors: Dongyang Li, Taolin Zhang, Longtao Huang, Chengyu Wang, Xiaofeng He, Hui Xue

    Abstract: Knowledge-enhanced pre-trained language models (KEPLMs) leverage relation triples from knowledge graphs (KGs) and integrate these external data sources into language models via self-supervised learning. Previous works treat knowledge enhancement as two independent operations, i.e., knowledge injection and knowledge integration. In this paper, we propose to learn Knowledge-Enhanced language represe…

    Submitted 24 June, 2024; originally announced June 2024.

  7. arXiv:2406.16372  [pdf, other]

    cs.CL

    UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding

    Authors: Dongyang Li, Taolin Zhang, Jiali Deng, Longtao Huang, Chengyu Wang, Xiaofeng He, Hui Xue

    Abstract: Cross-lingual representation learning transfers knowledge from resource-rich data to resource-scarce ones to improve the semantic understanding abilities of different languages. However, previous works rely on shallow unsupervised data generated by token surface matching, regardless of the global context-aware semantics of the surrounding text tokens. In this paper, we propose an Unsupervised Pseu…

    Submitted 24 June, 2024; originally announced June 2024.

  8. arXiv:2406.16367  [pdf, other]

    cs.IR

    On the Role of Long-tail Knowledge in Retrieval Augmented Large Language Models

    Authors: Dongyang Li, Junbing Yan, Taolin Zhang, Chengyu Wang, Xiaofeng He, Longtao Huang, Hui Xue, Jun Huang

    Abstract: Retrieval augmented generation (RAG) exhibits outstanding performance in promoting the knowledge capabilities of large language models (LLMs) with retrieved documents related to user queries. However, RAG only focuses on improving the response quality of LLMs via enhancing queries indiscriminately with retrieved information, paying little attention to what type of knowledge LLMs really need to ans…

    Submitted 24 June, 2024; originally announced June 2024.

  9. arXiv:2406.15782  [pdf, other]

    cs.SC cs.LO

    A Local Search Algorithm for MaxSMT(LIA)

    Authors: Xiang He, Bohan Li, Mengyu Zhao, Shaowei Cai

    Abstract: MaxSAT modulo theories (MaxSMT) is an important generalization of Satisfiability modulo theories (SMT) with various applications. In this paper, we focus on MaxSMT with the background theory of Linear Integer Arithmetic, denoted as MaxSMT(LIA). We design the first local search algorithm for MaxSMT(LIA) called PairLS, based on the following novel ideas. A novel operator called pairwise operator is…

    Submitted 22 June, 2024; originally announced June 2024.

  10. arXiv:2406.15655  [pdf, other]

    cs.DB

    ProBE: Proportioning Privacy Budget for Complex Exploratory Decision Support

    Authors: Nada Lahjouji, Sameera Ghayyur, Xi He, Sharad Mehrotra

    Abstract: This paper studies privacy in the context of complex decision support queries composed of multiple conditions on different aggregate statistics combined using disjunction and conjunction operators. Utility requirements for such queries necessitate the need for private mechanisms that guarantee a bound on the false negative and false positive errors. This paper formally defines complex decision sup…

    Submitted 21 June, 2024; originally announced June 2024.

  11. arXiv:2406.15252  [pdf, other]

    cs.CV cs.AI

    VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

    Authors: Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen

    Abstract: Recent years have witnessed great advances in video generation. However, the development of automatic video metrics is lagging significantly behind. None of the existing metrics is able to provide reliable scores over generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-prov…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  12. arXiv:2406.15189  [pdf, other]

    cs.LG

    Causal Learning in Biomedical Applications

    Authors: Petr Ryšavý, Xiaoyu He, Jakub Mareček

    Abstract: We present a benchmark for methods in causal learning. Specifically, we consider training a rich class of causal models from time-series data, and we suggest the use of the Krebs cycle and models of metabolism more broadly.

    Submitted 21 June, 2024; originally announced June 2024.

  13. arXiv:2406.14482  [pdf, other]

    cs.CV

    Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

    Authors: Xinyi Ying, Chao Xiao, Ruojing Li, Xu He, Boyang Li, Zhaoxu Li, Yingqian Wang, Mingyuan Hu, Qingyu Xu, Zaiping Lin, Miao Li, Shilin Zhou, Wei An, Weidong Sheng, Li Liu

    Abstract: Small object detection (SOD) has been a longstanding yet challenging task for decades, with numerous datasets and algorithms being developed. However, they mainly focus on either visible or thermal modality, while visible-thermal (RGBT) bimodality is rarely explored. Although some RGBT datasets have been developed recently, the insufficient quantity, limited category, misaligned images and large t…

    Submitted 20 June, 2024; originally announced June 2024.

  14. arXiv:2406.13939  [pdf, other]

    cs.CV

    2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

    Authors: Bin Cao, Yisi Zhang, Xuanxu Lin, Xingjian He, Bo Zhao, Jing Liu

    Abstract: Motion Expression guided Video Segmentation is a challenging task that aims at segmenting objects in the video based on natural language expressions with motion descriptions. Unlike the previous referring video object segmentation (RVOS), this task focuses more on the motion in video content for language-guided video object segmentation, requiring an enhanced ability to model longer temporal, moti…

    Submitted 19 June, 2024; originally announced June 2024.

  15. arXiv:2406.09739  [pdf, other]

    cs.CV

    Decoupling Forgery Semantics for Generalizable Deepfake Detection

    Authors: Wei Ye, Xinan He, Feng Ding

    Abstract: In this paper, we propose a novel method for detecting DeepFakes, enhancing the generalization of detection through semantic decoupling. There are now multiple DeepFake forgery technologies that not only possess unique forgery semantics but may also share common forgery semantics. The unique forgery semantics and irrelevant content semantics may promote over-fitting and hamper generalization for D…

    Submitted 14 June, 2024; originally announced June 2024.

  16. arXiv:2406.09321  [pdf, other]

    cs.CR cs.AI cs.CL

    JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models

    Authors: Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang

    Abstract: Jailbreak attacks aim to induce Large Language Models (LLMs) to generate harmful responses for forbidden instructions, presenting severe misuse threats to LLMs. Up to now, research into jailbreak attacks and defenses is emerging; however, there is (surprisingly) no consensus on how to evaluate whether a jailbreak attempt is successful. In other words, the methods to assess the harmfulness of an LL…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Our code is available at https://github.com/ThuCCSLab/JailbreakEval

  17. arXiv:2406.08858  [pdf, other]

    cs.RO cs.CV cs.LG eess.SY

    OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

    Authors: Tairan He, Zhengyi Luo, Xialin He, Wenli Xiao, Chong Zhang, Weinan Zhang, Kris Kitani, Changliu Liu, Guanya Shi

    Abstract: We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including using real-time teleoperation through VR headset, verbal instruction, and RGB camera. OmniH2O also enables full autono…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://omni.human2humanoid.com/

  18. arXiv:2406.08407  [pdf, other]

    cs.CV cs.AI cs.CL

    MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    Authors: Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  19. arXiv:2406.08287  [pdf, other]

    cs.LG

    Pre-Training Identification of Graph Winning Tickets in Adaptive Spatial-Temporal Graph Neural Networks

    Authors: Wenying Duan, Tianxiang Fang, Hong Rao, Xiaoxi He

    Abstract: In this paper, we present a novel method to significantly enhance the computational efficiency of Adaptive Spatial-Temporal Graph Neural Networks (ASTGNNs) by introducing the concept of the Graph Winning Ticket (GWT), derived from the Lottery Ticket Hypothesis (LTH). By adopting a pre-determined star topology as a GWT prior to training, we balance edge reduction with efficient information propagat…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Conference paper, accepted by KDD '24

  20. arXiv:2406.06582  [pdf, ps, other]

    cs.CL cs.LG eess.AS

    Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing

    Authors: Viet Anh Trinh, Rosy Southwell, Yiwen Guan, Xinlu He, Zhiyong Wang, Jacob Whitehill

    Abstract: Recent work on discrete speech tokenization has paved the way for models that can seamlessly perform multiple tasks across modalities, e.g., speech recognition, text to speech, speech to speech translation. Moreover, large language models (LLMs) pretrained from vast text corpora contain rich linguistic information that can improve accuracy in a variety of tasks. In this paper, we present a decoder…

    Submitted 25 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  21. arXiv:2406.05371  [pdf, other]

    cs.NE

    Spiking Neural Networks with Consistent Mapping Relations Allow High-Accuracy Inference

    Authors: Yang Li, Xiang He, Qingqun Kong, Yi Zeng

    Abstract: Spike-based neuromorphic hardware has demonstrated substantial potential in low energy consumption and efficient inference. However, the direct training of deep spiking neural networks is challenging, and conversion-based methods still require substantial time delay owing to unresolved conversion errors. We determine that the primary source of the conversion errors stems from the inconsistency bet…

    Submitted 8 June, 2024; originally announced June 2024.

  22. arXiv:2406.05366  [pdf, other]

    cs.LG math.OC

    Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

    Authors: Wenhao Xu, Xuefeng Gao, Xuedong He

    Abstract: Risk-sensitive linear quadratic regulator is one of the most fundamental problems in risk-sensitive optimal control. In this paper, we study online adaptive control of risk-sensitive linear quadratic regulator in the finite horizon episodic setting. We propose a simple least-squares greedy algorithm and show that it achieves $\widetilde{\mathcal{O}}(\log N)$ regret under a specific identifiability…

    Submitted 8 June, 2024; originally announced June 2024.

  23. arXiv:2406.05328  [pdf, other]

    cs.CL cs.LG

    Hidden Question Representations Tell Non-Factuality Within and Across Large Language Models

    Authors: Yanling Wang, Haoyang Li, Hao Zou, Jing Zhang, Xinlei He, Qi Li, Ke Xu

    Abstract: Despite the remarkable advance of large language models (LLMs), the prevalence of non-factual responses remains a common issue. This work studies non-factuality prediction (NFP), which predicts whether an LLM will generate non-factual responses to a question before the generation process. Previous efforts on NFP usually rely on extensive computation. In this work, we conduct extensive analysis to…

    Submitted 7 June, 2024; originally announced June 2024.

  24. arXiv:2406.04791  [pdf, other]

    cs.SD eess.AS

    Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

    Authors: Shaojun Li, Daimeng Wei, Hengchao Shang, Jiaxin Guo, ZongYao Li, Zhanglin Wu, Zhiqiang Rao, Yuanchang Luo, Xianghui He, Hao Yang

    Abstract: Despite recent improvements in End-to-End Automatic Speech Recognition (E2E ASR) systems, the performance can degrade due to vocal characteristic mismatches between training and testing data, particularly with limited target speaker adaptation data. We propose a novel speaker adaptation approach Speaker-Smoothed kNN that leverages k-Nearest Neighbors (kNN) retrieval techniques to improve model out…

    Submitted 1 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  25. arXiv:2406.04127  [pdf, other]

    cs.CL cs.AI

    Are We Done with MMLU?

    Authors: Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, Claire Barale, Robert McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini

    Abstract: Maybe not. We identify and analyse errors in the popular Massive Multitask Language Understanding (MMLU) benchmark. Even though MMLU is widely adopted, our analysis demonstrates numerous ground truth errors that obscure the true capabilities of LLMs. For example, we find that 57% of the analysed questions in the Virology subset contain errors. To address this issue, we introduce a comprehensive fr…

    Submitted 7 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  26. arXiv:2406.03368  [pdf, other]

    cs.CL cs.AI

    IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

    Authors: David Ifeoluwa Adelani, Jessica Ojo, Israel Abebe Azime, Jian Yun Zhuang, Jesujoba O. Alabi, Xuanli He, Millicent Ochieng, Sara Hooker, Andiswa Bukula, En-Shiun Annie Lee, Chiamaka Chukwuneke, Happy Buzaaba, Blessing Sibanda, Godson Kalipe, Jonathan Mukiibi, Salomon Kabongo, Foutse Yuehgoh, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Rooweither Mabuya, Shamsuddeen Hassan Muhammad, Salomey Osei, Sokhar Samb, Tadesse Kebede Guge, et al. (1 additional author not shown)

    Abstract: Despite the widespread adoption of Large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g. African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoB…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Under review

  27. arXiv:2406.03215  [pdf, other]

    cs.CV

    Searching Priors Makes Text-to-Video Synthesis Better

    Authors: Haoran Cheng, Liang Peng, Linxuan Xia, Yuepeng Hu, Hengjia Li, Qinglin Lu, Xiaofei He, Boxi Wu

    Abstract: Significant advancements in video diffusion models have brought substantial progress to the field of text-to-video (T2V) synthesis. However, existing T2V synthesis models struggle to accurately generate complex motion dynamics, leading to a reduction in video realism. One possible solution is to collect massive data and train the model on it, but this would be extremely expensive. To alleviate this…

    Submitted 5 June, 2024; originally announced June 2024.

  28. arXiv:2406.03210  [pdf, other]

    cs.IR

    Text-like Encoding of Collaborative Information in Large Language Models for Recommendation

    Authors: Yang Zhang, Keqin Bao, Ming Yan, Wenjie Wang, Fuli Feng, Xiangnan He

    Abstract: When adapting Large Language Models for Recommendation (LLMRec), it is crucial to integrate collaborative information. Existing methods achieve this by learning collaborative embeddings in LLMs' latent space from scratch or by mapping from external models. However, they fail to represent the information in a text-like format, which may not align optimally with LLMs. To bridge this gap, we introduc…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024

    ACM Class: H.3.3

  29. arXiv:2406.03151  [pdf, other]

    cs.CL cs.LG

    Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation

    Authors: Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Tharindu Madusanka, Iqra Zahid, Jiayan Zeng, Xiaochi Wang, Xinran He, Yizhi Li, Goran Nenadic

    Abstract: With the recent advances of large language models (LLMs), it is no longer infeasible to build an automated debate system that helps people to synthesise persuasive arguments. Previous work attempted this task by integrating multiple components. In our work, we introduce an argument mining dataset that captures the end-to-end process of preparing an argumentative essay for a debate, which covers th…

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Published in ACL 2024 Findings

  30. arXiv:2406.02638  [pdf, ps, other]

    cs.LG cs.AI

    EchoMamba4Rec: Harmonizing Bidirectional State Space Models with Spectral Filtering for Advanced Sequential Recommendation

    Authors: Yuda Wang, Xuxin He, Shengxin Zhu

    Abstract: Predicting user preferences and sequential dependencies based on historical behavior is the core goal of sequential recommendation. Although attention-based models have shown effectiveness in this field, they often struggle with inference inefficiency due to the quadratic computational complexity inherent in attention mechanisms, especially with long-range behavior sequences. Drawing inspiration f…

    Submitted 10 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.03900 by other authors

  31. arXiv:2406.02348  [pdf]

    cs.LG

    AMOSL: Adaptive Modality-wise Structure Learning in Multi-view Graph Neural Networks For Enhanced Unified Representation

    Authors: Peiyu Liang, Hongchang Gao, Xubin He

    Abstract: While Multi-view Graph Neural Networks (MVGNNs) excel at leveraging diverse modalities for learning object representation, existing methods assume identical local topology structures across modalities that overlook real-world discrepancies. This leads MVGNNs to struggle in modality fusion and representation denoising. To address these issues, we propose adaptive modality-wise structure learning (AM…

    Submitted 4 June, 2024; originally announced June 2024.

    Journal ref: 13th International Conference on Soft Computing, Artificial Intelligence and Applications (SAI 2024)

  32. arXiv:2406.01574  [pdf, other]

    cs.CL

    MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

    Authors: Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, Wenhu Chen

    Abstract: In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains. However, as models continue to improve, their performance on these benchmarks has begun to plateau, making it increasingly difficult to discern differences in…

    Submitted 23 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  33. arXiv:2406.00335  [pdf, other]

    cs.LG

    Benchmarking for Deep Uplift Modeling in Online Marketing

    Authors: Dugang Liu, Xing Tang, Yang Qiao, Miao Liu, Zexu Sun, Xiuqiang He, Zhong Ming

    Abstract: Online marketing is critical for many industrial platforms and business applications, aiming to increase user engagement and platform revenue by identifying corresponding delivery-sensitive groups for specific incentives, such as coupons and bonuses. As the scale and complexity of features in industrial scenarios increase, deep uplift modeling (DUM) as a promising technique has attracted increased…

    Submitted 1 June, 2024; originally announced June 2024.

  34. arXiv:2406.00023  [pdf, other]

    cs.CL

    LocMoE+: Enhanced Router with Token Feature Awareness for Efficient LLM Pre-Training

    Authors: Jing Li, Zhijie Sun, Dachao Lin, Xuan He, Yi Lin, Binfan Zheng, Li Zeng, Rongqian Zhao, Xin Chen

    Abstract: Mixture-of-Experts (MoE) architectures have recently gained increasing popularity within the domain of large language models (LLMs) due to their ability to significantly reduce training and inference overhead. However, MoE architectures face challenges, such as significant disparities in the number of tokens assigned to each expert and a tendency toward homogenization among experts, which adversel…

    Submitted 23 May, 2024; originally announced June 2024.

  35. arXiv:2405.20588  [pdf, other]

    cs.CL

    DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models

    Authors: Taolin Zhang, Qizhou Chen, Dongyang Li, Chengyu Wang, Xiaofeng He, Longtao Huang, Hui Xue, Jun Huang

    Abstract: Recently, while large language models (LLMs) have demonstrated impressive results, they still suffer from hallucination, i.e., the generation of false information. Model editing is the task of fixing factual mistakes in LLMs; yet, most previous works treat it as a one-time task, paying little attention to ever-emerging mistakes generated by LLMs. We address the task of sequential model editing (SM…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: ACL2024 findings

  36. arXiv:2405.20421  [pdf, other]

    cs.AI

    Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA

    Authors: Qianqi Yan, Xuehai He, Xiang Yue, Xin Eric Wang

    Abstract: Large Multimodal Models (LMMs) have shown remarkable progress in medical Visual Question Answering (Med-VQA), achieving high accuracy on existing benchmarks. However, their reliability under robust evaluation is questionable. This study reveals that when subjected to simple probing evaluation, state-of-the-art models perform worse than random guessing on medical diagnosis questions. To address thi…

    Submitted 21 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  37. arXiv:2405.18880  [pdf, other]

    cs.CV

    EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision

    Authors: Yiting Dong, Xiang He, Guobin Shen, Dongcheng Zhao, Yang Li, Yi Zeng

    Abstract: Event data captured by Dynamic Vision Sensors (DVS) offers a unique approach to visual processing that differs from traditional video capture, showcasing its efficiency in dynamic and real-time scenarios. Despite advantages such as high temporal resolution and low energy consumption, the application of event data faces challenges due to limited dataset size and diversity. To address this, we devel…

    Submitted 29 May, 2024; originally announced May 2024.

  38. arXiv:2405.18860  [pdf, other]

    cs.RO

    Empowering Embodied Manipulation: A Bimanual-Mobile Robot Manipulation Dataset for Household Tasks

    Authors: Tianle Zhang, Dongjiang Li, Yihang Li, Zecui Zeng, Lin Zhao, Lei Sun, Yue Chen, Xuelong Wei, Yibing Zhan, Lusong Li, Xiaodong He

    Abstract: The advancements in embodied AI are increasingly enabling robots to tackle complex real-world tasks, such as household manipulation. However, the deployment of robots in these environments remains constrained by the lack of comprehensive bimanual-mobile robot manipulation data that can be learned. Existing datasets predominantly focus on single-arm manipulation tasks, while the few dual-arm datase…

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  39. arXiv:2405.18729  [pdf, other]

    cs.LG cs.AI

    Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

    Authors: Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He

    Abstract: Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for offline RL issues. However, previous offline RL algorithms based on diffusion policies generally adopt weighted regression to improve the policy. This approach opt…

    Submitted 28 May, 2024; originally announced May 2024.

  40. arXiv:2405.18004  [pdf, other]

    cs.CV

    SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions

    Authors: Juexiao Zhou, Liyuan Sun, Yan Xu, Wenbin Liu, Shawn Afvari, Zhongyi Han, Jiaoyan Song, Yongzhi Ji, Xiaonan He, Xin Gao

    Abstract: With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision-based large language models (VLLMs), in skin disease diagnosis, the need for interpretability becomes crucial. However, existing dermatology datasets are limited in their inclusion of concept-level meta-labels, and none offer rich medical descriptions in natural language. This deficiency imp…

    Submitted 28 May, 2024; originally announced May 2024.

  41. arXiv:2405.17724  [pdf, ps, other]

    cs.AI

    ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models

    Authors: Wei Pang, Masoumeh Shafieinejad, Lucy Liu, Xi He

    Abstract: Recent research in tabular data synthesis has focused on single tables, whereas real-world applications often involve complex data with tens or hundreds of interconnected tables. Previous approaches to synthesizing multi-relational (multi-table) data fall short in two key aspects: scalability for larger datasets and capturing long-range dependencies, such as correlations between attributes spread…

    Submitted 27 May, 2024; originally announced May 2024.

  42. arXiv:2405.16772  [pdf, other]

    cs.SI cs.LG

    Balancing User Preferences by Social Networks: A Condition-Guided Social Recommendation Model for Mitigating Popularity Bias

    Authors: Xin He, Wenqi Fan, Ruobing Wang, Yili Wang, Ying Wang, Shirui Pan, Xin Wang

    Abstract: Social recommendation models weave social interactions into their design to provide uniquely personalized recommendation results for users. However, social networks not only amplify the popularity bias in recommendation models, resulting in more frequent recommendation of hot items and fewer long-tail items, but also include a substantial amount of redundant information that is essentially meaning…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 11 pages, 7 figures

  43. arXiv:2405.16265  [pdf, other]

    cs.LG

    MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time

    Authors: Jikun Kang, Xin Zhe Li, Xi Chen, Amirreza Kazemi, Qianyi Sun, Boxing Chen, Dong Li, Xu He, Quan He, Feng Wen, Jianye Hao, Jun Yao

    Abstract: Although Large Language Models (LLMs) achieve remarkable performance across various tasks, they often struggle with complex reasoning tasks, such as answering mathematical questions. Recent efforts to address this issue have primarily focused on leveraging mathematical datasets through supervised fine-tuning or self-improvement techniques. However, these methods often depend on high-quality datase…

    Submitted 26 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  44. arXiv:2405.16247  [pdf, other]

    cs.AI cs.CL

    AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning

    Authors: Minghao Chen, Yihang Li, Yanting Yang, Shiyu Yu, Binbin Lin, Xiaofei He

    Abstract: Large Language Models (LLM) based agents have shown promise in autonomously completing tasks across various domains, e.g., robotics, games, and web navigation. However, these agents typically require elaborate design and expert prompts to solve tasks in specific domains, which limits their adaptability. We introduce AutoManual, a framework enabling LLM agents to autonomously build their understand…

    Submitted 25 May, 2024; originally announced May 2024.

  45. arXiv:2405.15619  [pdf, other]

    cs.CV

    DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation

    Authors: Xiankang He, Guangkai Xu, Bo Zhang, Hao Chen, Ying Cui, Dongyan Guo

    Abstract: Monocular camera calibration is a key precondition for numerous 3D vision applications. Despite considerable advancements, existing methods often hinge on specific assumptions and struggle to generalize across varied real-world scenarios, and the performance is limited by insufficient training data. Recently, diffusion models trained on expansive datasets have been confirmed to maintain the capabi…

    Submitted 24 May, 2024; originally announced May 2024.

  46. arXiv:2405.15301  [pdf, other]

    cs.LG

    Rankability-enhanced Revenue Uplift Modeling Framework for Online Marketing

    Authors: Bowei He, Yunpeng Weng, Xing Tang, Ziqiang Cui, Zexu Sun, Liang Chen, Xiuqiang He, Chen Ma

    Abstract: Uplift modeling has been widely employed in online marketing by predicting the response difference between the treatment and control groups, so as to identify the sensitive individuals toward interventions like coupons or discounts. Compared with traditional conversion uplift modeling, revenue uplift modeling exhibits higher potential due to its direct connection with the corpora…

    Submitted 12 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  47. arXiv:2405.15223  [pdf, other]

    cs.CV cs.LG cs.RO

    iVideoGPT: Interactive VideoGPTs are Scalable World Models

    Authors: Jialong Wu, Shaofeng Yin, Ningya Feng, Xu He, Dong Li, Jianye Hao, Mingsheng Long

    Abstract: World models empower model-based agents to interactively explore, reason, and plan within imagined environments for real-world decision-making. However, the high demand for interactivity poses challenges in harnessing recent advancements in video generative models for developing world models at scale. This work introduces Interactive VideoGPT (iVideoGPT), a scalable autoregressive transformer fram…

    Submitted 2 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Project website: https://thuml.github.io/iVideoGPT

  48. arXiv:2405.15173  [pdf, other]

    cs.CV

    A3:Ambiguous Aberrations Captured via Astray-Learning for Facial Forgery Semantic Sublimation

    Authors: Xinan He, Yue Zhou, Wei Ye, Feng Ding

    Abstract: Prior DeepFake detection methods have faced a core challenge in preserving generalizability and fairness effectively. In this paper, we propose an approach akin to decoupling and sublimating forgery semantics, named astray-learning. The primary objective of the proposed method is to blend hybrid forgery semantics derived from high-frequency components into authentic imagery, named aberrations. Th…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 19 pages, 9 figures

  49. arXiv:2405.14019  [pdf, other]

    cs.CV

    BrainMorph: A Foundational Keypoint Model for Robust and Flexible Brain MRI Registration

    Authors: Alan Q. Wang, Rachit Saluja, Heejong Kim, Xinzi He, Adrian Dalca, Mert R. Sabuncu

    Abstract: We present a keypoint-based foundation model for general purpose brain MRI registration, based on the recently-proposed KeyMorph framework. Our model, called BrainMorph, serves as a tool that supports multi-modal, pairwise, and scalable groupwise registration. BrainMorph is trained on a massive dataset of over 100,000 3D volumes, skull-stripped and non-skull-stripped, from nearly 16,000 unique hea…

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2304.09941

  50. arXiv:2405.13873  [pdf, other]

    cs.AI cs.CL

    FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering

    Authors: Yuan Sui, Yufei He, Nian Liu, Xiaoxin He, Kun Wang, Bryan Hooi

    Abstract: While large language models (LLMs) have achieved significant success in various applications, they often struggle with hallucinations, especially in scenarios that require deep and responsible reasoning. These issues could be partially mitigated by integrating external knowledge graphs (KG) in LLM reasoning. However, the method of their incorporation is still largely unexplored. In this paper, we p…

    Submitted 22 May, 2024; originally announced May 2024.