Showing 1–50 of 686 results for author: Deng, Y

Searching in archive cs.
  1. arXiv:2409.03753  [pdf, other]

    cs.CL cs.AI cs.HC cs.IR cs.LG

    WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild

    Authors: Yuntian Deng, Wenting Zhao, Jack Hessel, Xiang Ren, Claire Cardie, Yejin Choi

    Abstract: The increasing availability of real-world conversation data offers exciting opportunities for researchers to study user-chatbot interactions. However, the sheer volume of this data makes manually examining individual conversations impractical. To overcome this challenge, we introduce WildVis, an interactive tool that enables fast, versatile, and large-scale conversation analysis. WildVis provides…

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2409.03381  [pdf, other]

    cs.CL cs.AI

    CogniDual Framework: Self-Training Large Language Models within a Dual-System Theoretical Framework for Improving Cognitive Tasks

    Authors: Yongxin Deng, Xihe Qiu, Xiaoyu Tan, Chao Qu, Jing Pan, Yuan Cheng, Yinghui Xu, Wei Chu

    Abstract: Cognitive psychology investigates perception, attention, memory, language, problem-solving, decision-making, and reasoning. Kahneman's dual-system theory elucidates the human decision-making process, distinguishing between the rapid, intuitive System 1 and the deliberative, rational System 2. Recent advancements have positioned large language models (LLMs) as formidable tools nearing human-level p…

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.01646  [pdf, other]

    cs.RO

    BEVNav: Robot Autonomous Navigation Via Spatial-Temporal Contrastive Learning in Bird's-Eye View

    Authors: Jiahao Jiang, Yuxiang Yang, Yingqi Deng, Chenlong Ma, Jing Zhang

    Abstract: Goal-driven mobile robot navigation in map-less environments requires effective state representations for reliable decision-making. Inspired by the favorable properties of Bird's-Eye View (BEV) in point clouds for visual perception, this paper introduces a novel navigation approach named BEVNav. It employs deep reinforcement learning to learn BEV representations and enhance decision-making reliabi…

    Submitted 3 September, 2024; originally announced September 2024.

  4. arXiv:2409.00565  [pdf, other]

    cs.LG cs.CV eess.SP

    Two-Stage Hierarchical and Explainable Feature Selection Framework for Dimensionality Reduction in Sleep Staging

    Authors: Yangfan Deng, Hamad Albidah, Ahmed Dallal, Jijun Yin, Zhi-Hong Mao

    Abstract: Sleep is crucial for human health, and EEG signals play a significant role in sleep research. Due to the high-dimensional nature of EEG signal data sequences, data visualization and clustering of different sleep stages have been challenges. To address these issues, we propose a two-stage hierarchical and explainable feature selection framework by incorporating a feature selection algorithm to impr…

    Submitted 31 August, 2024; originally announced September 2024.

  5. arXiv:2409.00146  [pdf, other]

    cs.NI

    Prioritized Information Bottleneck Theoretic Framework with Distributed Online Learning for Edge Video Analytics

    Authors: Zhengru Fang, Senkang Hu, Jingjing Wang, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Collaborative perception systems leverage multiple edge devices, such as surveillance cameras or autonomous cars, to enhance sensing quality and eliminate blind spots. Despite their advantages, challenges such as limited channel capacity and data redundancy impede their effectiveness. To address these issues, we introduce the Prioritized Information Bottleneck (PIB) framework for edge video analytics…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: arXiv admin note: text overlap with arXiv:2408.17047

  6. arXiv:2408.17047  [pdf, other]

    cs.NI

    PIB: Prioritized Information Bottleneck Framework for Collaborative Edge Video Analytics

    Authors: Zhengru Fang, Senkang Hu, Liyan Yang, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Collaborative edge sensing systems, particularly in collaborative perception systems in autonomous driving, can significantly enhance tracking accuracy and reduce blind spots with multi-view sensing capabilities. However, their limited channel capacity and the redundancy in sensory data pose significant challenges, affecting the performance of collaborative inference tasks. To tackle these issues,…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted by Globecom 2024. Code will be available at https://github.com/fangzr/PIB-Prioritized-Information-Bottleneck-Framework

  7. MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion

    Authors: Yanglin Deng, Tianyang Xu, Chunyang Cheng, Xiao-Jun Wu, Josef Kittler

    Abstract: In recent years, Multi-Modality Image Fusion (MMIF) has been applied to many fields, which has attracted many scholars to endeavour to improve the fusion performance. However, the prevailing focus has predominantly been on the architecture design, rather than the training strategies. As a low-level vision task, image fusion is supposed to quickly deliver output images for observation and supportin…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 10 pages, 8 figures, accepted by ACM International Conference on Multimedia 2024 (Oral)

    Journal ref: 10 pages, 8 figures, accepted by ACM International Conference on Multimedia 2024 (Oral)

  8. arXiv:2408.12505  [pdf, other]

    math.OC cs.LG

    Stochastic Compositional Minimax Optimization with Provable Convergence Guarantees

    Authors: Yuyang Deng, Fuli Qiao, Mehrdad Mahdavi

    Abstract: Stochastic compositional minimax problems are prevalent in machine learning, yet there are only limited established results on the convergence of this class of problems. In this paper, we propose a formal definition of the stochastic compositional minimax problem, which involves optimizing a minimax loss with a compositional structure either in primal, dual, or both primal and dual variables. We introduc…

    Submitted 22 August, 2024; originally announced August 2024.

  9. arXiv:2408.12366  [pdf, other]

    cs.LG cs.CV

    Robust Principal Component Analysis via Discriminant Sample Weight Learning

    Authors: Yingzhuo Deng, Ke Hu, Bo Li, Yao Zhang

    Abstract: Principal component analysis (PCA) is a classical feature extraction method, but it may be adversely affected by outliers, resulting in inaccurate learning of the projection matrix. This paper proposes a robust method to estimate both the data mean and the PCA projection matrix by learning discriminant sample weights from data containing outliers. Each sample in the dataset is assigned a weight, a…

    Submitted 22 August, 2024; originally announced August 2024.

  10. arXiv:2408.10608  [pdf, other]

    cs.CL cs.AI

    Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory

    Authors: Yongxin Deng, Xihe Qiu, Xiaoyu Tan, Jing Pan, Chen Jue, Zhijun Fang, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Large language models (LLMs) are trained on extensive text corpora, which inevitably include biased information. Although techniques such as Affective Alignment can mitigate some negative impacts of these biases, existing prompt-based attack methods can still extract these biases from the model's weights. Moreover, these biases frequently appear subtly when LLMs are prompted to perform identical t…

    Submitted 20 August, 2024; originally announced August 2024.

  11. arXiv:2408.08645  [pdf, other]

    cs.CV

    Extracting polygonal footprints in off-nadir images with Segment Anything Model

    Authors: Kai Li, Jingbo Chen, Yupeng Deng, Yu Meng, Diyou Liu, Junxian Ma, Chenhao Wang

    Abstract: Building Footprint Extraction (BFE) in off-nadir aerial images often relies on roof segmentation and roof-to-footprint offset prediction, then dragging the roof to the footprint via the offset. However, the results from this multi-stage inference are not applicable in data production, because of the low quality of masks given by prediction. To solve this problem, we propose OBMv2 in this paper, which sup…

    Submitted 16 August, 2024; originally announced August 2024.

  12. arXiv:2408.08160  [pdf, other]

    cs.RO cs.AI

    General-purpose Clothes Manipulation with Semantic Keypoints

    Authors: Yuhong Deng, David Hsu

    Abstract: We have seen much recent progress in task-specific clothes manipulation, but generalizable clothes manipulation is still a challenge. Clothes manipulation requires sequential actions, making it challenging to generalize to unseen tasks. Besides, a general clothes state representation method is crucial. In this paper, we adopt language instructions to specify and decompose clothes manipulation task…

    Submitted 15 August, 2024; originally announced August 2024.

  13. arXiv:2408.07719  [pdf, other]

    cs.LG cs.AI

    Operator Feature Neural Network for Symbolic Regression

    Authors: Yusong Deng, Min Wu, Lina Yu, Jingyi Liu, Shu Wei, Yanjie Li, Weijun Li

    Abstract: Symbolic regression is a task aimed at identifying patterns in data and representing them through mathematical expressions, generally involving skeleton prediction and constant optimization. Many methods have achieved some success; however, they treat variables and symbols merely as characters of natural language without considering their mathematical essence. This paper introduces the operator fea…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 12 pages

  14. arXiv:2408.07685  [pdf, ps, other]

    cs.GT

    Auto-bidding and Auctions in Online Advertising: A Survey

    Authors: Gagan Aggarwal, Ashwinkumar Badanidiyuru, Santiago R. Balseiro, Kshipra Bhawalkar, Yuan Deng, Zhe Feng, Gagan Goel, Christopher Liaw, Haihao Lu, Mohammad Mahdian, Jieming Mao, Aranyak Mehta, Vahab Mirrokni, Renato Paes Leme, Andres Perlroth, Georgios Piliouras, Jon Schneider, Ariel Schvartzman, Balasubramanian Sivan, Kelly Spendlove, Yifeng Teng, Di Wang, Hanrui Zhang, Mingfei Zhao, Wennan Zhu, et al. (1 additional author not shown)

    Abstract: In this survey, we summarize recent developments in research fueled by the growing adoption of automated bidding strategies in online advertising. We explore the challenges and opportunities that have arisen as markets embrace this autobidding and cover a range of topics in this area, including bidding algorithms, equilibrium analysis and efficiency of common auction formats, and optimal auction d…

    Submitted 14 August, 2024; originally announced August 2024.

  15. arXiv:2408.03624  [pdf, other]

    cs.CV

    AgentsCoMerge: Large Language Model Empowered Collaborative Decision Making for Ramp Merging

    Authors: Senkang Hu, Zhengru Fang, Zihan Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang, Sam Kwong

    Abstract: Ramp merging is one of the bottlenecks in traffic systems, which commonly cause traffic congestion, accidents, and severe carbon emissions. In order to address this essential issue and enhance the safety and efficiency of connected and autonomous vehicles (CAVs) at multi-lane merging zones, we propose a novel collaborative decision-making framework, named AgentsCoMerge, to leverage large language…

    Submitted 7 August, 2024; originally announced August 2024.

  16. arXiv:2408.01765  [pdf, other]

    cs.LG cs.IT

    Joint Model Pruning and Resource Allocation for Wireless Time-triggered Federated Learning

    Authors: Xinlu Zhang, Yansha Deng, Toktam Mahmoodi

    Abstract: Time-triggered federated learning, in contrast to conventional event-based federated learning, organizes users into tiers based on fixed time intervals. However, this network still faces challenges due to a growing number of devices and limited wireless bandwidth, increasing issues like stragglers and communication overhead. In this paper, we apply model pruning to wireless time-triggered systems…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Accepted in IEEE Global Communications Conference 2024

  17. arXiv:2408.01688  [pdf, other]

    cs.CV

    SiamMo: Siamese Motion-Centric 3D Object Tracking

    Authors: Yuxiang Yang, Yingqi Deng, Jing Zhang, Hongjie Gu, Zhekang Don

    Abstract: Current 3D single object tracking methods primarily rely on the Siamese matching-based paradigm, which struggles with textureless and incomplete LiDAR point clouds. Conversely, the motion-centric paradigm avoids appearance matching, thus overcoming these issues. However, its complex multi-stage pipeline and the limited temporal modeling capability of a single-stream architecture constrain its pote…

    Submitted 3 August, 2024; originally announced August 2024.

  18. arXiv:2407.20799  [pdf, other]

    cs.CV

    SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting

    Authors: Yicheng Deng, Hideaki Hayashi, Hajime Nagahara

    Abstract: Facial expression spotting, identifying periods where facial expressions occur in a video, is a significant yet challenging task in facial expression analysis. The issues of irrelevant facial movements and the challenge of detecting subtle motions in micro-expressions remain unresolved, hindering accurate expression spotting. In this paper, we propose an efficient framework for facial expression s…

    Submitted 30 July, 2024; originally announced July 2024.

  19. arXiv:2407.19672  [pdf, other]

    cs.CL

    SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

    Authors: Wenxuan Zhang, Hou Pong Chan, Yiran Zhao, Mahani Aljunied, Jianyu Wang, Chaoqun Liu, Yue Deng, Zhiqiang Hu, Weiwen Xu, Yew Ken Chia, Xin Li, Lidong Bing

    Abstract: Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by it…

    Submitted 28 July, 2024; originally announced July 2024.

  20. arXiv:2407.17468  [pdf, other]

    cs.CL cs.AI

    WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

    Authors: Wenting Zhao, Tanya Goyal, Yu Ying Chiu, Liwei Jiang, Benjamin Newman, Abhilasha Ravichander, Khyathi Chandu, Ronan Le Bras, Claire Cardie, Yuntian Deng, Yejin Choi

    Abstract: While hallucinations of large language models (LLMs) prevail as a major challenge, existing evaluation benchmarks on factuality do not cover the diverse domains of knowledge that the real-world users of LLMs seek information about. To bridge this gap, we introduce WildHallucinations, a benchmark that evaluates factuality. It does so by prompting LLMs to generate information about entities mined fr…

    Submitted 24 July, 2024; originally announced July 2024.

  21. arXiv:2407.15317  [pdf, other]

    cs.CV

    Open-CD: A Comprehensive Toolbox for Change Detection

    Authors: Kaiyu Li, Jiawei Jiang, Andrea Codegoni, Chengxi Han, Yupeng Deng, Keyan Chen, Zhuo Zheng, Hao Chen, Zhengxia Zou, Zhenwei Shi, Sheng Fang, Deyu Meng, Zhi Wang, Xiangyong Cao

    Abstract: We present Open-CD, a change detection toolbox that contains a rich set of change detection methods as well as related components and modules. The toolbox started from a series of open source general vision task tools, including OpenMMLab Toolkits, PyTorch Image Models, etc. It gradually evolves into a unified platform that covers many popular change detection methods and contemporary modules. It…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 9 pages

  22. arXiv:2407.14562  [pdf, other]

    cs.AI cs.CL

    Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought

    Authors: Xiaoyu Tan, Yongxin Deng, Xihe Qiu, Weidi Xu, Chao Qu, Wei Chu, Yinghui Xu, Yuan Qi

    Abstract: Large language models (LLMs) have shown exceptional performance as general-purpose assistants, excelling across a variety of reasoning tasks. This achievement represents a significant step toward achieving artificial general intelligence (AGI). Despite these advancements, the effectiveness of LLMs often hinges on the specific prompting strategies employed, and there remains a lack of a robust fram…

    Submitted 10 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    ACM Class: I.2.7

  23. arXiv:2407.11610  [pdf, other]

    cs.CV

    MergeNet: Explicit Mesh Reconstruction from Sparse Point Clouds via Edge Prediction

    Authors: Weimin Wang, Yingxu Deng, Zezeng Li, Yu Liu, Na Lei

    Abstract: This paper introduces a novel method for reconstructing meshes from sparse point clouds by predicting edge connection. Existing implicit methods usually produce superior smooth and watertight meshes due to the isosurface extraction algorithms (e.g., Marching Cubes). However, these methods become memory and computationally intensive with increasing resolution. Explicit methods are more efficient by…

    Submitted 16 July, 2024; originally announced July 2024.

  24. arXiv:2407.11484  [pdf, other]

    cs.AI cs.CL

    The Oscars of AI Theater: A Survey on Role-Playing with Language Models

    Authors: Nuo Chen, Yan Wang, Yang Deng, Jia Li

    Abstract: This survey explores the burgeoning field of role-playing with language models, focusing on their development from early persona-based models to advanced character-driven simulations facilitated by Large Language Models (LLMs). Initially confined to simple persona consistency due to limited model capabilities, role-playing tasks have now expanded to embrace complex character portrayals involving c…

    Submitted 1 September, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 28 pages

  25. arXiv:2407.09003  [pdf, other]

    cs.AI

    Enhancing Few-Shot Stock Trend Prediction with Large Language Models

    Authors: Yiqi Deng, Xingwei He, Jiahao Hu, Siu-Ming Yiu

    Abstract: The goal of stock trend prediction is to forecast future market movements for informed investment decisions. Existing methods mostly focus on predicting stock trends with supervised models trained on extensive annotated data. However, human annotation can be resource-intensive and the annotated data are not readily available. Inspired by the impressive few-shot capability of Large Language Models…

    Submitted 12 July, 2024; originally announced July 2024.

  26. arXiv:2407.06617  [pdf, other]

    cs.CV

    Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task

    Authors: Yiran Yang, Jinchao Zhang, Ying Deng, Jie Zhou

    Abstract: Inspired by the success of the text-to-image (T2I) generation task, many researchers are devoting themselves to the text-to-video (T2V) generation task. Most of the T2V frameworks usually inherit from the T2I model and add extra-temporal layers of training to generate dynamic videos, which can be viewed as a fine-tuning task. However, the traditional 3D-Unet is a serial mode and the temporal layer…

    Submitted 23 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  27. arXiv:2407.02252  [pdf, other]

    cs.CV

    GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

    Authors: Jian Ma, Yonglin Deng, Chen Chen, Haonan Lu, Zhenyu Yang

    Abstract: Posters play a crucial role in marketing and advertising by enhancing visual communication and brand visibility, making significant contributions to industrial design. With the latest advancements in controllable T2I diffusion models, increasing research has focused on rendering text within synthesized images. Despite improvements in text rendering accuracy, the field of automatic poster generatio…

    Submitted 30 August, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  28. arXiv:2407.01489  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    Agentless: Demystifying LLM-based Software Engineering Agents

    Authors: Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, Lingming Zhang

    Abstract: Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents are equipped with the ability to use tools, run c…

    Submitted 1 July, 2024; originally announced July 2024.

  29. arXiv:2407.01231  [pdf, other]

    cs.CL cs.AI

    MIRAI: Evaluating LLM Agents for Event Forecasting

    Authors: Chenchen Ye, Ziniu Hu, Yihe Deng, Zijie Huang, Mingyu Derek Ma, Yanqiao Zhu, Wei Wang

    Abstract: Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex problems. Given this capability, increasing interest has been put into employing LLM agents for predicting international events, which can influence decision-making and shape policy development on an international scale. Despite…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 66 pages, 8 figures, 6 tables; Website: https://mirai-llm.github.io/

  30. arXiv:2406.14844  [pdf, other]

    cs.LG cs.AI

    DN-CL: Deep Symbolic Regression against Noise via Contrastive Learning

    Authors: Jingyi Liu, Yanjie Li, Lina Yu, Min Wu, Weijun Li, Wenqiang Li, Meilan Hao, Yusong Deng, Shu Wei

    Abstract: Noise ubiquitously exists in signals due to numerous factors including physical, electronic, and environmental effects. Traditional methods of symbolic regression, such as genetic programming or deep learning models, aim to find the most fitting expressions for these signals. However, these methods often overlook the noise present in real-world data, leading to reduced fitting accuracy. To tackle…

    Submitted 20 June, 2024; originally announced June 2024.

  31. arXiv:2406.14283  [pdf, other]

    cs.AI

    Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

    Authors: Chaojie Wang, Yanchen Deng, Zhiyi Lyu, Liang Zeng, Jujie He, Shuicheng Yan, Bo An

    Abstract: Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks. However, the auto-regressive generation process makes LLMs prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning. In this paper, by casting multi-step reasoning of LLMs as a heuristic search problem, we aim to alleviate the pathology by introducing…

    Submitted 22 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  32. arXiv:2406.13963  [pdf, ps, other]

    cs.CV

    SSAD: Self-supervised Auxiliary Detection Framework for Panoramic X-ray based Dental Disease Diagnosis

    Authors: Zijian Cai, Xinquan Yang, Xuguang Li, Xiaoling Luo, Xuechen Li, Linlin Shen, He Meng, Yongqiang Deng

    Abstract: Panoramic X-ray is a simple and effective tool for diagnosing dental diseases in clinical practice. When deep learning models are developed to assist dentists in interpreting panoramic X-rays, most of their performance suffers from the limited annotated data, which requires dentists' expertise and considerable time. Although self-supervised learning (SSL) has been proposed to address this challeng…

    Submitted 19 June, 2024; originally announced June 2024.

  33. arXiv:2406.12639  [pdf, other]

    cs.CL cs.AI

    Ask-before-Plan: Proactive Language Agents for Real-World Planning

    Authors: Xuan Zhang, Yang Deng, Zifeng Ren, See-Kiong Ng, Tat-Seng Chua

    Abstract: The evolution of large language models (LLMs) has enhanced the planning capabilities of language agents in diverse real-world scenarios. Despite these advancements, the potential of LLM-powered agents to comprehend ambiguous user instructions for reasoning and decision-making is still under exploration. In this work, we introduce a new task, Proactive Agent Planning, which requires language agents…

    Submitted 18 June, 2024; originally announced June 2024.

  34. arXiv:2406.12355  [pdf, other]

    cs.CV

    LiCAF: LiDAR-Camera Asymmetric Fusion for Gait Recognition

    Authors: Yunze Deng, Haijun Xiong, Bin Feng

    Abstract: Gait recognition is a biometric technology that identifies individuals by using walking patterns. Due to the significant achievements of multimodal fusion in gait recognition, we consider employing LiDAR-camera fusion to obtain robust gait representations. However, existing methods often overlook intrinsic characteristics of modalities, and lack fine-grained fusion and temporal modeling. In this p…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by ICIP2024

  35. arXiv:2406.11364  [pdf, other]

    cs.SD eess.AS

    AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

    Authors: Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan

    Abstract: Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, res…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  36. arXiv:2406.11066  [pdf, other]

    cs.CV

    Parameter Blending for Multi-Camera Harmonization for Automotive Surround View Systems

    Authors: Yuzhuo Ren, Yining Deng, David Pajak, Robin Jenkin, Niranjan Avadhanam, Varsha Hedau

    Abstract: In a surround view system, the image color and tone captured by multiple cameras can be different due to cameras applying auto white balance (AWB) and global tone mapping (GTM) individually for each camera. The color and brightness along the stitched seam location may look discontinuous among multiple cameras, which impacts overall stitched image visual quality. To improve the color transition between adj…

    Submitted 16 June, 2024; originally announced June 2024.

  37. arXiv:2406.09410  [pdf, other]

    cs.CV cs.AI

    STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery

    Authors: Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

    Abstract: Scene graph generation (SGG) in satellite imagery (SAI) helps promote understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it attractive to holistically conduct SGG in large-size very-high-resolution (VHR…

    Submitted 3 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 18 pages, 11 figures

  38. arXiv:2406.08464  [pdf, other]

    cs.CL cs.AI

    Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

    Authors: Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin

    Abstract: High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Link: https://magpie-align.github.io/

  39. arXiv:2406.08184  [pdf, other]

    cs.AI cs.HC

    MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents

    Authors: Luyuan Wang, Yongyu Deng, Yiwei Zha, Guodong Mao, Qinmin Wang, Tianchen Min, Wei Chen, Shoufa Chen

    Abstract: Large language model (LLM)-based mobile agents are increasingly popular due to their capability to interact directly with mobile phone Graphical User Interfaces (GUIs) and their potential to autonomously manage daily tasks. Despite their promising prospects in both academic and industrial sectors, little research has focused on benchmarking the performance of existing mobile agents, due to the inexh…

    Submitted 12 June, 2024; originally announced June 2024.

  40. arXiv:2406.08009  [pdf, other]

    cs.CV cs.AI cs.RO

    OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding

    Authors: Yinan Deng, Jiahui Wang, Jingyu Zhao, Jianyu Dou, Yi Yang, Yufeng Yue

    Abstract: In recent years, there has been a surge of interest in open-vocabulary 3D scene reconstruction facilitated by visual language models (VLMs), which showcase remarkable capabilities in open-set retrieval. However, existing methods face some limitations: they either focus on learning point-wise features, resulting in blurry semantic understanding, or solely tackle object-level reconstruction, thereby…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 8 pages, 7 figures. Project URL: https://openobj.github.io/

  41. arXiv:2406.06565  [pdf, other]

    cs.CL cs.AI cs.LG

    MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

    Authors: Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You

    Abstract: Evaluating large language models (LLMs) is challenging. Traditional ground-truth-based benchmarks fail to capture the comprehensiveness and nuance of real-world queries, while LLM-as-judge benchmarks suffer from grading biases and limited query quantity. Both of them may also become contaminated over time. User-facing evaluation, such as Chatbot Arena, provides reliable signals but is costly and s…

    Submitted 3 June, 2024; originally announced June 2024.

  42. arXiv:2406.05925  [pdf, other]

    cs.CL cs.AI

    Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

    Authors: Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, Tat-Seng Chua

    Abstract: Open-domain dialogue systems have seen remarkable advancements with the development of large language models (LLMs). Nonetheless, most existing dialogue systems predominantly focus on brief single-session interactions, neglecting the real-world demands for long-term companionship and personalized interactions with chatbots. Crucial to addressing this real-world need are event summary and persona m…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 17 pages, 4 figures

  43. arXiv:2406.05410  [pdf, other]

    cs.AI cs.CL

    MLLM-SR: Conversational Symbolic Regression base Multi-Modal Large Language Models

    Authors: Yanjie Li, Weijun Li, Lina Yu, Min Wu, Jingyi Liu, Wenqiang Li, Shu Wei, Yusong Deng

    Abstract: Formulas are the language of communication between humans and nature. It is an important research topic of artificial intelligence to find expressions from observed data to reflect the relationship between each variable in the data, which is called a symbolic regression problem. The existing symbolic regression methods directly generate expressions according to the given observation data, and we c…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 13 pages

  44. arXiv:2406.04770  [pdf, other

    cs.CL cs.AI

    WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

    Authors: Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi

    Abstract: We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. WildBench consists of 1,024 tasks carefully selected from over one million human-chatbot conversation logs. For automated evaluation with WildBench, we have developed two metrics, WB-Reward and WB-Score, which are computable using advanced LLMs su…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Link: https://hf.co/spaces/allenai/WildBench

  45. arXiv:2406.04603  [pdf, ps, other

    cs.CV

    Simplify Implant Depth Prediction as Video Grounding: A Texture Perceive Implant Depth Prediction Network

    Authors: Xinquan Yang, Xuguang Li, Xiaoling Luo, Leilei Zeng, Yudi Zhang, Linlin Shen, Yongqiang Deng

    Abstract: Surgical guide plate is an important tool for the dental implant surgery. However, the design process heavily relies on the dentist to manually simulate the implant angle and depth. When deep neural networks have been applied to assist the dentist quickly locates the implant position, most of them are not able to determine the implant depth. Inspired by the video grounding task which localizes the…

    Submitted 6 June, 2024; originally announced June 2024.

    Journal ref: MICCAI'2024

  46. arXiv:2406.04277  [pdf, other

    cs.CV

    VideoTetris: Towards Compositional Text-to-Video Generation

    Authors: Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Jingmin Chen, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, Di Zhang, Bin Cui

    Abstract: Diffusion models have demonstrated great success in text-to-video (T2V) generation. However, existing methods may face challenges when handling complex (long) video generation scenarios that involve multiple objects or dynamic changes in object numbers. To address these limitations, we propose VideoTetris, a novel framework that enables compositional T2V generation. Specifically, we propose spatio…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/YangLing0818/VideoTetris

  47. arXiv:2406.03714  [pdf, other

    cs.SD eess.AS

    Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

    Authors: Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li

    Abstract: Recent prompt-based text-to-speech (TTS) models can clone an unseen speaker using only a short speech prompt. They leverage a strong in-context ability to mimic the speech prompts, including speaker style, prosody, and emotion. Therefore, the selection of a speech prompt greatly influences the generated speech, akin to the importance of a prompt in large language models (LLMs). However, current pr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  48. arXiv:2406.03706  [pdf, other

    cs.SD cs.CL eess.AS

    Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

    Authors: Jinlong Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li

    Abstract: Recent advances in large language models (LLMs) and development of audio codecs greatly propel the zero-shot TTS. They can synthesize personalized speech with only a 3-second speech of an unseen speaker as acoustic prompt. However, they only support short speech prompts and cannot leverage longer context information, as required in audiobook and conversational TTS scenarios. In this paper, we intr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  49. arXiv:2406.00800  [pdf, other

    cs.LG cs.AI

    MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization

    Authors: Aozhong Zhang, Naigang Wang, Yanxia Deng, Xin Li, Zi Yang, Penghang Yin

    Abstract: In this paper, we present a simple optimization-based preprocessing technique called Weight Magnitude Reduction (MagR) to improve the performance of post-training quantization. For each linear layer, we adjust the pre-trained floating-point weights by solving an $\ell_\infty$-regularized optimization problem. This process greatly diminishes the maximum magnitude of the weights and smooths out outl…

    Submitted 2 June, 2024; originally announced June 2024.

  50. arXiv:2405.19716  [pdf, other

    cs.CV cs.CL

    Enhancing Large Vision Language Models with Self-Training on Image Comprehension

    Authors: Yihe Deng, Pan Lu, Fan Yin, Ziniu Hu, Sheng Shen, James Zou, Kai-Wei Chang, Wei Wang

    Abstract: Large vision language models (LVLMs) integrate large language models (LLMs) with pre-trained vision encoders, thereby activating the perception capability of the model to understand image inputs for different queries and conduct subsequent reasoning. Improving this capability requires high-quality vision-language data, which is costly and labor-intensive to acquire. Self-training approaches have b…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures, 6 tables