Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 317 results for author: Kong, L

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.16347  [pdf, other

    cs.CL

    FACTTRACK: Time-Aware World State Tracking in Story Outlines

    Authors: Zhiheng Lyu, Kevin Yang, Lingpeng Kong, Daniel Klein

    Abstract: While accurately detecting and correcting factual contradictions in language model outputs has become increasingly important as their capabilities improve, doing so is highly challenging. We propose a novel method, FACTTRACK, for tracking atomic facts and addressing factual contradictions. Crucially, FACTTRACK also maintains time-aware validity intervals for each fact, allowing for change over tim… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 22 pages

  2. arXiv:2407.16308  [pdf, other

    cs.CV eess.IV

    SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging

    Authors: Lingtong Kong, Bo Li, Yike Xiong, Hao Zhang, Hong Gu, Jinwei Chen

    Abstract: Multi-exposure High Dynamic Range (HDR) imaging is a challenging task when facing truncated texture and complex motion. Existing deep learning-based methods have achieved great success by either following the alignment and fusion pipeline or utilizing attention mechanism. However, the large computation cost and inference delay hinder them from deploying on resource limited devices. In this paper,… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  3. arXiv:2407.15282  [pdf, other

    cs.CV

    Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation

    Authors: Xiaoyang Wu, Xiang Xu, Lingdong Kong, Liang Pan, Ziwei Liu, Tong He, Wanli Ouyang, Hengshuang Zhao

    Abstract: In this technical report, we detail our first-place solution for the 2024 Waymo Open Dataset Challenge's semantic segmentation track. We significantly enhanced the performance of Point Transformer V3 on the Waymo benchmark by implementing cutting-edge, plug-and-play training and inference technologies. Notably, our advanced version, Point Transformer V3 Extreme, leverages multi-frame training and… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation

  4. arXiv:2407.14126  [pdf, other

    cs.CV

    Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation

    Authors: Jinfeng Liu, Lingtong Kong, Bo Li, Zerong Wang, Hong Gu, Jinwei Chen

    Abstract: Self-supervised monocular depth estimation has gathered notable interest since it can liberate training from dependency on depth annotations. In monocular video training case, recent methods only conduct view synthesis between existing camera views, leading to insufficient guidance. To tackle this, we try to synthesize more virtual camera views by flow-based video frame interpolation (VFI), termed… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 27 pages, accepted by ECCV 2024

  5. arXiv:2407.09709  [pdf, other

    cs.LG cs.CL

    GOFA: A Generative One-For-All Model for Joint Graph Language Modeling

    Authors: Lecheng Kong, Jiarui Feng, Hao Liu, Chengsong Huang, Jiaxin Huang, Yixin Chen, Muhan Zhang

    Abstract: Foundation models, such as Large Language Models (LLMs) or Large Vision Models (LVMs), have emerged as one of the most powerful tools in the respective fields. However, unlike text and image data, graph data do not have a definitive structure, posing great challenges to developing a Graph Foundation Model (GFM). For example, current attempts at designing general graph models either transform graph… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  6. arXiv:2407.06190  [pdf, other

    cs.CV cs.LG cs.RO

    4D Contrastive Superflows are Dense 3D Representation Learners

    Authors: Xiang Xu, Lingdong Kong, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu

    Abstract: In the realm of autonomous driving, accurate 3D perception is the foundation. However, developing such models relies on extensive human annotations -- a process that is both costly and labor-intensive. To address this challenge from a data representation learning perspective, we introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing spatiotempora… ▽ More

    Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; 36 pages, 11 figures, 11 tables; Code at https://github.com/Xiangxu-0103/SuperFlow

  7. arXiv:2407.02641  [pdf, other

    cs.LG cs.AI

    Learning Graph Structures and Uncertainty for Accurate and Calibrated Time-series Forecasting

    Authors: Harshavardhan Kamarthi, Lingkai Kong, Alexander Rodriguez, Chao Zhang, B Aditya Prakash

    Abstract: Multi-variate time series forecasting is an important problem with a wide range of applications. Recent works model the relations between time-series as graphs and have shown that propagating information over the relation graph can improve time series forecasting. However, in many cases, relational information is not available or is noisy and reliable. Moreover, most works ignore the underlying un… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  8. arXiv:2406.18201  [pdf, other

    eess.IV cs.CV

    EFCNet: Every Feature Counts for Small Medical Object Segmentation

    Authors: Lingjie Kong, Qiaoling Wei, Chengming Xu, Han Chen, Yanwei Fu

    Abstract: This paper explores the segmentation of very small medical objects with significant clinical value. While Convolutional Neural Networks (CNNs), particularly UNet-like models, and recent Transformers have shown substantial progress in image segmentation, our empirical findings reveal their poor performance in segmenting the small medical objects and lesions concerned in this paper. This limitation… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  9. arXiv:2406.16976  [pdf, other

    cs.NE cs.AI cs.LG physics.chem-ph

    Efficient Evolutionary Search Over Chemical Space with Large Language Models

    Authors: Haorui Wang, Marta Skreta, Cher-Tian Ser, Wenhao Gao, Lingkai Kong, Felix Strieth-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang

    Abstract: Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations… ▽ More

    Submitted 2 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  10. arXiv:2406.16896  [pdf, other

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due… ▽ More

    Submitted 15 May, 2024; originally announced June 2024.

  11. arXiv:2406.16708  [pdf, other

    cs.LG stat.ME

    CausalFormer: An Interpretable Transformer for Temporal Causal Discovery

    Authors: Lingbai Kong, Wengen Li, Hanchen Yang, Yichao Zhang, Jihong Guan, Shuigeng Zhou

    Abstract: Temporal causal discovery is a crucial task aimed at uncovering the causal relations within time series data. The latest temporal causal discovery methods usually train deep learning models on prediction tasks to uncover the causality between time series. They capture causal relations by analyzing the parameters of some components of the trained models, e.g., attention weights and convolution weig… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  12. arXiv:2406.16567  [pdf, other

    cs.CL

    Data Augmentation of Multi-turn Psychological Dialogue via Knowledge-driven Progressive Thought Prompting

    Authors: Jiyue Jiang, Liheng Chen, Sheng Wang, Lingpeng Kong, Yu Li, Chuan Wu

    Abstract: Existing dialogue data augmentation (DA) techniques predominantly focus on augmenting utterance-level dialogues, which makes it difficult to take dialogue contextual information into account. The advent of large language models (LLMs) has simplified the implementation of multi-turn dialogues. Due to absence of professional understanding and knowledge, it remains challenging to deliver satisfactory… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  13. arXiv:2406.16500  [pdf, other

    cs.NE

    A Dual-Channel Particle Swarm Optimization Algorithm Based on Adaptive Balance Search

    Authors: Zhenxing Zhang, Tianxian Zhang, Xiangliang Xu, Lingjiang Kong, Yi Han, Zicheng Wang

    Abstract: The balance between exploration (Er) and exploitation (Ei) determines the generalization performance of the particle swarm optimization (PSO) algorithm on different problems. Although the insufficient balance caused by global best being located near a local minimum has been widely researched, few scholars have systematically paid attention to two behaviors about personal best position (P) and glob… ▽ More

    Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  14. arXiv:2406.14683  [pdf, other

    cs.LG cs.CL

    TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models

    Authors: Jiarui Feng, Hao Liu, Lecheng Kong, Yixin Chen, Muhan Zhang

    Abstract: In this report, we present TAGLAS, an atlas of text-attributed graph (TAG) datasets and benchmarks. TAGs are graphs with node and edge features represented in text, which have recently gained wide applicability in training graph-language or graph foundation models. In TAGLAS, we collect and integrate more than 23 TAG datasets with domains ranging from citation graphs to molecule graphs and tasks f… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  15. arXiv:2406.14393  [pdf, other

    cs.LG cs.CL

    Jailbreaking as a Reward Misspecification Problem

    Authors: Zhihui Xie, Jiahui Gao, Lei Li, Zhenguo Li, Qi Liu, Lingpeng Kong

    Abstract: The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric ReGap to quantify the extent of reward misspecification and d… ▽ More

    Submitted 12 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: github url added

  16. arXiv:2406.12214  [pdf, other

    cs.RO cs.CV

    Is Your HD Map Constructor Reliable under Sensor Corruptions?

    Authors: Xiaoshuai Hao, Mengchuan Wei, Yifan Yang, Haimei Zhao, Hui Zhang, Yi Zhou, Qiang Wang, Weiming Li, Lingdong Kong, Jing Zhang

    Abstract: Driving systems often rely on high-definition (HD) maps for precise environmental information, which is crucial for planning and navigation. While current HD map constructors perform well under ideal conditions, their resilience to real-world challenges, \eg, adverse weather and sensor failures, is not well understood, raising safety concerns. This work introduces MapBench, the first comprehensive… ▽ More

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: project url: https://mapbench.github.io/

  17. arXiv:2406.11643  [pdf, other

    cs.CV

    AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection

    Authors: Lingjie Kong, Kai Wu, Xiaobin Hu, Wenhui Han, Jinlong Peng, Chengming Xu, Donghao Luo, Jiangning Zhang, Chengjie Wang, Yanwei Fu

    Abstract: Text-to-image based object customization, aiming to generate images with the same identity (ID) as objects of interest in accordance with text prompts and reference images, has made significant progress. However, recent customizing research is dominated by specialized tasks, such as human customization or virtual try-on, leaving a gap in general object customization. To this end, we introduce AnyM… ▽ More

    Submitted 5 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  18. arXiv:2406.09130  [pdf, other

    cs.LG cs.AI

    Time-Series Forecasting for Out-of-Distribution Generalization Using Invariant Learning

    Authors: Haoxin Liu, Harshavardhan Kamarthi, Lingkai Kong, Zhiyuan Zhao, Chao Zhang, B. Aditya Prakash

    Abstract: Time-series forecasting (TSF) finds broad applications in real-world scenarios. Due to the dynamic nature of time-series data, it is crucial to equip TSF models with out-of-distribution (OOD) generalization abilities, as historical training data and future test data can have different distributions. In this paper, we aim to alleviate the inherent OOD problem in TSF via invariant learning. We ident… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 14 pages

    ACM Class: H.0

  19. arXiv:2406.08627  [pdf, other

    cs.LG cs.CL

    Time-MMD: A New Multi-Domain Multimodal Dataset for Time Series Analysis

    Authors: Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Kamarthi, Aditya B. Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, B. Aditya Prakash

    Abstract: Time series data are ubiquitous across a wide range of real-world domains. While real-world time series analysis (TSA) requires human experts to integrate numerical series data with multimodal domain-specific knowledge, most existing TSA models rely solely on numerical data, overlooking the significance of information beyond numerical series. This oversight is due to the untapped potential of text… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  20. arXiv:2406.06649  [pdf, other

    eess.IV cs.AI cs.CV cs.LG

    2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

    Authors: Kai Liu, Haotong Qin, Yong Guo, Xin Yuan, Linghe Kong, Guihai Chen, Yulun Zhang

    Abstract: Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively. However, it is notorious that low-bit quantization degrades the accuracy of SR models compared to their ful… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures. The code and models will be available at https://github.com/Kai-Liu001/2DQuant

  21. arXiv:2406.05954  [pdf, other

    cs.AI cs.LG eess.SY

    Aligning Large Language Models with Representation Editing: A Control Perspective

    Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

    Abstract: Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabi… ▽ More

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: fix typos

  22. arXiv:2406.05723  [pdf, other

    cs.CV

    Binarized Diffusion Model for Image Super-Resolution

    Authors: Zheng Chen, Haotong Qin, Yong Guo, Xiongfei Su, Xin Yuan, Linghe Kong, Yulun Zhang

    Abstract: Advanced diffusion models (DMs) perform impressively in image super-resolution (SR), but the high memory and computational costs hinder their deployment. Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating DMs. Nonetheless, due to the model structure and the multi-step iterative attribute of DMs, existing binarization methods result in significant perfor… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/zhengchen1999/BI-DiffSR

  23. arXiv:2406.05316  [pdf, other

    cs.LG

    C-Mamba: Channel Correlation Enhanced State Space Models for Multivariate Time Series Forecasting

    Authors: Chaolv Zeng, Zhanyu Liu, Guanjie Zheng, Linghe Kong

    Abstract: In recent years, significant progress has been made in multivariate time series forecasting using Linear-based, Transformer-based, and Convolution-based models. However, these approaches face notable limitations: linear forecasters struggle with representation capacities, attention mechanisms suffer from quadratic complexity, and convolutional models have a restricted receptive field. These constr… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  24. arXiv:2406.04744  [pdf, other

    cs.CL

    CRAG -- Comprehensive RAG Benchmark

    Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar , et al. (2 additional authors not shown)

    Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering bench… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  25. arXiv:2406.02131  [pdf, other

    cs.LG cs.AI

    CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting

    Authors: Jianrong Ding, Zhanyu Liu, Guanjie Zheng, Haiming Jin, Linghe Kong

    Abstract: Dataset condensation is a newborn technique that generates a small dataset that can be used in training deep neural networks to lower training costs. The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with full datasets. However, existing methods predominantly concentrate on classification tasks, posing c… ▽ More

    Submitted 11 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 23 pages, 13 figures

  26. arXiv:2406.01876  [pdf, other

    cs.DB cs.AI cs.CL cs.IR cs.LG

    GRAM: Generative Retrieval Augmented Matching of Data Schemas in the Context of Data Security

    Authors: Xuanqing Liu, Luyang Kong, Runhui Wang, Patrick Song, Austin Nevins, Henrik Johnson, Nimish Amlathe, Davor Golac

    Abstract: Schema matching constitutes a pivotal phase in the data ingestion process for contemporary database systems. Its objective is to discern pairwise similarities between two sets of attributes, each associated with a distinct data table. This challenge emerges at the initial stages of data analytics, such as when incorporating a third-party table into existing databases to inform business insights. G… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: KDD 2024 Camera Ready; 11 pages, 8 figures

  27. arXiv:2406.00519  [pdf, other

    cs.LG cs.AI stat.ML

    Learning Discrete Concepts in Latent Hierarchical Models

    Authors: Lingjing Kong, Guangyi Chen, Biwei Huang, Eric P. Xing, Yuejie Chi, Kun Zhang

    Abstract: Learning concepts from natural high-dimensional data (e.g., images) holds potential in building human-aligned and interpretable machine learning models. Despite its encouraging prospect, formalization and theoretical insights into this crucial task are still lacking. In this work, we formalize concepts as discrete latent causal variables that are related via a hierarchical causal model that encode… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  28. arXiv:2405.20390  [pdf, other

    cs.LG math.NA math.OC stat.ML

    Quantitative Convergences of Lie Group Momentum Optimizers

    Authors: Lingkai Kong, Molei Tao

    Abstract: Explicit, momentum-based dynamics that optimize functions defined on Lie groups can be constructed via variational optimization and momentum trivialization. Structure preserving time discretizations can then turn this dynamics into optimization algorithms. This article investigates two types of discretization, Lie Heavy-Ball, which is a known splitting scheme, and Lie NAG-SC, which is newly propos… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  29. arXiv:2405.17741  [pdf, other

    cs.AI

    LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design

    Authors: Rui Kong, Qiyang Li, Xinyu Fang, Qingtian Feng, Qingfeng He, Yazhu Dong, Weijun Wang, Yuanchun Li, Linghe Kong, Yunxin Liu

    Abstract: Recent literature has found that an effective method to customize or further improve large language models (LLMs) is to add dynamic adapters, such as low-rank adapters (LoRA) with Mixture-of-Experts (MoE) structures. Though such dynamic adapters incur modest computational complexity, they surprisingly lead to huge inference latency overhead, slowing down the decoding speed by 2.5+ times. In this p… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  30. arXiv:2405.17426  [pdf, other

    cs.CV cs.RO

    Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving

    Authors: Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu

    Abstract: Recent advancements in bird's eye view (BEV) representations have shown remarkable promise for in-vehicle 3D perception. However, while these methods have achieved impressive results on standard benchmarks, their robustness in varied conditions remains insufficiently assessed. In this study, we present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms. Thi… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 13 figures, 11 tables; Code at this https URL: https://github.com/Daniel-xsy/RoboBEV

  31. arXiv:2405.16381  [pdf, other

    cs.LG cs.AI stat.ML

    Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups

    Authors: Yuchen Zhu, Tianrong Chen, Lingkai Kong, Evangelos A. Theodorou, Molei Tao

    Abstract: The generative modeling of data on manifold is an important task, for which diffusion models in flat spaces typically need nontrivial adaptations. This article demonstrates how a technique called `trivialization' can transfer the effectiveness of diffusion models in Euclidean spaces to Lie groups. In particular, an auxiliary momentum variable was algorithmically introduced to help transport the po… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  32. arXiv:2405.15756  [pdf, other

    cs.LG cs.AI

    Sparse Expansion and Neuronal Disentanglement

    Authors: Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir Shavit

    Abstract: We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the original weights, one-shot pruned for a specific cluster of input values. We call this approach $\textit{Sparse Expansion}$. We show that, for models such as Llama 2 70B, as we increase the number of sparse experts, Sparse Expansion outperforms all other on… ▽ More

    Submitted 24 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures

  33. arXiv:2405.14870  [pdf, other

    cs.CV cs.RO

    An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models

    Authors: Jiahao Sun, Chunmei Qing, Xiang Xu, Lingdong Kong, Youquan Liu, Li Li, Chenming Zhu, Jingwei Zhang, Zeqi Xiao, Runnan Chen, Tai Wang, Wenwei Zhang, Kai Chen

    Abstract: In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments. Traditional approaches often rely on disparate, standalone codebases, hindering unified advancements and fair benchmarking across models. To address these challenges, we introduce MMDetection3D-lidarseg, a comprehensive toolbox designed for the efficient tra… ▽ More

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 4 figures, 7 tables; Code at https://github.com/open-mmlab/mmdetection3d

  34. arXiv:2405.14343  [pdf, other

    cs.CV

    Efficient Visual State Space Model for Image Deblurring

    Authors: Lingshun Kong, Jiangxin Dong, Ming-Hsuan Yang, Jinshan Pan

    Abstract: Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. ViTs typically yield superior results in image restoration compared to CNNs due to their ability to capture long-range dependencies and input-dependent characteristics. However, the computational complexity of Transformer-based models grows quadratically with the image reso… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  35. arXiv:2405.14295  [pdf, other

    cs.CV

    Focus Anywhere for Fine-grained Multi-page Document Understanding

    Authors: Chenglong Liu, Haoran Wei, Jinyue Chen, Lingyu Kong, Zheng Ge, Zining Zhu, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Modern LVLMs still struggle to achieve fine-grained document understanding, such as OCR/translation/caption for regions of interest to the user, tasks that require the context of the entire page, or even multiple pages. Accordingly, this paper proposes Fox, an effective pipeline, hybrid data, and tuning strategy, that catalyzes LVLMs to focus anywhere on single/multi-page documents. We introduce a… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  36. arXiv:2405.08816  [pdf, other

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang , et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c… ▽ More

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  37. arXiv:2405.05259  [pdf, other

    cs.CV cs.RO

    OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies

    Authors: Lingdong Kong, Youquan Liu, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi

    Abstract: Event-based semantic segmentation (ESS) is a fundamental yet challenging task for event camera sensing. The difficulties in interpreting and annotating event data limit its scalability. While domain adaptation from images to event data can help to mitigate this issue, there exist data representational differences that require additional effort to resolve. In this work, for the first time, we syner… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: CVPR 2024 (Highlight); 26 pages, 12 figures, 11 tables; Code at https://github.com/ldkong1205/OpenESS

  38. arXiv:2405.05258  [pdf, other

    cs.CV cs.LG cs.RO

    Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

    Authors: Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu

    Abstract: Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the effi… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 6 figures, 8 tables; Code at https://github.com/ldkong1205/LaserMix

  39. arXiv:2405.04902  [pdf, other

    eess.IV cs.CV

    HAGAN: Hybrid Augmented Generative Adversarial Network for Medical Image Synthesis

    Authors: Zhihan Ju, Wanting Zhou, Longteng Kong, Yu Chen, Yi Li, Zhenan Sun, Caifeng Shan

    Abstract: Medical Image Synthesis (MIS) plays an important role in the intelligent medical field, which greatly saves the economic and time costs of medical diagnosis. However, due to the complexity of medical images and similar characteristics of different tissue cells, existing methods face great challenges in meeting their biological consistency. To this end, we propose the Hybrid Augmented Generative Ad… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  40. arXiv:2405.01538  [pdf, other

    cs.CV cs.LG cs.RO

    Multi-Space Alignments Towards Universal LiDAR Segmentation

    Authors: Youquan Liu, Lingdong Kong, Xiaoyang Wu, Runnan Chen, Xin Li, Liang Pan, Ziwei Liu, Yuexin Ma

    Abstract: A unified and versatile LiDAR segmentation model with strong robustness and generalizability is desirable for safe autonomous driving perception. This work presents M3Net, a one-of-a-kind framework for fulfilling multi-task, multi-dataset, multi-modality LiDAR segmentation in a universal manner using just a single set of parameters. To better exploit data volume and diversity, we first combine lar… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: CVPR 2024; 33 pages, 14 figures, 14 tables; Code at https://github.com/youquanl/M3Net

  41. arXiv:2404.14542  [pdf, other

    cs.CV

    UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

    Authors: Yaofeng Xie, Lingwei Kong, Kai Chen, Ziqiang Zheng, Xiao Yu, Zhibin Yu, Bing Zheng

    Abstract: Learning-based underwater image enhancement (UIE) methods have made great progress. However, the lack of large-scale and high-quality paired training samples has become the main bottleneck hindering the development of UIE. The inter-frame information in underwater videos can accelerate or optimize the UIE process. Thus, we constructed the first large-scale high-resolution underwater video enhancem… ▽ More

    Submitted 27 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages,CVPR2024 accept

    ACM Class: I.4

  42. arXiv:2404.09987  [pdf, other

    cs.CV

    OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

    Authors: Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Chart parsing poses a significant challenge due to the diversity of styles, values, texts, and so forth. Even advanced large vision-language models (LVLMs) with billions of parameters struggle to handle such tasks satisfactorily. To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information. Similar to popular LVLMs, OneChart incorpo… ▽ More

    Submitted 25 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 14 pages, 9 figures and 6 tables

  43. arXiv:2403.17010  [pdf, other

    cs.CV cs.LG cs.RO

    Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

    Authors: Lingdong Kong, Xiang Xu, Jun Cen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

    Abstract: Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models. This study introduces Calib3D, a pioneering effort to benchmark and scrutinize the reliability of 3D scene understanding models from an uncertainty estimation viewpoint. We comprehensively evaluate 28 state-of-the-art models across 10 diverse 3D datasets, uncovering… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Preprint; 37 pages, 8 figures, 11 tables; Code at https://github.com/ldkong1205/Calib3D

  44. arXiv:2403.17009  [pdf, other

    cs.CV cs.RO

    Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions

    Authors: Ye Li, Lingdong Kong, Hanjiang Hu, Xiaohao Xu, Xiaonan Huang

    Abstract: The robustness of driving perception systems under unprecedented conditions is crucial for safety-critical usages. Latest advancements have prompted increasing interests towards multi-LiDAR perception. However, prevailing driving datasets predominantly utilize single-LiDAR systems and collect data devoid of adverse conditions, failing to capture the complexities of real-world environments accurate… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Preprint; 40 pages, 11 figures, 15 tables; Code at https://github.com/ywyeli/Place3D

  45. arXiv:2403.15696  [pdf, other

    cs.AI cs.CL

    MixRED: A Mix-lingual Relation Extraction Dataset

    Authors: Lingxing Kong, Yougang Chu, Zheng Ma, Jianbing Zhang, Liang He, Jiajun Chen

    Abstract: Relation extraction is a critical task in the field of natural language processing with numerous real-world applications. Existing research primarily focuses on monolingual relation extraction or cross-lingual enhancement for relation extraction. Yet, there remains a significant gap in understanding relation extraction in the mix-lingual (or code-switching) scenario, where individuals intermix con… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

  46. arXiv:2403.14734  [pdf, other

    cs.SE cs.AI cs.CL cs.PL

    A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

    Authors: Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu

    Abstract: Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronol… ▽ More

    Submitted 23 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: 64 pages, 6 figures, 10 tables, 692 references

  47. arXiv:2403.12012  [pdf, other

    math.ST cs.LG math.NA math.PR stat.ML

    Convergence of Kinetic Langevin Monte Carlo on Lie groups

    Authors: Lingkai Kong, Molei Tao

    Abstract: Explicit, momentum-based dynamics for optimizing functions defined on Lie groups was recently constructed, based on techniques such as variational optimization and left trivialization. We appropriately add tractable noise to the optimization dynamics to turn it into a sampling dynamics, leveraging the advantageous feature that the trivialized momentum variable is Euclidean despite that the potenti… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  48. arXiv:2403.10001  [pdf, other

    cs.CV

    Visual Foundation Models Boost Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

    Authors: Jingyi Xu, Weidong Yang, Lingdong Kong, Youquan Liu, Rui Zhang, Qingyuan Zhou, Ben Fei

    Abstract: Unsupervised domain adaptation (UDA) is vital for alleviating the workload of labeling 3D point cloud data and mitigating the absence of labels when facing a newly defined domain. Various methods of utilizing images to enhance the performance of cross-domain 3D segmentation have recently emerged. However, the pseudo labels, which are generated from models trained on the source domain and provide a… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures

  49. arXiv:2403.02910  [pdf, other

    cs.CV cs.AI

    ImgTrojan: Jailbreaking Vision-Language Models with ONE Image

    Authors: Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, Lingpeng Kong

    Abstract: There has been an increasing interest in the alignment of large language models (LLMs) with human values. However, the safety issues of their integration with a vision module, or vision language models (VLMs), remain relatively underexplored. In this paper, we propose a novel jailbreaking attack against VLMs, aiming to bypass their safety barrier when a user inputs harmful instructions. A scenario… ▽ More

    Submitted 5 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  50. arXiv:2403.00812  [pdf, other

    cs.CL cs.AI

    LoRA Meets Dropout under a Unified Framework

    Authors: Sheng Wang, Liheng Chen, Jiyue Jiang, Boyang Xue, Lingpeng Kong, Chuan Wu

    Abstract: With the remarkable capabilities, large language models (LLMs) have emerged as essential elements in numerous NLP applications, while parameter-efficient finetuning, especially LoRA, has gained popularity as a lightweight approach for model customization. Meanwhile, various dropout methods, initially designed for full finetuning with all the parameters updated, alleviates overfitting associated wi… ▽ More

    Submitted 26 May, 2024; v1 submitted 25 February, 2024; originally announced March 2024.