Showing 1–50 of 2,208 results for author: Li, W

Searching in archive cs.
  1. arXiv:2409.03256  [pdf, other]

    cs.CL cs.AI

    E2CL: Exploration-based Error Correction Learning for Embodied Agents

    Authors: Hanlin Wang, Chak Tou Leong, Jian Wang, Wenjie Li

    Abstract: Language models are exhibiting increasing capability in knowledge utilization and reasoning. However, when applied as agents in embodied environments, they often suffer from misalignment between their intrinsic knowledge and environmental knowledge, leading to infeasible actions. Traditional environment alignment methods, such as supervised learning on expert trajectories and reinforcement learnin…

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2409.02943  [pdf, ps, other]

    cs.DS cs.LG

    A Note On Deterministic Submodular Maximization With Bounded Curvature

    Authors: Wenxin Li

    Abstract: We show that the recent breakthrough result of [Buchbinder and Feldman, FOCS'24] could further lead to a deterministic $(1-\kappa_{f}/e-\varepsilon)$-approximate algorithm for maximizing a submodular function with curvature $\kappa_{f}$ under a matroid constraint.

    Submitted 24 August, 2024; originally announced September 2024.

  3. arXiv:2409.02664  [pdf, other]

    cs.CV

    Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

    Authors: Kaiqing Lin, Yuzhen Lin, Weixiang Li, Taiping Yao, Bin Li

    Abstract: The proliferation of deepfake faces poses huge potential negative impacts on our daily lives. Despite substantial advancements in deepfake detection in recent years, the generalizability of existing methods against forgeries from unseen datasets or created by emerging generative models remains constrained. In this paper, inspired by the zero-shot advantages of Vision-Language Models (VLMs), we pr…

    Submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2409.02543  [pdf, other]

    cs.CV

    StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models

    Authors: Wen Li, Muyuan Fang, Cheng Zou, Biao Gong, Ruobing Zheng, Meng Wang, Jingdong Chen, Ming Yang

    Abstract: Despite the burst of innovative methods for controlling the diffusion process, effectively controlling image styles in text-to-image generation remains a challenging task. Many adapter-based methods impose image representation conditions on the denoising process to accomplish image control. However, these conditions are not aligned with the word embedding space, leading to interference between imag…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024

  5. arXiv:2409.02045  [pdf, other]

    cs.CV cs.AI

    AllWeatherNet:Unified Image enhancement for autonomous driving under adverse weather and lowlight-conditions

    Authors: Chenghao Qian, Mahdi Rezaei, Saeed Anwar, Wenjing Li, Tanveer Hussain, Mohsen Azarmi, Wei Wang

    Abstract: Adverse conditions like snow, rain, nighttime, and fog pose challenges for autonomous driving perception systems. Existing methods have limited effectiveness in improving essential computer vision tasks, such as semantic segmentation, and often focus on only one specific condition, such as removing rain or translating nighttime images into daytime ones. To address these limitations, we propose a…

    Submitted 3 September, 2024; originally announced September 2024.

  6. arXiv:2409.01781  [pdf, other]

    cs.CV

    Dual Advancement of Representation Learning and Clustering for Sparse and Noisy Images

    Authors: Wenlin Li, Yucheng Xu, Xiaoqing Zheng, Suoya Han, Jun Wang, Xiaobo Sun

    Abstract: Sparse and noisy images (SNIs), like those in spatial gene expression data, pose significant challenges for effective representation learning and clustering, which are essential for thorough data analysis and interpretation. In response to these challenges, we propose Dual Advancement of Representation Learning and Clustering (DARLC), an innovative framework that leverages contrastive learning to…

    Submitted 3 September, 2024; originally announced September 2024.

  7. arXiv:2409.01691  [pdf, other]

    cs.CV

    When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels

    Authors: Yifan Liu, Wuyang Li, Cheng Wang, Hui Chen, Yixuan Yuan

    Abstract: Tooth point cloud segmentation is a fundamental task in many orthodontic applications. Current research mainly focuses on fully supervised learning which demands expensive and tedious manual point-wise annotation. Although recent weakly-supervised alternatives are proposed to use weak labels for 3D segmentation and achieve promising results, they tend to fail when the labels are extremely sparse.…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: To appear at MICCAI24

  8. arXiv:2409.01641  [pdf, other]

    cs.CV

    Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement

    Authors: Kun Zhou, Xinyu Lin, Wenbo Li, Xiaogang Xu, Yuanhao Cai, Zhonghang Liu, Xiaoguang Han, Jiangbo Lu

    Abstract: Previous low-light image enhancement (LLIE) approaches, while employing frequency decomposition techniques to address the intertwined challenges of low frequency (e.g., illumination recovery) and high frequency (e.g., noise reduction), primarily focused on the development of dedicated and complex networks to achieve improved performance. In contrast, we reveal that an advanced disentanglement para…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted to ECCV 2024. GitHub: https://github.com/redrock303/ADF-LLIE

  9. arXiv:2409.01207  [pdf, other]

    cs.LG

    Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models

    Authors: Jiao Chen, Jiayi He, Fangfang Chen, Zuohong Lv, Jianhua Tang, Weihua Li, Zuozhu Liu, Howard H. Yang, Guangjie Han

    Abstract: Currently, most applications in the Industrial Internet of Things (IIoT) still rely on CNN-based neural networks. Although Transformer-based large models (LMs), including language, vision, and multimodal models, have demonstrated impressive capabilities in AI-generated content (AIGC), their application in industrial domains, such as detection, planning, and control, remains relatively limited. Dep…

    Submitted 2 September, 2024; originally announced September 2024.

  10. arXiv:2409.00664  [pdf, other]

    q-bio.NC cs.LG

    Video-based Analysis Reveals Atypical Social Gaze in People with Autism Spectrum Disorder

    Authors: Xiangxu Yu, Mindi Ruan, Chuanbo Hu, Wenqi Li, Lynn K. Paul, Xin Li, Shuo Wang

    Abstract: In this study, we present a quantitative and comprehensive analysis of social gaze in people with autism spectrum disorder (ASD). Diverging from traditional first-person camera perspectives based on eye-tracking technologies, this study utilizes a third-person perspective database from the Autism Diagnostic Observation Schedule, 2nd Edition (ADOS-2) interview videos, encompassing ASD participants…

    Submitted 1 September, 2024; originally announced September 2024.

  11. arXiv:2409.00591  [pdf, other]

    cs.CV

    Attention-Guided Multi-scale Interaction Network for Face Super-Resolution

    Authors: Xujie Wan, Wenjie Li, Guangwei Gao, Huimin Lu, Jian Yang, Chia-Wen Lin

    Abstract: Recently, CNN and Transformer hybrid networks have demonstrated excellent performance in face super-resolution (FSR) tasks. Since hybrid networks contain numerous features at different scales, fusing these multi-scale features and promoting their complementarity is crucial for enhancing FSR. However, existing hybrid network-based FSR methods ignore this and simply combine the Transformer and CNN. To…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 12 pages, 8 figures, 8 tables

  12. arXiv:2409.00494  [pdf, other]

    cs.AI cs.SE

    GenAI-powered Multi-Agent Paradigm for Smart Urban Mobility: Opportunities and Challenges for Integrating Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) with Intelligent Transportation Systems

    Authors: Haowen Xu, Jinghui Yuan, Anye Zhou, Guanhao Xu, Wan Li, Xuegang Ban, Xinyue Ye

    Abstract: Leveraging recent advances in generative AI, multi-agent systems are increasingly being developed to enhance the functionality and efficiency of smart city applications. This paper explores the transformative potential of large language models (LLMs) and emerging Retrieval-Augmented Generation (RAG) technologies in Intelligent Transportation Systems (ITS), paving the way for innovative solutions t…

    Submitted 4 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

  13. arXiv:2409.00489  [pdf]

    cs.CV cs.AI

    Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability

    Authors: Chia-Yu Hsu, Wenwen Li, Sizhe Wang

    Abstract: Research on geospatial foundation models (GFMs) has become a trending topic in geospatial artificial intelligence (AI) research due to their potential for achieving high generalizability and domain adaptability, reducing model training costs for individual researchers. Unlike large language models, such as ChatGPT, constructing visual foundation models for image analysis, particularly in remote se…

    Submitted 31 August, 2024; originally announced September 2024.

  14. BaseMirror: Automatic Reverse Engineering of Baseband Commands from Android's Radio Interface Layer

    Authors: Wenqiang Li, Haohuang Wen, Zhiqiang Lin

    Abstract: In modern mobile devices, baseband is an integral component running on top of cellular processors to handle crucial radio communications. However, recent research reveals significant vulnerabilities in these basebands, posing serious security risks like remote code execution. Yet, effectively scrutinizing basebands remains a daunting task, as they run closed-source and proprietary software on vend…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: This is the extended version of the CCS 2024 paper with the same title

    Journal ref: The ACM Conference on Computer and Communications Security (CCS) 2024

  15. arXiv:2408.17267  [pdf, other]

    cs.CV cs.AI

    UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

    Authors: Baichuan Zhou, Haote Yang, Dairong Chen, Junyan Ye, Tianyi Bai, Jinhua Yu, Songyang Zhang, Dahua Lin, Conghui He, Weijia Li

    Abstract: Recent evaluations of Large Multimodal Models (LMMs) have explored their capabilities in various domains, with only a few benchmarks specifically focusing on urban environments. Moreover, existing urban benchmarks have been limited to evaluating LMMs with basic region-level urban tasks under singular views, leading to incomplete evaluations of LMMs' abilities in urban environments. To address these…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures

  16. arXiv:2408.14765  [pdf, other]

    cs.CV

    CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis

    Authors: Weijia Li, Jun He, Junyan Ye, Huaping Zhong, Zhimeng Zheng, Zilong Huang, Dahua Lin, Conghui He

    Abstract: Satellite-to-street view synthesis aims at generating a realistic street-view image from its corresponding satellite-view image. Although stable diffusion models have exhibited remarkable performance in a variety of image generation applications, their reliance on similar-view inputs to control the generated structure or texture restricts their application to the challenging cross-view synthesis tas…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 21 pages, 11 figures

  17. arXiv:2408.14600  [pdf, other]

    cs.CV

    PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection

    Authors: Yidi Li, Jiahao Wen, Bin Ren, Wenhao Li, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. However, this combination often struggles with capturing semantic information effectively. Moreover, relying solely on point features within regions of interest can lead to information loss and limitations in local feature representation. To tackle these challenges, we propose a novel two…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 3D Object Detection

  18. arXiv:2408.13752  [pdf, other]

    cs.CV

    Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation

    Authors: Zhaoyang Li, Yuan Wang, Wangkai Li, Rui Sun, Tianzhu Zhang

    Abstract: Point cloud few-shot semantic segmentation (PC-FSS) aims to segment targets of novel categories in a given query point cloud with only a few annotated support samples. The current top-performing prototypical learning methods employ prototypes originating from support samples to direct the classification of query points. However, the inherent fragility of point-level matching and the prevalent intr…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024

  19. arXiv:2408.13712  [pdf, other]

    cs.CV cs.MM

    Riemann-based Multi-scale Attention Reasoning Network for Text-3D Retrieval

    Authors: Wenrui Li, Wei Han, Yandu Chen, Yeyu Chai, Yidan Lu, Xingtao Wang, Xiaopeng Fan

    Abstract: Due to the challenges in acquiring paired Text-3D data and the inherent irregularity of 3D data structures, combined representation learning of 3D point clouds and text remains unexplored. In this paper, we propose a novel Riemann-based Multi-scale Attention Reasoning Network (RMARN) for text-3D retrieval. Specifically, the extracted text and point cloud features are refined by their respective Ad…

    Submitted 24 August, 2024; originally announced August 2024.

  20. arXiv:2408.13711  [pdf, other]

    cs.CV cs.MM

    SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting

    Authors: Wenrui Li, Yapeng Mi, Fucheng Cai, Zhe Yang, Wangmeng Zuo, Xingtao Wang, Xiaopeng Fan

    Abstract: Text-driven 3D scene generation has seen significant advancements recently. However, most existing methods generate single-view images using generative models and then stitch them together in 3D space. This independent generation for each view often results in spatial inconsistency and implausibility in the 3D scenes. To address this challenge, we propose a novel text-driven 3D-consistent scene g…

    Submitted 24 August, 2024; originally announced August 2024.

  21. arXiv:2408.13423  [pdf, other]

    cs.CV

    Training-free Long Video Generation with Chain of Diffusion Model Experts

    Authors: Wenhao Li, Yichao Cao, Xiu Su, Xi Lin, Shan You, Mingkai Zheng, Yi Chen, Chang Xu

    Abstract: Video generation models hold substantial potential in areas such as filmmaking. However, current video diffusion models incur high computational costs and produce suboptimal results due to the high complexity of the video generation task. In this paper, we propose ConFiner, an efficient high-quality video generation framework that decouples video generation into easier subtasks: structure \textbf{…

    Submitted 2 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  22. arXiv:2408.13270  [pdf, other]

    cs.AR cs.AI cs.LG

    Efficient Task Transfer for HLS DSE

    Authors: Zijian Ding, Atefeh Sohrabizadeh, Weikai Li, Zongyue Qin, Yizhou Sun, Jason Cong

    Abstract: Several recent works have proposed using model-based optimization methods to improve the productivity of using high-level synthesis (HLS) to design domain-specific architectures. They replace the time-consuming performance estimation or simulation of a design with a proxy model, and automatically insert pragmas to guide hardware optimizations. In this work, we address the chall…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 13 pages, 7 figures, accepted to ICCAD'24

  23. arXiv:2408.12991  [pdf, other]

    cs.CE q-fin.TR

    Controllable Financial Market Generation with Diffusion Guided Meta Agent

    Authors: Yu-Hao Huang, Chang Xu, Yang Liu, Weiqing Liu, Wu-Jun Li, Jiang Bian

    Abstract: Order flow modeling stands as the most fundamental and essential financial task, as orders embody the minimal unit within a financial market. However, current approaches often result in unsatisfactory fidelity in generating order flow, and their generation lacks controllability, thereby limiting their application scenario. In this paper, we advocate incorporating controllability into the market ge…

    Submitted 1 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  24. Towards Deconfounded Image-Text Matching with Causal Inference

    Authors: Wenhui Li, Xinqi Su, Dan Song, Lanjun Wang, Kun Zhang, An-An Liu

    Abstract: Prior image-text matching methods have shown remarkable performance on many benchmark datasets, but most of them overlook the bias in the dataset, which exists in both intra-modal and inter-modal forms, and tend to learn spurious correlations that severely degrade the generalization ability of the model. Furthermore, these methods often incorporate biased external knowledge from large-scale datasets as…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: ACM MM

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia, 26 October 2023, pp. 6264-6273

  25. arXiv:2408.12248  [pdf, other]

    cs.CV

    PRG: Prompt-Based Distillation Without Annotation via Proxy Relational Graph

    Authors: Yijin Xu, Jialun Liu, Hualiang Wei, Wenhui Li

    Abstract: In this paper, we propose a new distillation method for extracting knowledge from Large Foundation Models (LFM) into lightweight models, introducing a novel supervision mode that does not require manually annotated data. While LFMs exhibit exceptional zero-shot classification abilities across datasets, relying solely on LFM-generated embeddings for distillation poses two main challenges: LFM's tas…

    Submitted 22 August, 2024; originally announced August 2024.

  26. arXiv:2408.12232  [pdf, other]

    cs.CV

    BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking

    Authors: Hanzheng Wang, Wei Li, Xiang-Gen Xia, Qian Du

    Abstract: Hyperspectral object tracking (HOT) has exhibited potential in various applications, particularly in scenes where objects are camouflaged. Existing trackers can effectively retrieve objects via band regrouping because of the bias in existing HOT datasets, where most objects tend to have distinguishing visual appearances rather than spectral characteristics. This bias allows the tracker to directly…

    Submitted 22 August, 2024; originally announced August 2024.

  27. arXiv:2408.12153  [pdf, other]

    cs.IR cs.LG

    DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models

    Authors: Wuchao Li, Rui Huang, Haijun Zhao, Chi Liu, Kai Zheng, Qi Liu, Na Mou, Guorui Zhou, Defu Lian, Yang Song, Wentian Bao, Enyun Yu, Wenwu Ou

    Abstract: Sequential Recommendation (SR) plays a pivotal role in recommender systems by tailoring recommendations to user preferences based on their non-stationary historical interactions. Achieving high-quality performance in SR requires attention to both item representation and diversity. However, designing an SR method that simultaneously optimizes these merits remains a long-standing challenge. In this…

    Submitted 22 August, 2024; originally announced August 2024.

  28. arXiv:2408.12093  [pdf, other]

    cs.RO cs.CV

    LLM-enhanced Scene Graph Learning for Household Rearrangement

    Authors: Wenhao Li, Zhiyuan Yu, Qijin She, Zhinan Yu, Yuqing Lan, Chenyang Zhu, Ruizhen Hu, Kai Xu

    Abstract: The household rearrangement task involves spotting misplaced objects in a scene and accommodating them in proper places. It depends both on common-sense knowledge on the objective side and human user preference on the subjective side. To achieve this task, we propose to mine object functionality with user preference alignment directly from the scene itself, without relying on human intervention.…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: SIGGRAPH ASIA 2024

  29. arXiv:2408.11687  [pdf, other]

    cs.CV

    Interpretable Long-term Action Quality Assessment

    Authors: Xu Dong, Xinran Liu, Wanqing Li, Anthony Adeyemi-Ejeye, Andrew Gilbert

    Abstract: Long-term Action Quality Assessment (AQA) evaluates the execution of activities in videos. However, the length presents challenges in fine-grained interpretability, with current AQA methods typically producing a single score by averaging clip features, lacking detailed semantic meanings of individual clips. Long-term videos pose additional difficulty due to the complexity and diversity of actions,…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Accepted to British Machine Vision Conference (BMVC) 2024

  30. arXiv:2408.11210  [pdf, other]

    cs.CV

    A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation

    Authors: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Daguang Xu, Wenqi Li

    Abstract: Since the release of Segment Anything 2 (SAM2), the medical imaging community has been actively evaluating its performance for 3D medical image segmentation. However, different studies have employed varying evaluation pipelines, resulting in conflicting outcomes that obscure a clear understanding of SAM2's capabilities and potential applications. We briefly review existing benchmarks and point out…

    Submitted 20 August, 2024; originally announced August 2024.

  31. arXiv:2408.10883  [pdf, other]

    cs.AI cs.CV

    DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection

    Authors: Xinqi Su, Yawen Cui, Ajian Liu, Xun Lin, Yuhao Wang, Haochen Liang, Wenhui Li, Zitong Yu

    Abstract: In the current web environment, fake news spreads rapidly across online social networks, posing serious threats to society. Existing multimodal fake news detection (MFND) methods can be classified into knowledge-based and semantic-based approaches. However, these methods are overly dependent on human expertise and feedback, lacking flexibility. To address this challenge, we propose a Dynamic Analysis…

    Submitted 20 August, 2024; originally announced August 2024.

  32. arXiv:2408.10854  [pdf, other]

    physics.ao-ph cs.AI cs.CV

    MambaDS: Near-Surface Meteorological Field Downscaling with Topography Constrained Selective State Space Modeling

    Authors: Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

    Abstract: In an era of frequent extreme weather and global warming, obtaining precise, fine-grained near-surface weather forecasts is increasingly essential for human activities. Downscaling (DS), a crucial task in meteorological forecasting, enables the reconstruction of high-resolution meteorological states for target regions from global-scale forecast results. Previous downscaling methods, inspired by CN…

    Submitted 20 August, 2024; originally announced August 2024.

  33. arXiv:2408.10537  [pdf, other]

    cs.CV

    Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation

    Authors: Jiawei Han, Kaiqi Liu, Wei Li, Guangzhi Chen

    Abstract: Point cloud semantic segmentation can significantly enhance the perception of an intelligent agent. Nevertheless, the discriminative capability of the segmentation network is influenced by the quantity of samples available for different categories. To mitigate the cognitive bias induced by class imbalance, this paper introduces a novel method, namely subspace prototype guidance (SPG), to…

    Submitted 20 August, 2024; originally announced August 2024.

  34. arXiv:2408.09885  [pdf, other]

    cs.GT

    Joint Auction in the Online Advertising Market

    Authors: Zhen Zhang, Weian Li, Yahui Lei, Bingzhe Wang, Zhicheng Zhang, Qi Qi, Qiang Liu, Xingxing Wang

    Abstract: Online advertising is a primary source of income for e-commerce platforms. In the current advertising pattern, the oriented targets are the online store owners who are willing to pay extra fees to enhance the position of their stores. On the other hand, brand suppliers also wish to advertise their products in stores to boost brand sales. However, the currently used advertising mode cannot…

    Submitted 19 August, 2024; originally announced August 2024.

  35. arXiv:2408.09722  [pdf, other]

    cs.LG stat.ML

    Towards Few-Shot Learning in the Open World: A Review and Beyond

    Authors: Hui Xue, Yuexuan An, Yongchun Qin, Wenqian Li, Yixin Wu, Yongjuan Che, Pengfei Fang, Minling Zhang

    Abstract: Human intelligence is characterized by our ability to absorb and apply knowledge from the world around us, especially in rapidly acquiring new concepts from minimal examples, underpinned by prior knowledge. Few-shot learning (FSL) aims to mimic this capacity by enabling significant generalizations and transferability. However, traditional FSL frameworks often rely on assumptions of clean, complete…

    Submitted 19 August, 2024; originally announced August 2024.

  36. arXiv:2408.09460  [pdf, other]

    cs.CV

    Fine-Grained Building Function Recognition from Street-View Images via Geometry-Aware Semi-Supervised Learning

    Authors: Weijia Li, Jinhua Yu, Dairong Chen, Yi Lin, Runmin Dong, Xiang Zhang, Conghui He, Haohuan Fu

    Abstract: In this work, we propose a geometry-aware semi-supervised method for fine-grained building function recognition. This method leverages the geometric relationships between multi-source data to improve the accuracy of pseudo labels in semi-supervised learning, extending the task's scope and making it applicable to cross-categorization systems of building function recognition. Firstly, we design an o…

    Submitted 27 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: This paper is currently under review

  37. arXiv:2408.09356  [pdf, other]

    cs.CV

    Joint Temporal Pooling for Improving Skeleton-based Action Recognition

    Authors: Shanaka Ramesh Gunasekara, Wanqing Li, Jack Yang, Philip Ogunbona

    Abstract: In skeleton-based human action recognition, temporal pooling is a critical step for capturing the spatiotemporal relationships of joint dynamics. Conventional pooling methods overlook the preservation of motion information and treat each frame equally. However, in an action sequence, only a few segments of frames carry discriminative information related to the action. This paper presents a novel Joint…

    Submitted 18 August, 2024; originally announced August 2024.

    Journal ref: 2023 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2023

  38. arXiv:2408.09345  [pdf, other]

    cs.IR cs.SE

    Deep Code Search with Naming-Agnostic Contrastive Multi-View Learning

    Authors: Jiadong Feng, Wei Li, Zhao Wei, Yong Xu, Juhong Wang, Hui Li

    Abstract: Software development is a repetitive task, as developers usually reuse or get inspiration from existing implementations. Code search, which refers to the retrieval of relevant code snippets from a codebase according to the developer's intent that has been expressed as a query, has become increasingly important in the software development process. Due to the success of deep learning in various appl…

    Submitted 17 August, 2024; originally announced August 2024.

  39. arXiv:2408.09191  [pdf, other]

    cs.CV

    GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System

    Authors: Shuo Wang, Yongcai Wang, Zhimin Xu, Yongyu Guo, Wanting Li, Zhe Huang, Xuewei Bai, Deying Li

    Abstract: For interacting with mobile objects in unfamiliar environments, simultaneously locating, mapping, and tracking the 3D poses of multiple objects are crucial. This paper proposes a Tracklet Graph and Query Graph-based framework, i.e., GSLAMOT, to address this challenge. GSLAMOT utilizes camera and LiDAR multimodal information as inputs and divides the representation of the dynamic scene i…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 11 pages, 9 figures, ACM MM 2024

  40. arXiv:2408.09064  [pdf, other]

    cs.CV cs.LG

    MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality

    Authors: Zhiyi Shi, Junsik Kim, Wanhua Li, Yicong Li, Hanspeter Pfister

    Abstract: Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is the frequent occurrence of missing modalities, which impairs performance. Additionally, fine-tuning the entire pre-trained model demands substantial…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted by MICCAI 2024

  41. arXiv:2408.08870  [pdf, other]

    cs.CV

    SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation

    Authors: Xinyu Xiong, Zihuang Wu, Shuangyi Tan, Wenxue Li, Feilong Tang, Ying Chen, Siying Li, Jie Ma, Guanbin Li

    Abstract: Image segmentation plays an important role in vision understanding. Recently, the emerging vision foundation models have continuously achieved superior performance on various tasks. Following such success, in this paper, we prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models. We propose a simple but effective framework, termed SAM2-UNet, for versatile…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Technical Report

  42. arXiv:2408.08824  [pdf, other]

    cs.LG

    LEVIS: Large Exact Verifiable Input Spaces for Neural Networks

    Authors: Mohamad Fares El Hajj Chehade, Brian Wesley Bell, Russell Bent, Hao Zhu, Wenting Li

    Abstract: The robustness of neural networks is paramount in safety-critical applications. While most current robustness verification methods assess the worst-case output under the assumption that the input space is known, identifying a verifiable input space $\mathcal{C}$, where no adversarial examples exist, is crucial for effective model selection, robustness evaluation, and the development of reliable co…

    Submitted 16 August, 2024; originally announced August 2024.

  43. arXiv:2408.08665  [pdf, other]

    cs.CV

    QMambaBSR: Burst Image Super-Resolution with Query State Space Model

    Authors: Xin Di, Long Peng, Peizhe Xia, Wenbo Li, Renjing Pei, Yang Cao, Yang Wang, Zheng-Jun Zha

    Abstract: Burst super-resolution aims to reconstruct high-resolution images with higher quality and richer details by fusing the sub-pixel information from multiple burst low-resolution frames. In BurstSR, the key challenge lies in extracting sub-pixel details that are complementary to the base frame's content while simultaneously suppressing high-frequency noise disturbance. Existing methods attempt to extract sub-pix…

    Submitted 16 August, 2024; originally announced August 2024.

  44. arXiv:2408.08490  [pdf, other]

    cs.AR

    Accelerating Mini-batch HGNN Training by Reducing CUDA Kernels

    Authors: Meng Wu, Jingkai Qiu, Mingyu Yan, Wenming Li, Yang Zhang, Zhimin Zhang, Xiaochun Ye, Dongrui Fan

    Abstract: Heterogeneous graph neural networks (HGNNs) are essential for capturing the structure and semantic information in heterogeneous graphs. However, existing GPU-based solutions, such as PyTorch Geometric, suffer from low GPU utilization due to numerous short-execution-time and memory-bound CUDA kernels during HGNN training. To address this issue, we introduce HiFuse, an enhancement for PyTorch Geom…

    Submitted 15 August, 2024; originally announced August 2024.

  45. arXiv:2408.07999  [pdf, other]

    cs.CV

    Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement

    Authors: Wenxuan Li, Qin Zou, Chi Chen, Bo Du, Long Chen

    Abstract: In the realm of autonomous driving, accurately detecting occluded or distant objects, referred to as weak positive samples, presents significant challenges. These challenges predominantly arise during query initialization, where an over-reliance on heatmap confidence often results in a high rate of false positives, consequently masking weaker detections and impairing system performance. To alleviate…

    Submitted 15 August, 2024; originally announced August 2024.

  46. arXiv:2408.07719  [pdf, other]

    cs.LG cs.AI

    Operator Feature Neural Network for Symbolic Regression

    Authors: Yusong Deng, Min Wu, Lina Yu, Jingyi Liu, Shu Wei, Yanjie Li, Weijun Li

    Abstract: Symbolic regression is a task aimed at identifying patterns in data and representing them through mathematical expressions, generally involving skeleton prediction and constant optimization. Many methods have achieved some success; however, they treat variables and symbols merely as characters of natural language without considering their mathematical essence. This paper introduces the operator fea…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 12 pages

  47. arXiv:2408.07427  [pdf, other]

    cs.IR

    Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation

    Authors: CanYi Liu, Wei Li, Youchen Zhang, Hui Li, Rongrong Ji

    Abstract: A sequential recommender system (SRS) predicts the next items that users may prefer based on users' historical interaction sequences. Inspired by the rise of large language models (LLMs) in various AI applications, there is a surge of work on LLM-based SRS. Despite their attractive performance, existing LLM-based SRS still exhibit some limitations, including neglecting intra-item relations, ignoring l…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 11 pages, 14 figures

  48. arXiv:2408.07278  [pdf, other]

    cs.IR cs.AI cs.CV

    Scene-wise Adaptive Network for Dynamic Cold-start Scenes Optimization in CTR Prediction

    Authors: Wenhao Li, Jie Zhou, Chuan Luo, Chao Tang, Kun Zhang, Shixiong Zhao

    Abstract: In the realm of modern mobile E-commerce, providing users with nearby commercial service recommendations through location-based online services has become increasingly vital. While machine learning approaches have shown promise in multi-scene recommendation, existing methodologies often struggle to address cold-start problems in unprecedented scenes: the increasing diversity of commercial choices,…

    Submitted 18 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: 10 pages, 6 figures, accepted by RecSys 2024

    MSC Class: 68T09; ACM Class: I.2.0

  49. arXiv:2408.07246  [pdf, other]

    cs.LG cs.CV

    ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

    Authors: Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, Jingdi Lei, Qian Tan, Cai Zhou, Wei Liu, Yaotian Yang, Xinrui Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Wei Li, Shufei Zhang, Mao Su, Wanli Ouyang, Yuqiang Li, Dongzhan Zhou

    Abstract: Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled by existing chemical LLMs. This brings a growing need for models capable of integrating multimodal information in the chemical domain. In this paper,…

    Submitted 16 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 11 pages, updated version

  50. arXiv:2408.06941  [pdf, other]

    cs.IR

    OpenResearcher: Unleashing AI for Accelerated Scientific Research

    Authors: Yuxiang Zheng, Shichao Sun, Lin Qiu, Dongyu Ru, Cheng Jiayang, Xuefeng Li, Jifan Lin, Binjie Wang, Yun Luo, Renjie Pan, Yang Xu, Qingkai Min, Zizhao Zhang, Yiwen Wang, Wenjie Li, Pengfei Liu

    Abstract: The rapid growth of scientific literature imposes significant challenges for researchers endeavoring to stay updated with the latest advancements in their fields and delve into new areas. We introduce OpenResearcher, an innovative platform that leverages Artificial Intelligence (AI) techniques to accelerate the research process by answering diverse questions from researchers. OpenResearcher is bui…

    Submitted 13 August, 2024; originally announced August 2024.