Showing 1–50 of 281 results for author: Cao, H

Searching in archive cs.
  1. arXiv:2409.18092  [pdf, other]

    cs.CV cs.AI cs.RO

    DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models

    Authors: Helin Cao, Sven Behnke

    Abstract: Perception systems play a crucial role in autonomous driving, incorporating multiple sensors and corresponding computer vision algorithms. 3D LiDAR sensors are widely used to capture sparse point clouds of the vehicle's surroundings. However, such systems struggle to perceive occluded areas and gaps in the scene due to the sparsity of these point clouds and their lack of semantics. To address thes…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: Under review

  2. arXiv:2409.16788  [pdf, other]

    cs.CL

    Mitigating the Bias of Large Language Model Evaluation

    Authors: Hongli Zhou, Hui Huang, Yunfei Long, Bing Xu, Conghui Zhu, Hailong Cao, Muyun Yang, Tiejun Zhao

    Abstract: Recently, there has been a trend of evaluating the Large Language Model (LLM) quality in the flavor of LLM-as-a-Judge, namely leveraging another LLM to evaluate the current output quality. However, existing judges are proven to be biased, namely they would favor answers which present better superficial quality (such as verbosity, fluency) while ignoring the instruction following ability. In this w…

    Submitted 25 September, 2024; originally announced September 2024.

  3. arXiv:2409.16019  [pdf, other]

    cs.RO

    AIR-Embodied: An Efficient Active 3DGS-based Interaction and Reconstruction Framework with Embodied Large Language Model

    Authors: Zhenghao Qi, Shenghai Yuan, Fen Liu, Haozhi Cao, Tianchen Deng, Jianfei Yang, Lihua Xie

    Abstract: Recent advancements in 3D reconstruction and neural rendering have enhanced the creation of high-quality digital assets, yet existing methods struggle to generalize across varying object shapes, textures, and occlusions. While Next Best View (NBV) planning and Learning-based approaches offer solutions, they are often limited by predefined criteria and fail to manage occlusions with human-like comm…

    Submitted 24 September, 2024; originally announced September 2024.

  4. arXiv:2409.15243  [pdf, other]

    cs.AI cs.ET cs.HC

    MACeIP: A Multimodal Ambient Context-enriched Intelligence Platform in Smart Cities

    Authors: Truong Thanh Hung Nguyen, Phuc Truong Loc Nguyen, Monica Wachowicz, Hung Cao

    Abstract: This paper presents a Multimodal Ambient Context-enriched Intelligence Platform (MACeIP) for Smart Cities, a comprehensive system designed to enhance urban management and citizen engagement. Our platform integrates advanced technologies, including Internet of Things (IoT) sensors, edge and cloud computing, and Multimodal AI, to create a responsive and intelligent urban ecosystem. Key components in…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: 4 pages, 6 figures, IEEE/IEIE ICCE-Asia 2024

  5. arXiv:2409.05916  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.BM

    Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries

    Authors: Chunbin Gu, Mutian He, Hanqun Cao, Guangyong Chen, Chang-yu Hsieh, Pheng Ann Heng

    Abstract: In the realm of drug discovery, DNA-encoded library (DEL) screening technology has emerged as an efficient method for identifying high-affinity compounds. However, DEL screening faces a significant challenge: noise arising from nonspecific interactions within complex biological systems. Neural networks trained on DEL libraries have been employed to extract compound features, aiming to denoise the…

    Submitted 7 September, 2024; originally announced September 2024.

  6. arXiv:2409.05898  [pdf, other]

    cs.LG cs.AI cs.RO

    Simplex-enabled Safe Continual Learning Machine

    Authors: Yihao Cai, Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo

    Abstract: This paper proposes the SeC-Learning Machine: Simplex-enabled safe continual learning for safety-critical autonomous systems. The SeC-learning machine is built on Simplex logic (that is, "using simplicity to control complexity") and physics-regulated deep reinforcement learning (Phy-DRL). The SeC-learning machine thus constitutes HP (high performance)-Student, HA (high assurance)-Teacher, and Co…

    Submitted 5 September, 2024; originally announced September 2024.

  7. arXiv:2409.04133  [pdf, other]

    cs.CV cs.CY

    Secure Traffic Sign Recognition: An Attention-Enabled Universal Image Inpainting Mechanism against Light Patch Attacks

    Authors: Hangcheng Cao, Longzhi Yuan, Guowen Xu, Ziyang He, Zhengru Fang, Yuguang Fang

    Abstract: Traffic sign recognition systems play a crucial role in assisting drivers to make informed decisions while driving. However, due to the heavy reliance on deep learning technologies, particularly for future connected and autonomous driving, these systems are susceptible to adversarial attacks that pose significant safety risks to both personal and public transportation. Notably, researchers recentl…

    Submitted 6 September, 2024; originally announced September 2024.

  8. arXiv:2409.00671  [pdf, other]

    cs.CE

    InvariantStock: Learning Invariant Features for Mastering the Shifting Market

    Authors: Haiyao Cao, Jinan Zou, Yuhang Liu, Zhen Zhang, Ehsan Abbasnejad, Anton van den Hengel, Javen Qinfeng Shi

    Abstract: Accurately predicting stock returns is crucial for effective portfolio management. However, existing methods often overlook a fundamental issue in the market, namely, distribution shifts, making them less practical for predicting future markets or newly listed stocks. This study introduces a novel approach to address this challenge by focusing on the acquisition of invariant features across variou…

    Submitted 1 September, 2024; originally announced September 2024.

  9. arXiv:2408.16231  [pdf]

    physics.optics cs.AI physics.app-ph

    Anchor-Controlled Generative Adversarial Network for High-Fidelity Electromagnetic and Structurally Diverse Metasurface Design

    Authors: Yunhui Zeng, Hongkun Cao, Xin Jin

    Abstract: In optoelectronics, designing free-form metasurfaces presents significant challenges, particularly in achieving high electromagnetic response fidelity due to the complex relationship between physical structures and electromagnetic behaviors. A key difficulty arises from the one-to-many mapping dilemma, where multiple distinct physical structures can yield similar electromagnetic responses, complic…

    Submitted 28 August, 2024; originally announced August 2024.

  10. arXiv:2408.14001  [pdf, other]

    cs.LG cs.DC

    Decentralized Federated Learning with Model Caching on Mobile Agents

    Authors: Xiaoyu Wang, Guojun Xiong, Houwei Cao, Jian Li, Yong Liu

    Abstract: Federated Learning (FL) aims to train a shared model using data and computation power on distributed agents coordinated by a central server. Decentralized FL (DFL) utilizes local model exchange and aggregation between agents to reduce the communication and computation overheads on the central server. However, when agents are mobile, the communication opportunity between agents can be sporadic, lar…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 27 pages

  11. arXiv:2408.13498  [pdf, other]

    cs.LG

    Rethinking State Disentanglement in Causal Reinforcement Learning

    Authors: Haiyao Cao, Zhen Zhang, Panpan Cai, Yuhang Liu, Jinan Zou, Ehsan Abbasnejad, Biwei Huang, Mingming Gong, Anton van den Hengel, Javen Qinfeng Shi

    Abstract: One of the significant challenges in reinforcement learning (RL) when dealing with noise is estimating latent states from observations. Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability. Consequently, some existing work focuses on establishing identifiability from a causal perspective to aid in the design of al…

    Submitted 24 August, 2024; originally announced August 2024.

  12. arXiv:2408.09851  [pdf, other]

    cs.NI eess.SY

    ISAC-Fi: Enabling Full-fledged Monostatic Sensing over Wi-Fi Communication

    Authors: Zhe Chen, Chao Hu, Tianyue Zheng, Hangcheng Cao, Yanbing Yang, Yen Chu, Hongbo Jiang, Jun Luo

    Abstract: Whereas Wi-Fi communications have been exploited for sensing purpose for over a decade, the bistatic or multistatic nature of Wi-Fi still poses multiple challenges, hampering real-life deployment of integrated sensing and communication (ISAC) within Wi-Fi framework. In this paper, we aim to re-design WiFi so that monostatic sensing (mimicking radar) can be achieved over the multistatic communicati…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 14 pages, 22 figures

  13. arXiv:2407.15569  [pdf, other]

    cs.CL

    An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought

    Authors: Yuetong Zhao, Hongyu Cao, Xianyu Zhao, Zhijian Ou

    Abstract: Since the launch of ChatGPT at the end of 2022, generative dialogue models represented by ChatGPT have quickly become essential tools in daily life. As user expectations increase, enhancing the capability of generative dialogue models to solve complex problems has become a focal point of current research. This paper delves into the effectiveness of the RAFT (Retrieval Augmented Fine-Tuning) method…

    Submitted 30 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted by ISCSLP 2024

  14. arXiv:2407.14245  [pdf, other]

    cs.CV

    Dataset Distillation by Automatic Training Trajectories

    Authors: Dai Liu, Jindong Gu, Hu Cao, Carsten Trinitis, Martin Schulz

    Abstract: Dataset Distillation is used to create a concise, yet informative, synthetic dataset that can replace the original dataset for training purposes. Some leading methods in this domain prioritize long-range matching, involving the unrolling of training trajectories with a fixed number of steps (NS) on the synthetic dataset to align with various expert training trajectories. However, traditional long-…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: The paper is accepted at ECCV 2024

  15. arXiv:2407.12582  [pdf, other]

    cs.CV cs.AI cs.RO

    Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection

    Authors: Hu Cao, Zehua Zhang, Yan Xia, Xinyi Li, Jiahao Xia, Guang Chen, Alois Knoll

    Abstract: In frame-based vision, object detection faces substantial performance degradation under challenging conditions due to the limited sensing capability of conventional cameras. Event cameras output sparse and asynchronous events, providing a potential solution to solve these problems. However, effectively fusing two heterogeneous modalities remains an open issue. In this work, we propose a novel hier…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  16. arXiv:2407.11771  [pdf, other]

    cs.CV cs.AI cs.LG

    XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach

    Authors: Truong Thanh Hung Nguyen, Phuc Truong Loc Nguyen, Hung Cao

    Abstract: Recent advancements in deep learning have significantly improved visual quality inspection and predictive maintenance within industrial settings. However, deploying these technologies on low-resource edge devices poses substantial challenges due to their high computational demands and the inherent complexity of Explainable AI (XAI) methods. This paper addresses these challenges by introducing a no…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 28 pages, preprint submitted to Information Fusion journal

  17. arXiv:2407.10474  [pdf, other]

    cs.MM

    Multi-source Knowledge Enhanced Graph Attention Networks for Multimodal Fact Verification

    Authors: Han Cao, Lingwei Wei, Wei Zhou, Songlin Hu

    Abstract: Multimodal fact verification is an under-explored and emerging field that has gained increasing attention in recent years. The goal is to assess the veracity of claims that involve multiple modalities by analyzing the retrieved evidence. The main challenge in this area is to effectively fuse features from different modalities to learn meaningful multimodal representations. To this end, we propose…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ICME 2024

  18. arXiv:2407.09242  [pdf, other]

    cs.RO cs.ET cs.NI

    An Adaptive Indoor Localization Approach Using WiFi RSSI Fingerprinting with SLAM-Enabled Robotic Platform and Deep Neural Networks

    Authors: Seyed Alireza Rahimi Azghadi, Atah Nuh Mih, Asfia Kawnine, Monica Wachowicz, Francis Palma, Hung Cao

    Abstract: Indoor localization plays a vital role in the era of the IoT and robotics, with WiFi technology being a prominent choice due to its ubiquity. We present a method for creating WiFi fingerprinting datasets to enhance indoor localization systems and address the gap in WiFi fingerprinting dataset creation. We used the Simultaneous Localization And Mapping (SLAM) algorithm and employed a robotic platfo…

    Submitted 30 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: Fingerprinting dataset; Robotic platform; Indoor localization; Signals strength indicator; Location-based services

  19. arXiv:2407.04315  [pdf, other]

    cs.RO

    Gradient-based Regularization for Action Smoothness in Robotic Control with Reinforcement Learning

    Authors: I Lee, Hoang-Giang Cao, Cong-Tinh Dao, Yu-Cheng Chen, I-Chen Wu

    Abstract: Deep Reinforcement Learning (DRL) has achieved remarkable success, ranging from complex computer games to real-world applications, showing the potential for intelligent agents capable of learning in dynamic environments. However, its application in real-world scenarios presents challenges, including the jerky problem, in which jerky trajectories not only compromise system safety but also increase…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024

  20. arXiv:2407.00662  [pdf, other]

    cs.MA cs.AI

    Multi-Agent Training for Pommerman: Curriculum Learning and Population-based Self-Play Approach

    Authors: Nhat-Minh Huynh, Hoang-Giang Cao, I-Chen Wu

    Abstract: Pommerman is a multi-agent environment that has received considerable attention from researchers in recent years. This environment is an ideal benchmark for multi-agent training, providing a battleground for two teams with communication capabilities among allied agents. Pommerman presents significant challenges for model-free reinforcement learning due to delayed action effects, sparse rewards, an…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted at The First Workshop on Game AI Algorithms and Multi-Agent Learning - IJCAI 2024

  21. arXiv:2406.15486  [pdf, other]

    cs.CL cs.AI cs.LG

    SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

    Authors: Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang

    Abstract: Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional pretraining or finetuning, and often sacrifice model accuracy. In this paper, we first provide both theoretical and empirical foundations for nea…

    Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  22. arXiv:2406.13672  [pdf, other]

    cs.CV

    Q-SNNs: Quantized Spiking Neural Networks

    Authors: Wenjie Wei, Yu Liang, Ammar Belatreche, Yichen Xiao, Honglin Cao, Zhenbang Ren, Guoqing Wang, Malu Zhang, Yang Yang

    Abstract: Brain-inspired Spiking Neural Networks (SNNs) leverage sparse spikes to represent information and process them in an asynchronous event-driven manner, offering an energy-efficient paradigm for the next generation of machine intelligence. However, the current focus within the SNN community prioritizes accuracy optimization through the development of large-scale models, limiting their viability in r…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  23. arXiv:2406.13193  [pdf, other]

    cs.LG cs.AI cs.CL physics.chem-ph

    PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes

    Authors: He Cao, Yanjun Shao, Zhiyuan Liu, Zijing Liu, Xiangru Tang, Yuan Yao, Yu Li

    Abstract: Multimodal Large Language Models (MLLMs) have seen growing adoption across various scientific disciplines. These advancements encourage the investigation of molecule-text modeling within synthetic chemistry, a field dedicated to designing and conducting chemical reactions to synthesize new compounds with desired properties and applications. Current approaches, however, often neglect the critical r…

    Submitted 18 June, 2024; originally announced June 2024.

  24. arXiv:2406.12707  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

    Authors: Haoqiu Yan, Yongxin Zhu, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu

    Abstract: Large Language Model (LLM)-enhanced agents become increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 9 pages, 3 figures, ACL24 accepted

  25. arXiv:2406.07168  [pdf, other]

    cs.CL

    Teaching Language Models to Self-Improve by Learning from Language Feedback

    Authors: Chi Hu, Yimin Hu, Hang Cao, Tong Xiao, Jingbo Zhu

    Abstract: Aligning Large Language Models (LLMs) with human intentions and values is crucial yet challenging. Current methods primarily rely on human preferences, which are costly and insufficient in capturing nuanced feedback expressed in natural language. In this paper, we present Self-Refinement Tuning (SRT), a method that leverages model feedback for alignment, thereby reducing reliance on human annotati…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  26. arXiv:2406.01607  [pdf, other]

    cs.IR cs.AI cs.CL

    Recent advances in text embedding: A Comprehensive Review of Top-Performing Methods on the MTEB Benchmark

    Authors: Hongliu Cao

    Abstract: Text embedding methods have become increasingly popular in both industrial and academic fields due to their critical role in a variety of natural language processing tasks. The significance of universal text embeddings has been further highlighted with the rise of Large Language Models (LLMs) applications such as Retrieval-Augmented Systems (RAGs). While previous models have attempted to be genera…

    Submitted 19 June, 2024; v1 submitted 27 May, 2024; originally announced June 2024.

    Comments: 21 pages

  27. arXiv:2406.00356  [pdf, other]

    eess.AS cs.SD

    AudioLCM: Text-to-Audio Generation with Latent Consistency Models

    Authors: Huadai Liu, Rongjie Huang, Yang Liu, Hengyuan Cao, Jialei Wang, Xize Cheng, Siqi Zheng, Zhou Zhao

    Abstract: Recent advancements in Latent Diffusion Models (LDMs) have propelled them to the forefront of various generative tasks. However, their iterative sampling process poses a significant computational burden, resulting in slow generation speeds and limiting their application in text-to-audio generation deployment. In this work, we introduce AudioLCM, a novel consistency-based model tailored for efficie…

    Submitted 9 July, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  28. arXiv:2405.19688  [pdf, other]

    cs.CV

    DNPM: A Neural Parametric Model for the Synthesis of Facial Geometric Details

    Authors: Haitao Cao, Baoping Cheng, Qiran Pu, Haocheng Zhang, Bin Luo, Yixiang Zhuang, Juncong Lin, Liyan Chen, Xuan Cheng

    Abstract: Parametric 3D models have enabled a wide variety of computer vision and graphics tasks, such as modeling human faces, bodies and hands. In 3D face modeling, 3DMM is the most widely used parametric model, but can't generate fine geometric details solely from identity and expression inputs. To tackle this limitation, we propose a neural parametric model named DNPM for the facial geometric details, w…

    Submitted 13 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  29. arXiv:2405.15975  [pdf, other]

    math.OC cs.LG q-fin.CP

    Inference of Utilities and Time Preference in Sequential Decision-Making

    Authors: Haoyang Cao, Zhengqi Wu, Renyuan Xu

    Abstract: This paper introduces a novel stochastic control framework to enhance the capabilities of automated investment managers, or robo-advisors, by accurately inferring clients' investment preferences from past activities. Our approach leverages a continuous-time model that incorporates utility functions and a generic discounting scheme of a time-varying rate, tailored to each client's risk tolerance, v…

    Submitted 3 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  30. arXiv:2405.09886  [pdf]

    cs.LG cs.AI q-bio.BM

    MTLComb: multi-task learning combining regression and classification tasks for joint feature selection

    Authors: Han Cao, Sivanesan Rajan, Bianka Hahn, Ersoy Kocak, Daniel Durstewitz, Emanuel Schwarz, Verena Schneider-Lindner

    Abstract: Multi-task learning (MTL) is a learning paradigm that enables the simultaneous training of multiple communicating algorithms. Although MTL has been successfully applied to either regression or classification tasks alone, incorporating mixed types of tasks into a unified MTL framework remains challenging, primarily due to variations in the magnitudes of losses associated with different tasks. This c…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 33 pages, 3 figures, 5 tables

    ACM Class: J.3; I.2.6

  31. arXiv:2405.09880  [pdf, other]

    cs.CV

    Deep Learning-Based Quasi-Conformal Surface Registration for Partial 3D Faces Applied to Facial Recognition

    Authors: Yuchen Guo, Hanqun Cao, Lok Ming Lui

    Abstract: 3D face registration is an important process in which a 3D face model is aligned and mapped to a template face. However, the task of 3D face registration becomes particularly challenging when dealing with partial face data, where only limited facial information is available. To address this challenge, this paper presents a novel deep learning-based approach that combines quasi-conformal geometry w…

    Submitted 16 May, 2024; originally announced May 2024.

    MSC Class: 68T10; 65D18

  32. arXiv:2405.06895  [pdf, other]

    cs.HC

    Unveiling the Era of Spatial Computing

    Authors: Hanzhong Cao

    Abstract: The evolution of User Interfaces marks a significant transition from traditional command-line interfaces to more intuitive graphical and touch-based interfaces, largely driven by the emergence of personal computing devices. The advent of spatial computing and Extended Reality technologies further pushes the boundaries, promising a fusion of physical and digital realms through interactive environme…

    Submitted 10 May, 2024; originally announced May 2024.

  33. arXiv:2405.01507  [pdf, other]

    cs.LG stat.ML

    Accelerating Convergence in Bayesian Few-Shot Classification

    Authors: Tianjun Ke, Haoqun Cao, Feng Zhou

    Abstract: Bayesian few-shot classification has been a focal point in the field of few-shot learning. This paper seamlessly integrates mirror descent-based variational inference into Gaussian process-based few-shot classification, addressing the challenge of non-conjugate inference. By leveraging non-Euclidean geometry, mirror descent achieves accelerated convergence by providing the steepest descent directi…

    Submitted 7 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  34. arXiv:2404.18331  [pdf, other]

    cs.RO

    Multi-Robot Object SLAM Using Distributed Variational Inference

    Authors: Hanwen Cao, Sriram Shreedharan, Nikolay Atanasov

    Abstract: Multi-robot simultaneous localization and mapping (SLAM) enables a robot team to achieve coordinated tasks by relying on a common map of the environment. Constructing a map by centralized processing of the robot observations is undesirable because it creates a single point of failure and requires pre-existing infrastructure and significant communication throughput. This paper formulates multi-robo…

    Submitted 20 August, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  35. arXiv:2404.15587  [pdf, other]

    cs.CR

    Security Analysis of WiFi-based Sensing Systems: Threats from Perturbation Attacks

    Authors: Hangcheng Cao, Wenbin Huang, Guowen Xu, Xianhao Chen, Ziyang He, Jingyang Hu, Hongbo Jiang, Yuguang Fang

    Abstract: Deep learning technologies are pivotal in enhancing the performance of WiFi-based wireless sensing systems. However, they are inherently vulnerable to adversarial perturbation attacks, and regrettably, there is lacking serious attention to this security issue within the WiFi sensing community. In this paper, we elaborate such an attack, called WiIntruder, distinguishing itself with universality, r…

    Submitted 23 April, 2024; originally announced April 2024.

  36. arXiv:2404.13417  [pdf, other]

    cs.CV cs.AI

    Efficient and Concise Explanations for Object Detection with Gaussian-Class Activation Mapping Explainer

    Authors: Quoc Khanh Nguyen, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Van Binh Truong, Tuong Phan, Hung Cao

    Abstract: To address the challenges of providing quick and plausible explanations in Explainable AI (XAI) for object detection models, we introduce the Gaussian Class Activation Mapping Explainer (G-CAME). Our method efficiently generates concise saliency maps by utilizing activation maps from selected layers and applying a Gaussian kernel to emphasize critical image regions for the predicted object. Compar…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: Canadian AI 2024

  37. arXiv:2404.06918  [pdf, other]

    cs.CV

    HRVDA: High-Resolution Visual Document Assistant

    Authors: Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu

    Abstract: Leveraging vast training data, multimodal large language models (MLLMs) have demonstrated formidable general visual comprehension capabilities and achieved remarkable performance across various tasks. However, their performance in visual document understanding still leaves much room for improvement. This discrepancy is primarily attributed to the fact that visual document understanding is a fine-g…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024 main conference

  38. arXiv:2404.06891  [pdf, other]

    cs.NI

    PACP: Priority-Aware Collaborative Perception for Connected and Autonomous Vehicles

    Authors: Zhengru Fang, Senkang Hu, Haonan An, Yuang Zhang, Jingjing Wang, Hangcheng Cao, Xianhao Chen, Yuguang Fang

    Abstract: Surrounding perceptions are quintessential for safe driving for connected and autonomous vehicles (CAVs), where the Bird's Eye View has been employed to accurately capture spatial relationships among vehicles. However, severe inherent limitations of BEV, like blind spots, have been identified. Collaborative perception has emerged as an effective solution to overcoming these limitations through dat…

    Submitted 21 August, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Our work was accepted by IEEE Transactions on Mobile Computing

  39. arXiv:2404.06227  [pdf]

    cs.HC

    Multimodal Road Network Generation Based on Large Language Model

    Authors: Jiajing Chen, Weihang Xu, Haiming Cao, Zihuan Xu, Yu Zhang, Zhao Zhang, Siyao Zhang

    Abstract: With the increasing popularity of ChatGPT, large language models (LLMs) have demonstrated their capabilities in communication and reasoning, promising for transportation sector intelligentization. However, they still face challenges in domain-specific knowledge. This paper aims to leverage LLMs' reasoning and recognition abilities to replace traditional user interfaces and create an "intelligent o…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 17 pages, 8 figures

  40. arXiv:2404.01268  [pdf, other]

    cs.CL cs.AI cs.DL cs.LG cs.SI

    Mapping the Increasing Use of LLMs in Scientific Papers

    Authors: Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou

    Abstract: Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent…

    Submitted 1 April, 2024; originally announced April 2024.

  41. arXiv:2404.00364  [pdf, other]

    cs.RO cs.AI

    Accurate Cutting-point Estimation for Robotic Lychee Harvesting through Geometry-aware Learning

    Authors: Gengming Zhang, Hao Cao, Kewei Hu, Yaoqiang Pan, Yuqin Deng, Hongjun Wang, Hanwen Kang

    Abstract: Accurately identifying lychee-picking points in unstructured orchard environments and obtaining their coordinate locations is critical to the success of lychee-picking robots. However, traditional two-dimensional (2D) image-based object detection methods often struggle due to the complex geometric structures of branches, leaves and fruits, leading to incorrect determination of lychee picking point…

    Submitted 30 March, 2024; originally announced April 2024.

  42. arXiv:2403.12856  [pdf, other]

    cs.LG cs.RO

    Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning

    Authors: Mirco Theile, Hongpeng Cao, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli

    Abstract: In reinforcement learning (RL), exploiting environmental symmetries can significantly enhance efficiency, robustness, and performance. However, ensuring that the deep RL policy and value networks are respectively equivariant and invariant to exploit these symmetries is a substantial challenge. Related works try to design networks that are equivariant and invariant by construction, limiting them to…

    Submitted 25 August, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted at IROS 2024. A video can be found here: https://youtu.be/L6NOdvU7n7s. The code is available at https://github.com/theilem/uavSim

  43. arXiv:2403.12373  [pdf, other]

    cs.CL

    RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners

    Authors: Chi Hu, Yuan Ge, Xiangnan Ma, Hang Cao, Qiang Li, Yonghua Yang, Tong Xiao, Jingbo Zhu

    Abstract: Large Language Models (LLMs) have achieved impressive performance across various reasoning tasks. However, even state-of-the-art LLMs such as ChatGPT are prone to logical errors during their reasoning processes. Existing solutions, such as deploying task-specific verifiers or voting over multiple reasoning paths, either require extensive human annotations or fail in scenarios with inconsistent res…

    Submitted 22 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: LREC-Coling 2024 Long Paper

  44. arXiv:2403.11496  [pdf, other]

    cs.RO cs.AI

    MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception

    Authors: Thien-Minh Nguyen, Shenghai Yuan, Thien Hoang Nguyen, Pengyu Yin, Haozhi Cao, Lihua Xie, Maciej Wozniak, Patric Jensfelt, Marko Thiel, Justin Ziegenbein, Noel Blunder

    Abstract: Perception plays a crucial role in various robot applications. However, existing well-annotated datasets are biased towards autonomous driving scenarios, while unlabelled SLAM datasets are quickly over-fitted, and often lack environment and domain variations. To expand the frontier of these fields, we introduce a comprehensive dataset named MCD (Multi-Campus Dataset), featuring a wide range of sen…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  45. arXiv:2403.10569  [pdf, other]

    cs.LG cs.AI cs.CV

    Achieving Pareto Optimality using Efficient Parameter Reduction for DNNs in Resource-Constrained Edge Environment

    Authors: Atah Nuh Mih, Alireza Rahimi, Asfia Kawnine, Francis Palma, Monica Wachowicz, Rickey Dubay, Hung Cao

    Abstract: This paper proposes an optimization of an existing Deep Neural Network (DNN) that improves its hardware utilization and facilitates on-device training for resource-constrained edge environments. We implement efficient parameter reduction strategies on Xception that shrink the model size without sacrificing accuracy, thus decreasing memory utilization during training. We evaluate our model in two e…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.05355

  46. arXiv:2403.08885  [pdf, other]

    cs.CV cs.AI cs.RO

    SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net

    Authors: Helin Cao, Sven Behnke

    Abstract: We introduce SLCF-Net, a novel approach for the Semantic Scene Completion (SSC) task that sequentially fuses LiDAR and camera data. It jointly estimates missing geometry and semantics in a scene from sequences of RGB images and sparse LiDAR measurements. The images are semantically segmented by a pre-trained 2D U-Net and a dense depth prior is estimated from a depth-conditioned pipeline fueled by…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 2024 IEEE International Conference on Robotics and Automation (ICRA2024), Yokohama, Japan, May 2024

  47. arXiv:2403.08192  [pdf, other]

    cs.CL q-bio.BM

    MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension

    Authors: Xingyu Lu, He Cao, Zijing Liu, Shengyuan Bai, Leqing Chen, Yuan Yao, Hai-Tao Zheng, Yu Li

    Abstract: Large language models are playing an increasingly significant role in molecular research, yet existing models often generate erroneous information, posing challenges to accurate molecular comprehension. Traditional evaluation metrics for generated content fail to assess a model's accuracy in molecular understanding. To rectify the absence of factual evaluation, we present MoleculeQA, a novel quest…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 19 pages, 8 figures

  48. arXiv:2403.07183  [pdf, other]

    cs.CL cs.AI cs.LG cs.SI

    Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

    Authors: Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou

    Abstract: We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in…

    Submitted 15 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 46 pages, 31 figures, ICML '24

    ACM Class: I.2.7

  49. arXiv:2403.06461  [pdf, other]

    cs.CV

    Reliable Spatial-Temporal Voxels For Multi-Modal Test-Time Adaptation

    Authors: Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Xingyu Ji, Shenghai Yuan, Lihua Xie

    Abstract: Multi-modal test-time adaptation (MM-TTA) is proposed to adapt models to an unlabeled target domain by leveraging the complementary multi-modal inputs in an online manner. Previous MM-TTA methods for 3D segmentation rely on predictions of cross-modal information in each input frame, while they ignore the fact that predictions of geometric neighborhoods within consecutive frames are highly correlat…

    Submitted 25 July, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  50. arXiv:2403.01210  [pdf, other]

    cs.CV cs.AI

    SAR-AE-SFP: SAR Imagery Adversarial Example in Real Physics domain with Target Scattering Feature Parameters

    Authors: Jiahao Cui, Jiale Duan, Binyan Luo, Hang Cao, Wang Guo, Haifeng Li

    Abstract: Deep neural network-based Synthetic Aperture Radar (SAR) target recognition models are susceptible to adversarial examples. Current adversarial example generation methods for SAR imagery primarily operate in the 2D digital domain, known as image adversarial examples. Recent work, while considering SAR imaging scatter mechanisms, fails to account for the actual imaging process, rendering attacks in…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 10 pages, 9 figures, 2 tables