Showing 1–50 of 113 results for author: Hong, X

Searching in archive cs. Search in all archives.
  1. arXiv:2407.19491  [pdf, other]

    cs.CV

    Multi-modal Crowd Counting via Modal Emulation

    Authors: Chenhao Wang, Xiaopeng Hong, Zhiheng Ma, Yupeng Wei, Yabin Wang, Xiaopeng Fan

    Abstract: Multi-modal crowd counting is a crucial task that uses multi-modal cues to estimate the number of people in crowded scenes. To overcome the gap between different modalities, we propose a modal emulation-based two-pass multi-modal crowd-counting framework that enables efficient modal emulation, alignment, and fusion. The framework consists of two key components: a multi-modal inference pass…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: This is the preprint version of the paper to appear in BMVC 2024. Please cite the final published version. Code is available at https://github.com/Mr-Monday/Multi-modal-Crowd-Counting-via-Modal-Emulation

  2. arXiv:2407.19078  [pdf, other]

    cs.LG stat.ML

    Practical Marketplace Optimization at Uber Using Causally-Informed Machine Learning

    Authors: Bobby Chen, Siyu Chen, Jason Dowlatabadi, Yu Xuan Hong, Vinayak Iyer, Uday Mantripragada, Rishabh Narang, Apoorv Pandey, Zijun Qin, Abrar Sheikh, Hongtao Sun, Jiaqi Sun, Matthew Walker, Kaichen Wei, Chen Xu, Jingnan Yang, Allen T. Zhang, Guoqing Zhang

    Abstract: Budget allocation of marketplace levers, such as incentives for drivers and promotions for riders, has long been a technical and business challenge at Uber; understanding lever budget changes' impact and estimating cost efficiency to achieve predefined budgets is crucial, with the goal of optimal allocations that maximize business value; we introduce an end-to-end machine learning and optimization…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: To be published in the 2nd Workshop on Causal Inference and Machine Learning in Practice, KDD 2024, August 25 to 29, 2024, Barcelona, Spain, 10 pages

    MSC Class: 62J99

  3. arXiv:2407.11086  [pdf, other]

    cs.LG cs.AI physics.chem-ph

    Pre-training with Fractional Denoising to Enhance Molecular Property Prediction

    Authors: Yuyan Ni, Shikun Feng, Xin Hong, Yuancheng Sun, Wei-Ying Ma, Zhi-Ming Ma, Qiwei Ye, Yanyan Lan

    Abstract: Deep learning methods have been considered promising for accelerating molecular screening in drug discovery and material design. Due to the limited availability of labelled data, various self-supervised molecular pre-training methods have been presented. While many existing methods utilize common pre-training tasks in computer vision (CV) and natural language processing (NLP), they often overlook…

    Submitted 14 July, 2024; originally announced July 2024.

  4. arXiv:2407.09367  [pdf, other]

    cs.CV

    Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation

    Authors: Zhilin Zhu, Xiaopeng Hong, Zhiheng Ma, Weijun Zhuang, Yaohui Ma, Yong Dai, Yaowei Wang

    Abstract: Continual Test-Time Adaptation (CTTA) involves adapting a pre-trained source model to continually changing unsupervised target domains. In this paper, we systematically analyze the challenges of this task: online environment, unsupervised nature, and the risks of error accumulation and catastrophic forgetting under continual domain shifts. To address these challenges, we reshape the online data bu…

    Submitted 18 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: This is the preprint version of our paper and supplemental material to appear in ECCV 2024

  5. arXiv:2407.07518  [pdf, other]

    cs.CV

    Multi-modal Crowd Counting via a Broker Modality

    Authors: Haoliang Meng, Xiaopeng Hong, Chenhao Wang, Miao Shang, Wangmeng Zuo

    Abstract: Multi-modal crowd counting involves estimating crowd density from both visual and thermal/depth images. This task is challenging due to the significant gap between these distinct modalities. In this paper, we propose a novel approach by introducing an auxiliary broker modality and on this basis frame the task as a triple-modal learning problem. We devise a fusion-based method to generate this brok…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: This is the preprint version of the paper and supplemental material to appear in ECCV 2024. Please cite the final published version. Code is available at https://github.com/HenryCilence/Broker-Modality-Crowd-Counting

  6. arXiv:2407.01310  [pdf, other]

    cs.LG cs.CV

    Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces

    Authors: Perusha Moodley, Pramod Kaushik, Dhillu Thambi, Mark Trovinger, Praveen Paruchuri, Xia Hong, Benjamin Rosman

    Abstract: Decision Transformers, in their vanilla form, struggle to perform on image-based environments with multi-discrete action spaces. Although enhanced Decision Transformer architectures have been developed to improve performance, these methods have not specifically addressed this problem of multi-discrete action spaces which hampers existing Decision Transformer architectures from learning good repres…

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2406.18159  [pdf, other]

    cs.CV cs.GR

    Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models

    Authors: Xiaolin Hong, Hongwei Yi, Fazhi He, Qiong Cao

    Abstract: Generating 3D scenes from human motion sequences supports numerous applications, including virtual reality and architectural design. However, previous auto-regression-based human-aware 3D scene generation methods have struggled to accurately capture the joint distribution of multiple objects and input humans, often resulting in overlapping object generation in the same space. To address this limit…

    Submitted 26 June, 2024; originally announced June 2024.

  8. arXiv:2406.15877  [pdf, other]

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

    Abstract: Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires…

    Submitted 26 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

  9. arXiv:2406.07487  [pdf, other]

    cs.CV

    GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

    Authors: Hang Yao, Ming Liu, Haolin Wang, Zhicun Yin, Zifei Yan, Xiaopeng Hong, Wangmeng Zuo

    Abstract: Diffusion models have shown superior performance on unsupervised anomaly detection tasks. Since trained with normal data only, diffusion models tend to reconstruct normal counterparts of test images with certain noises added. However, these methods treat all potential anomalies equally, which may cause two main problems. From the global perspective, the difficulty of reconstructing images with dif…

    Submitted 2 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ECCV 2024, code and models: https://github.com/hyao1/GLAD. Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

  10. arXiv:2406.00334  [pdf, other]

    cs.CV

    Image Captioning via Dynamic Path Customization

    Authors: Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Xiaopeng Hong, Yongjian Wu, Rongrong Ji

    Abstract: This paper explores a novel dynamic network for vision and language tasks, where the inferring structure is customized on the fly for different inputs. Most previous state-of-the-art approaches are static and hand-crafted networks, which not only heavily rely on expert knowledge, but also ignore the semantic diversity of input samples, therefore resulting in suboptimal performance. To address thes…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: TNNLS24

  11. arXiv:2405.17802  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Multi-level Interaction Modeling for Protein Mutational Effect Prediction

    Authors: Yuanle Mo, Xin Hong, Bowen Gao, Yinjun Jia, Yanyan Lan

    Abstract: Protein-protein interactions are central mediators in many biological processes. Accurately predicting the effects of mutations on interactions is crucial for guiding the modulation of these interactions, thereby playing a significant role in therapeutic development and drug discovery. Mutations generally affect interactions hierarchically across three levels: mutated residues exhibit different si…

    Submitted 27 May, 2024; originally announced May 2024.

  12. arXiv:2404.18060  [pdf, other]

    cs.CV cs.LG

    Prompt Customization for Continual Learning

    Authors: Yong Dai, Xiaopeng Hong, Yabin Wang, Zhiheng Ma, Dongmei Jiang, Yaowei Wang

    Abstract: Contemporary continual learning approaches typically select prompts from a pool, which function as supplementary inputs to a pre-trained model. However, this strategy is hindered by the inherent noise of its selection approach when handling increasing tasks. In response to these challenges, we reformulate the prompting approach for continual learning and propose the prompt customization (PC) metho…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: ACM MM

  13. arXiv:2404.01174  [pdf, other]

    cs.CV cs.MM

    SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding

    Authors: Wenrui Li, Xiaopeng Hong, Ruiqin Xiong, Xiaopeng Fan

    Abstract: Temporal video grounding (TVG) is a critical task in video content understanding, requiring precise alignment between video content and natural language instructions. Despite significant advancements, existing methods face challenges in managing confidence bias towards salient objects and capturing long-term dependencies in video sequences. To address these issues, we introduce SpikeMba: a multi-m…

    Submitted 23 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  14. arXiv:2404.00989  [pdf, other]

    cs.CV cs.AI cs.MM cs.SD eess.AS

    360+x: A Panoptic Multi-modal Scene Understanding Dataset

    Authors: Hao Chen, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, Jianbo Jiao

    Abstract: Human perception of the world is shaped by a multitude of viewpoints and modalities. While many existing datasets focus on scene understanding from a certain perspective (e.g. egocentric or third-person views), our dataset offers a panoptic perspective (i.e. multiple viewpoints with multiple data modalities). Specifically, we encapsulate third-person panoramic and front views, as well as egocentri…

    Submitted 7 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 (Oral Presentation), Project page: https://x360dataset.github.io/

    Journal ref: The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024

  15. arXiv:2403.20009  [pdf, other]

    cs.CL cs.LG

    On Large Language Models' Hallucination with Regard to Known Facts

    Authors: Che Jiang, Biqing Qi, Xiangyu Hong, Dayuan Fu, Yang Cheng, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou

    Abstract: Large language models are successful in answering factoid questions but are also prone to hallucination. We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics, an area not previously covered in studies on hallucinations. We are able to conduct this analysis via two key ideas. First, we identify the factual question…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by NAACL 2024 Main Conference

  16. arXiv:2403.12965  [pdf, other]

    cs.CV

    Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment

    Authors: Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan, Shuai Xiao

    Abstract: This paper introduces a novel framework for virtual try-on, termed Wear-Any-Way. Different from previous methods, Wear-Any-Way is a customizable solution. Besides generating high-fidelity results, our method supports users to precisely manipulate the wearing style. To achieve this goal, we first construct a strong pipeline for standard virtual try-on, supporting single/multiple garment try-on and…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project Page: https://mengtingchen.github.io/wear-any-way-page/

  17. arXiv:2402.15297  [pdf, other]

    cs.CV cs.LG

    Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling

    Authors: Hui Lin, Zhiheng Ma, Rongrong Ji, Yaowei Wang, Zhou Su, Xiaopeng Hong, Deyu Meng

    Abstract: This paper focuses on semi-supervised crowd counting, where only a small portion of the training data are labeled. We formulate the pixel-wise density value to regress as a probability distribution, instead of a single deterministic value. On this basis, we propose a semi-supervised crowd-counting model. Firstly, we design a pixel-wise distribution matching loss to measure the differences in the p…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: This is the technical report of a paper that was submitted to IEEE Transactions and is now under review

  18. arXiv:2401.12164  [pdf, other]

    cs.CV cs.AI

    Semi-supervised segmentation of land cover images using nonlinear canonical correlation analysis with multiple features and t-SNE

    Authors: Hong Wei, James Xiao, Yichao Zhang, Xia Hong

    Abstract: Image segmentation is a clustering task whereby each pixel is assigned a cluster label. Remote sensing data usually consists of multiple bands of spectral images in which there exist semantically meaningful land cover subregions, co-registered with other source data such as LIDAR (LIght Detection And Ranging) data, where available. This suggests that, in order to account for spatial correlation be…

    Submitted 22 January, 2024; originally announced January 2024.

  19. arXiv:2401.03870  [pdf, other]

    cs.CV

    Gramformer: Learning Crowd Counting via Graph-Modulated Transformer

    Authors: Hui Lin, Zhiheng Ma, Xiaopeng Hong, Qinnan Shangguan, Deyu Meng

    Abstract: Transformer has been popular in recent crowd counting work since it breaks the limited receptive field of traditional CNNs. However, since crowd images always contain a large number of similar patches, the self-attention mechanism in Transformer tends to find a homogenized solution where the attention maps of almost all patches are identical. In this paper, we address this problem by proposing Gra…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: This is the accepted version of the paper and supplemental material to appear in AAAI 2024. Please cite the final published version. Code is available at https://github.com/LoraLinH/Gramformer

  20. arXiv:2401.02335  [pdf, other]

    cs.CV

    Linguistic Profiling of Deepfakes: An Open Database for Next-Generation Deepfake Detection

    Authors: Yabin Wang, Zhiwu Huang, Zhiheng Ma, Xiaopeng Hong

    Abstract: The emergence of text-to-image generative models has revolutionized the field of deepfakes, enabling the creation of realistic and convincing visual content directly from textual descriptions. However, this advancement presents considerably greater challenges in detecting the authenticity of such content. Existing deepfake detection datasets and methods often fall short in effectively capturing th…

    Submitted 4 January, 2024; originally announced January 2024.

  21. arXiv:2312.14792  [pdf, ps, other]

    cs.LG cs.AI cs.CV cs.IT math.PR

    The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs

    Authors: Junli Fang, João F. C. Mota, Baoshan Lu, Weicheng Zhang, Xuemin Hong

    Abstract: The joint source-channel coding (JSCC) framework leverages deep learning to learn from data the best codes for source and channel coding. When the output signal, rather than being binary, is directly mapped onto the IQ domain (complex-valued), we call the resulting framework joint source coding and modulation (JSCM). We consider a JSCM scenario and show the existence of a strict tradeoff between c…

    Submitted 6 June, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Paper accepted in IEEE Transactions on Signal Processing

  22. arXiv:2312.07867  [pdf, other]

    cs.AI cs.CL

    BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering

    Authors: Xiaojie Hong, Zixin Song, Liangzhi Li, Xiaoli Wang, Feiyan Liu

    Abstract: Medical Visual Question Answering (Med-VQA) is a very important task in healthcare industry, which answers a natural language question with a medical image. Existing VQA techniques in information systems can be directly applied to solving the task. However, they often suffer from (i) the data insufficient problem, which makes it difficult to train the state of the arts (SOTAs) for the domain-speci…

    Submitted 12 December, 2023; originally announced December 2023.

  23. arXiv:2311.07311  [pdf, other]

    cs.CL cs.AI

    Do large language models and humans have similar behaviors in causal inference with script knowledge?

    Authors: Xudong Hong, Margarita Ryzhova, Daniel Adrian Biondi, Vera Demberg

    Abstract: Recently, large pre-trained language models (LLMs) have demonstrated superior language understanding abilities, including zero-shot causal reasoning. However, it is unclear to what extent their capabilities are similar to human ones. We here study the processing of an event $B$ in a script-based story, which causally depends on a previous event $A$. In our manipulation, event $A$ is stated, negate…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 15 pages, 3 figures

    ACM Class: I.2.7; I.2.0

  24. arXiv:2310.10352  [pdf, other]

    cs.CV

    Semi-Supervised Crowd Counting with Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes

    Authors: Yifei Qian, Xiaopeng Hong, Zhongliang Guo, Ognjen Arandjelović, Carl R. Donovan

    Abstract: To alleviate the heavy annotation burden for training a reliable crowd counting model and thus make the model more practicable and accurate by being able to benefit from more data, this paper presents a new semi-supervised method based on the mean teacher framework. When there is a scarcity of labeled data available, the model is prone to overfit local patches. Within such contexts, the convention…

    Submitted 20 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted by TCSVT

  25. arXiv:2310.04900  [pdf, other]

    cs.CV

    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

    Authors: Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne

    Abstract: Instructional videos are an excellent source for learning multimodal representations by leveraging video-subtitle pairs extracted with automatic speech recognition systems (ASR) from the audio signal in the videos. However, in contrast to human-annotated captions, both speech and subtitles naturally differ from the visual content of the videos and thus provide only noisy supervision for multimodal…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: https://github.com/ninatu/howtocaption

  26. arXiv:2310.04237  [pdf]

    cs.CL

    Written and spoken corpus of real and fake social media postings about COVID-19

    Authors: Ng Bee Chin, Ng Zhi Ee Nicole, Kyla Kwan, Lee Yong Han Dylann, Liu Fang, Xu Hong

    Abstract: This study investigates the linguistic traits of fake news and real news. There are two parts to this study: text data and speech data. The text data for this study consisted of 6420 COVID-19 related tweets re-filtered from Patwa et al. (2021). After cleaning, the dataset contained 3049 tweets, with 2161 labeled as 'real' and 888 as 'fake'. The speech data for this study was collected from TikTok,…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: 9 pages, 3 tables

  27. arXiv:2309.00781  [pdf, other]

    cs.LG stat.ML

    Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction

    Authors: Alejandro Rodriguez Dominguez, Muhammad Shahzad, Xia Hong

    Abstract: Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions. It can be tackled with multiple hypotheses frameworks but with the difficulty of combining them efficiently in a learning model. A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems. The predictors are reg…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: 63 Pages, 40 Figures

    MSC Class: 28-08; 28-11; 26B25; 26C15; 46A03; 46T12; 49Q05; 51-08; 60D05; 62J02; 62H10; 62-08; 68W25; 68T07; 68T20

    ACM Class: I.2.1; I.2.6; I.5.1; I.6.4; I.6.5

  28. arXiv:2308.11066  [pdf, other]

    cs.AI eess.SY

    CSM-H-R: A Context Modeling Framework in Supporting Reasoning Automation for Interoperable Intelligent Systems and Privacy Protection

    Authors: Songhui Yue, Xiaoyan Hong, Randy K. Smith

    Abstract: The automation of High-Level Context (HLC) reasoning across intelligent systems at scale is imperative because of the unceasing accumulation of contextual data, the trend of the fusion of data from multiple sources (e.g., sensors, intelligent systems), and the intrinsic complexity and dynamism of context-based decision-making processes. To mitigate the challenges posed by these issues, we propose…

    Submitted 5 April, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: 13 pages, 10 figures, Keywords: Automation, Context Dynamism, Context Modeling, Context Reasoning, Intelligent System, Interoperability, Privacy Protection, System Integration

  29. arXiv:2305.01928  [pdf, other]

    cs.CV

    Visual Transformation Telling

    Authors: Wanqing Cui, Xin Hong, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng

    Abstract: Humans can naturally reason from superficial state differences (e.g. ground wetness) to transformations descriptions (e.g. raining) according to their life experience. In this paper, we propose a new visual reasoning task to test this transformation reasoning ability in real-world scenarios, called Visual Transformation Telling (VTT). Given a series of states (i.e. image…

    Submitted 11 June, 2024; v1 submitted 3 May, 2023; originally announced May 2023.

  30. Visual Reasoning: from State to Transformation

    Authors: Xin Hong, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng

    Abstract: Most existing visual reasoning tasks, such as CLEVR in VQA, ignore an important factor, i.e. transformation. They are solely defined to test how well machines understand concepts and relations within static settings, like one image. Such state driven visual reasoning has limitations in reflecting the ability to infer the dynamics between different states, which has shown to be equally imp…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: Accepted by TPAMI. arXiv admin note: substantial text overlap with arXiv:2011.13160

  31. arXiv:2304.10817  [pdf, other]

    cs.CV cs.AI

    Can SAM Count Anything? An Empirical Study on SAM Counting

    Authors: Zhiheng Ma, Xiaopeng Hong, Qinnan Shangguan

    Abstract: Meta AI recently released the Segment Anything model (SAM), which has garnered attention due to its impressive performance in class-agnostic segmenting. In this study, we explore the use of SAM for the challenging task of few-shot object counting, which involves counting objects of an unseen category by providing a few bounding boxes of examples. We compare SAM's performance with other few-shot co…

    Submitted 21 April, 2023; originally announced April 2023.

    Comments: An empirical study on few-shot counting using Meta AI's segment anything model

  32. arXiv:2303.13898  [pdf, other]

    cs.CV cs.LG

    Remind of the Past: Incremental Learning with Analogical Prompts

    Authors: Zhiheng Ma, Xiaopeng Hong, Beinan Liu, Yabin Wang, Pinyue Guo, Huiyun Li

    Abstract: Although data-free incremental learning methods are memory-friendly, accurately estimating and counteracting representation shifts is challenging in the absence of historical data. This paper addresses this thorny problem by proposing a novel incremental learning method inspired by human analogy capabilities. Specifically, we design an analogy-making mechanism to remap the new data into the old cl…

    Submitted 24 March, 2023; originally announced March 2023.

  33. arXiv:2303.06531  [pdf, other]

    cs.RO cs.AI

    Towards Practical Multi-Robot Hybrid Tasks Allocation for Autonomous Cleaning

    Authors: Yabin Wang, Xiaopeng Hong, Zhiheng Ma, Tiedong Ma, Baoxing Qin, Zhou Su

    Abstract: Task allocation plays a vital role in multi-robot autonomous cleaning systems, where multiple robots work together to clean a large area. However, most current studies mainly focus on deterministic, single-task allocation for cleaning robots, without considering hybrid tasks in uncertain working environments. Moreover, there is a lack of datasets and benchmarks for relevant research. In this paper…

    Submitted 4 April, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

  34. arXiv:2302.14475  [pdf, other]

    cs.CV cs.LG

    Benchmarking Deepart Detection

    Authors: Yabin Wang, Zhiwu Huang, Xiaopeng Hong

    Abstract: Deepfake technologies have been blurring the boundaries between the real and unreal, likely resulting in malicious events. By leveraging newly emerged deepfake technologies, deepfake researchers have been making a great upending to create deepfake artworks (deeparts), which are further closing the gap between reality and fantasy. To address potentially appeared ethics questions, this paper establi…

    Submitted 28 February, 2023; originally announced February 2023.

  35. arXiv:2301.12934  [pdf, other]

    cs.RO

    Coarse-to-fine Hybrid 3D Mapping System with Co-calibrated Omnidirectional Camera and Non-repetitive LiDAR

    Authors: Ziliang Miao, Buwei He, Wenya Xie, Wenquan Zhao, Xiao Huang, Jian Bai, Xiaoping Hong

    Abstract: This paper presents a novel 3D mapping robot with an omnidirectional field-of-view (FoV) sensor suite composed of a non-repetitive LiDAR and an omnidirectional camera. Thanks to the non-repetitive scanning nature of the LiDAR, an automatic targetless co-calibration method is proposed to simultaneously calibrate the intrinsic parameters for the omnidirectional camera and the extrinsic parameters fo…

    Submitted 8 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: Accepted by IEEE Robotics and Automation Letters (RA-L)

  36. arXiv:2301.08571  [pdf, other]

    cs.CL cs.CV cs.LG

    Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences

    Authors: Xudong Hong, Asad Sayeed, Khushboo Mehra, Vera Demberg, Bernt Schiele

    Abstract: Current work on image-based story generation suffers from the fact that the existing image sequence collections do not have coherent plots behind them. We improve visual story generation by producing a new image-grounded dataset, Visual Writing Prompts (VWP). VWP contains almost 2K selected sequences of movie shots, each including 5-10 images. The image sequences are aligned with a total of 12K st…

    Submitted 20 January, 2023; originally announced January 2023.

    Comments: Paper accepted by Transactions of the Association for Computational Linguistics (TACL). This is a pre-MIT Press publication version. 15 pages, 6 figures

  37. arXiv:2301.05206  [pdf, other]

    cs.RO cs.CV

    ImMesh: An Immediate LiDAR Localization and Meshing Framework

    Authors: Jiarong Lin, Chongjiang Yuan, Yixi Cai, Haotian Li, Yunfan Ren, Yuying Zou, Xiaoping Hong, Fu Zhang

    Abstract: In this paper, we propose a novel LiDAR(-inertial) odometry and mapping framework to achieve the goal of simultaneous localization and meshing in real-time. This proposed framework termed ImMesh comprises four tightly-coupled modules: receiver, localization, meshing, and broadcaster. The localization module utilizes the preprocessed sensor data from the receiver, estimates the sensor pose online b…

    Submitted 11 November, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

  38. arXiv:2211.16192  [pdf, other]

    cs.CV cs.CR

    Be Careful with Rotation: A Uniform Backdoor Pattern for 3D Shape

    Authors: Linkun Fan, Fazhi He, Qing Guo, Wei Tang, Xiaolin Hong, Bing Li

    Abstract: For saving cost, many deep neural networks (DNNs) are trained on third-party datasets downloaded from internet, which enables attacker to implant backdoor into DNNs. In 2D domain, inherent structures of different image formats are similar. Hence, backdoor attack designed for one image format will suit others. However, when it comes to 3D world, there is a huge disparity among different 3D dat…

    Submitted 1 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

  39. arXiv:2211.15969  [pdf, other]

    cs.CV

    Isolation and Impartial Aggregation: A Paradigm of Incremental Learning without Interference

    Authors: Yabin Wang, Zhiheng Ma, Zhiwu Huang, Yaowei Wang, Zhou Su, Xiaopeng Hong

    Abstract: This paper focuses on the prevalent performance imbalance in the stages of incremental learning. To avoid obvious stage learning bottlenecks, we propose a brand-new stage-isolation based incremental learning framework, which leverages a series of stage-isolated classifiers to perform the learning task of each stage without the interference of others. To be concrete, to aggregate multiple stage cla…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: This is the accepted version of the Paper & Supp to appear in AAAI 2023. Please cite the final published version. Code is available at https://github.com/iamwangyabin/ESN

  40. arXiv:2211.07546  [pdf, other]

    cs.CV cs.DB

    Marine Microalgae Detection in Microscopy Images: A New Dataset

    Authors: Shizheng Zhou, Juntao Jiang, Xiaohan Hong, Yajun Fang, Yan Hong, Pengcheng Fu

    Abstract: Marine microalgae are widespread in the ocean and play a crucial role in the ecosystem. Automatic identification and location of marine microalgae in microscopy images would help establish marine ecological environment monitoring and water quality evaluation system. A new dataset for marine microalgae detection is proposed in this paper. Six classes of microalgae commonly found in the ocean (Bacill…

    Submitted 14 November, 2022; originally announced November 2022.

  41. arXiv:2209.12435  [pdf, other]

    cs.CV cs.RO

    STD: Stable Triangle Descriptor for 3D place recognition

    Authors: Chongjian Yuan, Jiarong Lin, Zuhao Zou, Xiaoping Hong, Fu Zhang

    Abstract: In this work, we present a novel global descriptor termed stable triangle descriptor (STD) for 3D place recognition. For a triangle, its shape is uniquely determined by the lengths of its sides or its included angles. Moreover, the shape of a triangle is completely invariant to rigid transformations. Based on this property, we first design an algorithm to efficiently extract local key points from the 3D…

    Submitted 22 February, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: 2023 ICRA

  42. arXiv:2209.02955  [pdf, other]

    cs.CV

    Semi-supervised Crowd Counting via Density Agency

    Authors: Hui Lin, Zhiheng Ma, Xiaopeng Hong, Yaowei Wang, Zhou Su

    Abstract: In this paper, we propose a new agency-guided semi-supervised counting approach. First, we build a learnable auxiliary structure, namely the density agency, to bring the recognized foreground regional features close to the corresponding density sub-classes (agents) and push away background ones. Second, we propose a density-guided contrastive learning loss to consolidate the backbone feature extractor.…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: This is the accepted version of the Paper & Supp to appear in ACM MM 2022. Please cite the final published version. Code is available at https://github.com/LoraLinH/Semi-supervised-Crowd-Counting-via-Density-Agency

  43. DnSwin: Toward Real-World Denoising via Continuous Wavelet Sliding-Transformer

    Authors: Hao Li, Zhijing Yang, Xiaobin Hong, Ziying Zhao, Junyang Chen, Yukai Shi, Jinshan Pan

    Abstract: Real-world image denoising is a practical image restoration problem that aims to obtain clean images from in-the-wild noisy inputs. Recently, the Vision Transformer (ViT) has exhibited a strong ability to capture long-range dependencies, and many researchers have attempted to apply the ViT to image denoising tasks. However, a real-world image is an isolated frame that makes the ViT build long-rang…

    Submitted 13 September, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

    Comments: Accepted by KBS; Wavelet downsampling expands window size in Transformer cheaply for better real-world denoising

  44. arXiv:2207.12819  [pdf, other]

    cs.CV cs.LG

    S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning

    Authors: Yabin Wang, Zhiwu Huang, Xiaopeng Hong

    Abstract: State-of-the-art deep neural networks still struggle to address the catastrophic forgetting problem in continual learning. In this paper, we propose one simple paradigm (named S-Prompting) and two concrete approaches to greatly reduce the degree of forgetting in one of the most typical continual learning scenarios, i.e., domain incremental learning (DIL). The key idea of the paradigm is to lear…

    Submitted 18 March, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

    Comments: Accepted to NeurIPS 2022

  45. arXiv:2206.07298  [pdf, other]

    cs.CV cs.AI

    S$^2$-FPN: Scale-ware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation

    Authors: Mohammed A. M. Elhassan, Chenhui Yang, Chenxi Huang, Tewodros Legesse Munea, Xin Hong, Abuzar B. M. Adam, Amina Benabid

    Abstract: Modern high-performance semantic segmentation methods employ a heavy backbone and dilated convolution to extract relevant features. Although extracting features with both contextual and semantic information is critical for segmentation tasks, it incurs a memory footprint and a high computation cost for real-time applications. This paper presents a new model to achieve a trade-off between accu…

    Submitted 18 May, 2024; v1 submitted 15 June, 2022; originally announced June 2022.

  46. arXiv:2205.08824  [pdf, other]

    cs.NI cs.LG

    Automating In-Network Machine Learning

    Authors: Changgang Zheng, Mingyuan Zang, Xinpeng Hong, Riyad Bensoussane, Shay Vargaftik, Yaniv Ben-Itzhak, Noa Zilberman

    Abstract: Using programmable network devices to aid in-network machine learning has been the focus of significant research. However, most of the research was of a limited scope, providing a proof of concept or describing a closed-source algorithm. To date, no general solution has been provided for mapping machine learning algorithms to programmable network devices. In this paper, we present Planter, an open…

    Submitted 18 May, 2022; originally announced May 2022.

    Comments: (13 pages body, 19 pages total, 18 figures)

  47. arXiv:2205.05467  [pdf, other]

    cs.CV cs.LG

    A Continual Deepfake Detection Benchmark: Dataset, Methods, and Essentials

    Authors: Chuqiao Li, Zhiwu Huang, Danda Pani Paudel, Yabin Wang, Mohamad Shahbazi, Xiaopeng Hong, Luc Van Gool

    Abstract: A number of benchmarks and techniques have emerged for the detection of deepfakes. However, very few works study the detection of incrementally appearing deepfakes in real-world scenarios. To simulate in-the-wild scenes, this paper suggests a continual deepfake detection benchmark (CDDB) over a new collection of deepfakes from both known and unknown generative models. The suggested CD…

    Submitted 14 November, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: Accepted to WACV 2023

  48. arXiv:2203.05984  [pdf, other]

    cs.CV

    Deep Class Incremental Learning from Decentralized Data

    Authors: Xiaohan Zhang, Songlin Dong, Jinjie Chen, Qi Tian, Yihong Gong, Xiaopeng Hong

    Abstract: In this paper, we focus on a new and challenging decentralized machine learning paradigm in which there are continuous inflows of data to be addressed and the data are stored in multiple repositories. We initiate the study of data decentralized class-incremental learning (DCIL) by making the following contributions. Firstly, we formulate the DCIL problem and develop the experimental protocol. Seco…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: Submitted to IEEE Transactions on Neural Networks and Learning Systems. Revised version

  49. arXiv:2203.02636  [pdf, other]

    cs.CV

    Boosting Crowd Counting via Multifaceted Attention

    Authors: Hui Lin, Zhiheng Ma, Rongrong Ji, Yaowei Wang, Xiaopeng Hong

    Abstract: This paper focuses on the challenging crowd counting task. As large-scale variations often exist within crowd images, neither the fixed-size convolution kernels of CNNs nor the fixed-size attention of recent vision transformers can handle this kind of variation well. To address this problem, we propose a Multifaceted Attention Network (MAN) to improve transformer models in local spatial relation encoding. M…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted by IEEE CVPR 2022. Code is available at: https://github.com/LoraLinH/Boosting-Crowd-Counting-via-Multifaceted-Attention

  50. arXiv:2202.11295  [pdf, other]

    cs.LG eess.SP

    Continual learning-based probabilistic slow feature analysis for multimode dynamic process monitoring

    Authors: Jingxin Zhang, Donghua Zhou, Maoyin Chen, Xia Hong

    Abstract: In this paper, a novel multimode dynamic process monitoring approach is proposed by extending elastic weight consolidation (EWC) to probabilistic slow feature analysis (PSFA) in order to extract multimode slow features for online monitoring. EWC was originally introduced in the setting of machine learning of sequential multi-tasks with the aim of avoiding the catastrophic forgetting issue, which equal…

    Submitted 28 April, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: This paper has been submitted to IEEE Transactions on Automation Science and Engineering for potential publication