
Showing 1–50 of 187 results for author: Lyu, S

Searching in archive cs.
  1. arXiv:2408.04300  [pdf, other]

    eess.IV cs.CV

    An Explainable Non-local Network for COVID-19 Diagnosis

    Authors: Jingfu Yang, Peng Huang, Jing Hu, Shu Hu, Siwei Lyu, Xin Wang, Jun Guo, Xi Wu

    Abstract: The CNN has achieved excellent results in the automatic classification of medical images. In this study, we propose a novel deep residual 3D attention non-local network (NL-RAN) to classify CT images including COVID-19, common pneumonia, and normal to perform rapid and explainable COVID-19 diagnosis. We built a deep residual 3D attention non-local network that could achieve end-to-end training. The…

    Submitted 8 August, 2024; originally announced August 2024.

  2. arXiv:2408.02191  [pdf, other]

    cs.CV

    Dense Feature Interaction Network for Image Inpainting Localization

    Authors: Ye Yao, Tingfeng Han, Shan Jia, Siwei Lyu

    Abstract: Image inpainting, which is the task of filling in missing areas in an image, is a common image editing technique. Inpainting can be used to conceal or alter image contents in malicious manipulation of images, driving the need for research in image inpainting detection. Existing methods mostly rely on a basic encoder-decoder structure, which often results in a high number of false positives or miss…

    Submitted 4 August, 2024; originally announced August 2024.

  3. arXiv:2407.21788  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Vision-Language Model Based Handwriting Verification

    Authors: Mihir Chauhan, Abhishek Satbhai, Mohammad Abuzar Hashemi, Mir Basheer Ali, Bina Ramamurthy, Mingchen Gao, Siwei Lyu, Sargur Srihari

    Abstract: Handwriting Verification is critical in document forensics. Deep learning based approaches often face skepticism from forensic document examiners due to their lack of explainability and reliance on extensive training data and handcrafted features. This paper explores using Vision Language Models (VLMs), such as OpenAI's GPT-4o and Google's PaliGemma, to address these challenges. By leveraging th…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 4 Pages, 1 Figure, 1 Table, Accepted as Short paper at Irish Machine Vision and Image Processing (IMVIP) Conference

  4. arXiv:2407.05108  [pdf, other]

    cs.LG stat.ML

    The Role of Depth, Width, and Tree Size in Expressiveness of Deep Forest

    Authors: Shen-Huan Lyu, Jin-Hui Wu, Qin-Cheng Zheng, Baoliu Ye

    Abstract: Random forests are classical ensemble algorithms that construct multiple randomized decision trees and aggregate their predictions using naive averaging. \citet{zhou2019deep} further propose a deep forest algorithm with multi-layer forests, which outperforms random forests in various tasks. The performance of deep forests is related to three hyperparameters in practice: depth, width, and tree size…

    Submitted 6 July, 2024; originally announced July 2024.

    Journal ref: In: Proceedings of the 27th European Conference on Artificial Intelligence, 2024

  5. arXiv:2407.03107  [pdf]

    cs.HC cs.GR cs.MM

    Design of a UE5-based digital twin platform

    Authors: Shaoqiu Lyu, Muzhi Wang, Sunrui Zhang, Shengzhi Wang

    Abstract: To address the high learning and construction costs of current mainstream 3D scene engines, this thesis proposes a digital twin platform design program based on Unreal Engine 5 (UE5). It aims to provide a universal platform construction design process to effectively reduce the learning cost of large-scale scene construction. Taking an actual project of a unit as an example, the overall cycle work of…

    Submitted 3 July, 2024; originally announced July 2024.

  6. arXiv:2406.16943  [pdf, other]

    eess.SP cs.AI cs.HC cs.LG

    EarDA: Towards Accurate and Data-Efficient Earable Activity Sensing

    Authors: Shengzhe Lyu, Yongliang Chen, Di Duan, Renqi Jia, Weitao Xu

    Abstract: In the realm of smart sensing with the Internet of Things, earable devices are empowered with the capability of multi-modality sensing and intelligence of context-aware computing, leading to their wide usage in Human Activity Recognition (HAR). Nonetheless, unlike the movements captured by Inertial Measurement Unit (IMU) sensors placed on the upper or lower body, those motion signals obtained from e…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: accepted by 2024 IEEE Coupling of Sensing & Computing in AIoT Systems (CSCAIoT)

  7. arXiv:2406.10427  [pdf, other]

    cs.LG cs.CR

    Adaptive Randomized Smoothing: Certifying Multi-Step Defences against Adversarial Examples

    Authors: Saiyue Lyu, Shadab Shaikh, Frederick Shpilevskiy, Evan Shelhamer, Mathias Lécuyer

    Abstract: We propose Adaptive Randomized Smoothing (ARS) to certify the predictions of our test-time adaptive models against adversarial examples. ARS extends the analysis of randomized smoothing using f-Differential Privacy to certify the adaptive composition of multiple steps. For the first time, our theory covers the sound adaptive composition of general and high-dimensional functions of noisy input. We…

    Submitted 14 June, 2024; originally announced June 2024.

  8. arXiv:2406.04745  [pdf, other]

    cs.LG cs.CV

    Confidence-aware Contrastive Learning for Selective Classification

    Authors: Yu-Chang Wu, Shen-Huan Lyu, Haopu Shang, Xiangyu Wang, Chao Qian

    Abstract: Selective classification enables models to make predictions only when they are sufficiently confident, aiming to enhance safety and reliability, which is important in high-stakes scenarios. Previous methods mainly use deep neural networks and focus on modifying the architecture of classification layers to enable the model to estimate the confidence of its prediction. This work provides a generaliz…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  9. arXiv:2406.01112  [pdf, other]

    cs.CV

    BACON: Bayesian Optimal Condensation Framework for Dataset Distillation

    Authors: Zheng Zhou, Hongbo Zhao, Guangliang Cheng, Xiangtai Li, Shuchang Lyu, Wenquan Feng, Qi Zhao

    Abstract: Dataset Distillation (DD) aims to distill knowledge from extensive datasets into more compact ones while preserving performance on the test set, thereby reducing storage costs and training expenses. However, existing methods often suffer from computational intensity, particularly exhibiting suboptimal performance with large dataset sizes due to the lack of a robust theoretical framework for analyz…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 22 pages, 10 figures

  10. arXiv:2406.00985  [pdf, other]

    cs.CV

    MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models

    Authors: Mingzhen Huang, Jialing Cai, Shan Jia, Vishnu Suresh Lokhande, Siwei Lyu

    Abstract: Text-driven image synthesis has made significant advancements with the development of diffusion models, transforming how visual content is generated from text prompts. Despite these advances, text-driven image editing, a key area in computer graphics, faces unique challenges. A major challenge is making simultaneous edits across multiple objects or attributes. Applying these methods sequentially f…

    Submitted 3 June, 2024; originally announced June 2024.

  11. arXiv:2405.18320  [pdf, other]

    cs.CV cs.AI cs.CL

    Self-Supervised Learning Based Handwriting Verification

    Authors: Mihir Chauhan, Mohammad Abuzar Hashemi, Abhishek Satbhai, Mir Basheer Ali, Bina Ramamurthy, Mingchen Gao, Siwei Lyu, Sargur Srihari

    Abstract: We present SSL-HV: Self-Supervised Learning approaches applied to the task of Handwriting Verification. This task involves determining whether a given pair of handwritten images originate from the same or different writer distribution. We have compared the performance of multiple generative, contrastive SSL approaches against handcrafted feature extractors and supervised learning on CEDAR AND data…

    Submitted 1 August, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 8 pages, 2 figures, 2 tables, Accepted at Irish Machine Vision and Image Processing Conference 2024

  12. arXiv:2405.17837  [pdf, other]

    cs.HC

    Enabling Generative Design Tools with LLM Agents for Building Novel Devices: A Case Study on Fluidic Computation Interfaces

    Authors: Qiuyu Lu, Jiawei Fang, Zhihao Yao, Yue Yang, Shiqing Lyu, Haipeng Mi, Lining Yao

    Abstract: In the field of Human-Computer Interaction (HCI), the development of interactive devices represents a significant area of focus. The advent of novel hardware and advanced fabrication techniques has underscored the demand for specialized design tools that democratize the prototyping process for such cutting-edge devices. While these tools simplify the process through parametric design and simulatio…

    Submitted 22 July, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 25 pages, 12 figures

  13. arXiv:2405.11326  [pdf, other]

    cs.LG cs.CV

    On the Trajectory Regularity of ODE-based Diffusion Sampling

    Authors: Defang Chen, Zhenyu Zhou, Can Wang, Chunhua Shen, Siwei Lyu

    Abstract: Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models. We characterize an implicit denoi…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: ICML 2024, 30 pages

  14. arXiv:2405.08487  [pdf, other]

    cs.CV cs.CR

    Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method

    Authors: Mian Zou, Baosheng Yu, Yibing Zhan, Siwei Lyu, Kede Ma

    Abstract: In recent years, deep learning has greatly streamlined the process of generating realistic fake face images. Aware of the dangers, researchers have developed various tools to spot these counterfeits. Yet none asked the fundamental question: What digital manipulations make a real photographic face image fake, while others do not? In this paper, we put face forgery in a semantic context and define t…

    Submitted 14 May, 2024; originally announced May 2024.

  15. arXiv:2405.04051  [pdf, ps, other]

    cs.IT

    On the quantization goodness of polar lattices

    Authors: Ling Liu, Shanxiang Lyu, Cong Ling, Baoming Bai

    Abstract: In this work, we prove that polar lattices, when tailored for lossy compression, are quantization-good in the sense that their normalized second moments approach $\frac{1}{2\pi e}$ as the dimension of lattices increases. It has been predicted by Zamir et al. \cite{ZamirQZ96} that the Entropy Coded Dithered Quantization (ECDQ) system using quantization-good lattices can achieve the rate-distortion bou…

    Submitted 13 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE for possible publication

  16. arXiv:2405.00135  [pdf, other]

    cs.IT eess.SP

    Improving Channel Resilience for Task-Oriented Semantic Communications: A Unified Information Bottleneck Approach

    Authors: Shuai Lyu, Yao Sun, Linke Guo, Xiaoyong Yuan, Fang Fang, Lan Zhang, Xianbin Wang

    Abstract: Task-oriented semantic communications (TSC) enhance radio resource efficiency by transmitting task-relevant semantic information. However, current research often overlooks the inherent semantic distinctions among encoded features. Due to unavoidable channel variations from time and frequency-selective fading, semantically sensitive feature units could be more susceptible to erroneous inference if…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE Communications Letters

  17. arXiv:2404.19171  [pdf, other]

    cs.CV cs.AI

    Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection

    Authors: Cai Yu, Shan Jia, Xiaomeng Fu, Jin Liu, Jiahe Tian, Jiao Dai, Xi Wang, Siwei Lyu, Jizhong Han

    Abstract: With the rising prevalence of deepfakes, there is a growing interest in developing generalizable detection methods for various types of deepfakes. While effective in their specific modalities, traditional detection methods fall short in addressing the generalizability of detection across diverse cross-modal deepfakes. This paper aims to explicitly learn potential cross-modal correlation to enhance…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: accepted by ICME 2024

  18. arXiv:2404.18033  [pdf, other]

    cs.CV

    Exposing Text-Image Inconsistency Using Diffusion Models

    Authors: Mingzhen Huang, Shan Jia, Zhou Zhou, Yan Ju, Jialing Cai, Siwei Lyu

    Abstract: In the battle against widespread online misinformation, a growing problem is text-image inconsistency, where images are misleadingly paired with texts with different intent or meaning. Existing classification-based methods for text-image inconsistency can identify contextual inconsistencies but fail to provide explainable justifications for their decisions that humans can understand. Although more…

    Submitted 27 April, 2024; originally announced April 2024.

  19. arXiv:2404.13146  [pdf, other]

    cs.CR cs.CV

    DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection

    Authors: Yan Ju, Chengzhe Sun, Shan Jia, Shuwei Hou, Zhaofeng Si, Soumyya Kanti Datta, Lipeng Ke, Riky Zhou, Anita Nikolich, Siwei Lyu

    Abstract: Deepfakes, as AI-generated media, have increasingly threatened media integrity and personal privacy with realistic yet fake digital content. In this work, we introduce an open-source and user-friendly online platform, DeepFake-O-Meter v2.0, that integrates state-of-the-art methods for detecting Deepfake images, videos, and audio. Built upon DeepFake-O-Meter v1.0, we have made significant upgrades…

    Submitted 27 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  20. arXiv:2403.14077  [pdf, other]

    cs.AI cs.CR

    Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

    Authors: Shan Jia, Reilin Lyu, Kangran Zhao, Yize Chen, Zhiyuan Yan, Yan Ju, Chuanbo Hu, Xin Li, Baoyuan Wu, Siwei Lyu

    Abstract: DeepFakes, which refer to AI-generated media content, have become an increasing concern due to their use as a means for disinformation. Detecting DeepFakes is currently solved with programmed machine learning algorithms. In this work, we investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection. We conducted qualitative and quantitative experiments to demonstrat…

    Submitted 11 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  21. arXiv:2403.13358  [pdf, other]

    cs.RO cs.CV cs.LG

    GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot

    Authors: Wenxuan Song, Han Zhao, Pengxiang Ding, Can Cui, Shangke Lyu, Yaning Fan, Donglin Wang

    Abstract: Multi-task robot learning holds significant importance in tackling diverse and complex scenarios. However, current approaches are hindered by performance issues and difficulties in collecting training datasets. In this paper, we propose GeRM (Generalist Robotic Model). We utilize offline reinforcement learning to optimize data utilization strategies to learn from both demonstrations and sub-optima…

    Submitted 9 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  22. arXiv:2403.12631  [pdf]

    cs.RO cs.AI

    PointGrasp: Point Cloud-based Grasping for Tendon-driven Soft Robotic Glove Applications

    Authors: Chen Hu, Shirui Lyu, Eojin Rho, Daekyum Kim, Shan Luo, Letizia Gionfrida

    Abstract: Controlling hand exoskeletons to assist individuals with grasping tasks poses a challenge due to the difficulty in understanding user intentions. We propose that most daily grasping tasks during activities of daily living (ADL) can be deduced by analyzing object geometries (simple and complex) from 3D point clouds. The study introduces PointGrasp, a real-time system designed for identifying househ…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 6 pages, 8 figures, conference

    ACM Class: I.2; I.4

  23. arXiv:2403.03101  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG cs.MA

    KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents

    Authors: Yuqi Zhu, Shuofei Qiao, Yixin Ou, Shumin Deng, Ningyu Zhang, Shiwei Lyu, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen

    Abstract: Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges, especially when interacting with environments through generating executable actions. This inadequacy primarily stems from the lack of built-in action knowledge in language agents, which fails to effectively guide the planning trajectories durin…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Work in progress. Project page: https://zjunlp.github.io/project/KnowAgent/ Code: https://github.com/zjunlp/KnowAgent

  24. arXiv:2402.01154  [pdf, other]

    cs.CR

    Towards Quantum-Safe Federated Learning via Homomorphic Encryption: Learning with Gradients

    Authors: Guangfeng Yan, Shanxiang Lyu, Hanxu Hou, Zhiyong Zheng, Linqi Song

    Abstract: This paper introduces a privacy-preserving distributed learning framework via private-key homomorphic encryption. Thanks to the randomness of the quantization of gradients, our learning with error (LWE) based encryption can eliminate the error terms, thus avoiding the issue of error expansion in conventional LWE-based homomorphic encryption. The proposed system allows a large number of learning pa…

    Submitted 2 February, 2024; originally announced February 2024.

  25. arXiv:2401.10113  [pdf, ps, other]

    cs.CV

    Exposing Lip-syncing Deepfakes from Mouth Inconsistencies

    Authors: Soumyya Kanti Datta, Shan Jia, Siwei Lyu

    Abstract: A lip-syncing deepfake is a digitally manipulated video in which a person's lip movements are created convincingly using AI models to match altered or entirely new audio. Lip-syncing deepfakes are a dangerous type of deepfakes as the artifacts are limited to the lip region and more difficult to discern. In this paper, we describe a novel approach, LIP-syncing detection based on mouth INConsistency…

    Submitted 3 June, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  26. arXiv:2312.17431  [pdf, other]

    cs.CR cs.CV

    MVPatch: More Vivid Patch for Adversarial Camouflaged Attacks on Object Detectors in the Physical World

    Authors: Zheng Zhou, Hongbo Zhao, Ju Liu, Qiaosheng Zhang, Liwei Geng, Shuchang Lyu, Wenquan Feng

    Abstract: Recent studies have shown that Adversarial Patches (APs) can effectively manipulate object detection models. However, the conspicuous patterns often associated with these patches tend to attract human attention, posing a significant challenge. Existing research has primarily focused on enhancing attack efficacy in the physical domain while often neglecting the optimization of stealthiness and tran…

    Submitted 19 July, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: 16 pages, 8 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2312.09785  [pdf, other]

    cs.CL

    RJUA-QA: A Comprehensive QA Dataset for Urology

    Authors: Shiwei Lyu, Chenfei Chi, Hongbo Cai, Lei Shi, Xiaoyan Yang, Lei Liu, Xiang Chen, Deng Zhao, Zhiqiang Zhang, Xianguo Lyu, Ming Zhang, Fangzhou Li, Xiaowei Ma, Yue Shen, Jinjie Gu, Wei Xue, Yiran Huang

    Abstract: We introduce RJUA-QA, a novel medical dataset for question answering (QA) and reasoning with clinical evidence, helping to bridge the gap between general large language models (LLMs) and medical-specific LLM applications. RJUA-QA is derived from realistic clinical scenarios and aims to facilitate LLMs in generating reliable diagnoses and advice. The dataset contains 2,132 curated Question-Co…

    Submitted 7 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: An initial version

  28. Multiple Instance Learning for Uplift Modeling

    Authors: Yao Zhao, Haipeng Zhang, Shiwei Lyu, Ruiying Jiang, Jinjie Gu, Guannan Zhang

    Abstract: Uplift modeling is widely used in performance marketing to estimate effects of promotion campaigns (e.g., increase of customer retention rate). Since it is impossible to observe outcomes of a recipient in treatment (e.g., receiving a certain promotion) and control (e.g., without promotion) groups simultaneously (i.e., counter-factual), uplift models are mainly trained on instances of treatment and…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: short paper of CIKM22(full version)

    Journal ref: Proceedings of the 31st ACM International Conference on Information and Knowledge Management (2022) 4727-4731

  29. arXiv:2312.05738  [pdf, other]

    cs.CR cs.AI

    FedReverse: Multiparty Reversible Deep Neural Network Watermarking

    Authors: Junlong Mao, Huiyi Tang, Yi Zhang, Fengxia Liu, Zhiyong Zheng, Shanxiang Lyu

    Abstract: The proliferation of Deep Neural Networks (DNN) in commercial applications is expanding rapidly. Simultaneously, the increasing complexity and cost of training DNN models have intensified the urgency surrounding the protection of intellectual property associated with these trained models. In this regard, DNN watermarking has emerged as a crucial safeguarding technique. This paper presents FedRever…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: 13 pages

  30. arXiv:2311.11278  [pdf, other]

    cs.CV

    Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection

    Authors: Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, Baoyuan Wu

    Abstract: Deepfake detection faces a critical generalization hurdle, with performance deteriorating when there is a mismatch between the distributions of training and testing data. A broadly received explanation is the tendency of these detectors to be overfitted to forgery-specific artifacts, rather than learning features that are widely applicable across various forgeries. To address this issue, we propos…

    Submitted 28 March, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

  31. arXiv:2311.06015  [pdf]

    cs.RO cs.AI

    RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph

    Authors: Hongyin Zhang, Diyuan Shi, Zifeng Zhuang, Han Zhao, Zhenyu Wei, Feng Zhao, Sibo Gai, Shangke Lyu, Donglin Wang

    Abstract: Developing robotic intelligent systems that can adapt quickly to unseen wild situations is one of the critical challenges in pursuing autonomous robotics. Although some impressive progress has been made in walking stability and skill learning in the field of legged robots, their ability to adapt quickly is still inferior to that of animals in nature. Animals are born with massive skills needed t…

    Submitted 10 November, 2023; originally announced November 2023.

  32. arXiv:2311.05836  [pdf, other]

    eess.IV cs.CV cs.LG

    UMedNeRF: Uncertainty-aware Single View Volumetric Rendering for Medical Neural Radiance Fields

    Authors: Jing Hu, Qinrui Fan, Shu Hu, Siwei Lyu, Xi Wu, Xin Wang

    Abstract: In the field of clinical medicine, computed tomography (CT) is an effective medical imaging modality for the diagnosis of various pathologies. Compared with X-ray images, CT images can provide more information, including multi-planar slices and three-dimensional structures for clinical diagnosis. However, CT imaging requires patients to be exposed to large doses of ionizing radiation for a long ti…

    Submitted 1 March, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

  33. arXiv:2311.02926  [pdf, other]

    cs.CV cs.AI

    Deep Image Semantic Communication Model for Artificial Intelligent Internet of Things

    Authors: Li Ping Qian, Yi Zhang, Sikai Lyu, Huijie Zhu, Yuan Wu, Xuemin Sherman Shen, Xiaoniu Yang

    Abstract: With the rapid development of Artificial Intelligent Internet of Things (AIoT), the volume of image data from AIoT devices has been increasing explosively. In this paper, a novel deep image semantic communication model is proposed for efficient image communication in AIoT. Particularly, at the transmitter side, a high-precision image semantic segmentation algorithm is proposed to extract th…

    Submitted 8 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

  34. arXiv:2310.14374  [pdf, other]

    cs.CV

    OV-VG: A Benchmark for Open-Vocabulary Visual Grounding

    Authors: Chunlei Wang, Wenquan Feng, Xiangtai Li, Guangliang Cheng, Shuchang Lyu, Binghao Liu, Lijiang Chen, Qi Zhao

    Abstract: Open-vocabulary learning has emerged as a cutting-edge research area, particularly in light of the widespread adoption of vision-based foundational models. Its primary objective is to comprehend novel concepts that are not encompassed within a predefined vocabulary. One key facet of this endeavor is Visual Grounding, which entails locating a specific region within an image based on a corresponding…

    Submitted 22 October, 2023; originally announced October 2023.

  35. arXiv:2310.07525  [pdf, other]

    cs.RO

    ViT-A*: Legged Robot Path Planning using Vision Transformer A*

    Authors: Jianwei Liu, Shirui Lyu, Denis Hadjivelichkov, Valerio Modugno, Dimitrios Kanoulas

    Abstract: Legged robots, particularly quadrupeds, offer promising navigation capabilities, especially in scenarios requiring traversal over diverse terrains and obstacle avoidance. This paper addresses the challenge of enabling legged robots to navigate complex environments effectively through the integration of data-driven path-planning methods. We propose an approach that utilizes differentiable planners,…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 6 pages, 6 figures, conference

    Journal ref: IEEE-RAS International Conference on Humanoids Robots (Humanoids) 2023

  36. Efficient State Estimation with Constrained Rao-Blackwellized Particle Filter

    Authors: Shuai Li, Siwei Lyu, Jeff Trinkle

    Abstract: Due to the limitations of the robotic sensors, during a robotic manipulation task, the acquisition of the object's state can be unreliable and noisy. Combining an accurate model of multi-body dynamic system with Bayesian filtering methods has been shown to be able to filter out noise from the object's observed states. However, efficiency of these filtering methods suffers from samples that violate…

    Submitted 6 October, 2023; originally announced October 2023.

  37. arXiv:2310.03827  [pdf, other]

    cs.CV

    Integrating Audio-Visual Features for Multimodal Deepfake Detection

    Authors: Sneha Muppalla, Shan Jia, Siwei Lyu

    Abstract: Deepfakes are AI-generated media in which an image or video has been digitally modified. The advancements made in deepfake technology have led to privacy and security issues. Most deepfake detection techniques rely on the detection of a single modality. Existing methods for audio-visual detection do not always surpass that of the analysis based on single modalities. Therefore, this paper proposes…

    Submitted 5 October, 2023; originally announced October 2023.

  38. arXiv:2310.00405  [pdf, other]

    cs.CV

    Controlling Neural Style Transfer with Deep Reinforcement Learning

    Authors: Chengming Feng, Jing Hu, Xin Wang, Shu Hu, Bin Zhu, Xi Wu, Hongtu Zhu, Siwei Lyu

    Abstract: Controlling the degree of stylization in the Neural Style Transfer (NST) is a little tricky since it usually needs hand-engineering on hyper-parameters. In this paper, we propose the first deep Reinforcement Learning (RL) based architecture that splits one-step style transfer into a step-wise process for the NST task. Our RL-based method tends to preserve more details and structures of the content…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: Accepted by IJCAI 2023. The contributions of Chengming Feng and Jing Hu to this paper were equal. arXiv admin note: substantial text overlap with arXiv:2309.13672

  39. arXiv:2310.00359  [pdf, other]

    cs.CV

    Improving Cross-dataset Deepfake Detection with Deep Information Decomposition

    Authors: Shanmin Yang, Shu Hu, Bin Zhu, Ying Fu, Siwei Lyu, Xi Wu, Xin Wang

    Abstract: Deepfake technology poses a significant threat to security and social trust. Although existing detection methods have demonstrated high performance in identifying forgeries within datasets using the same techniques for training and testing, they suffer from sharp performance degradation when faced with cross-dataset scenarios where unseen deepfake techniques are tested. To address this challenge,…

    Submitted 30 September, 2023; originally announced October 2023.

  40. arXiv:2309.13672  [pdf, other]

    cs.CV cs.AI

    RL-I2IT: Image-to-Image Translation with Deep Reinforcement Learning

    Authors: Xin Wang, Ziwei Luo, Jing Hu, Chengming Feng, Shu Hu, Bin Zhu, Xi Wu, Hongtu Zhu, Xin Li, Siwei Lyu

    Abstract: Most existing Image-to-Image Translation (I2IT) methods generate images in a single run of a deep learning (DL) model. However, designing such a single-step model is always challenging, requiring a huge number of parameters and easily falling into bad global minimums and overfitting. In this work, we reformulate I2IT as a step-wise decision-making problem via deep reinforcement learning (DRL) and…

    Submitted 7 June, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

  41. arXiv:2309.05145  [pdf, other]

    cs.LG cs.AI stat.ML

    Outlier Robust Adversarial Training

    Authors: Shu Hu, Zhenhuan Yang, Xin Wang, Yiming Ying, Siwei Lyu

    Abstract: Supervised learning models are challenged by the intrinsic complexities of training data such as outliers and minority subpopulations and intentional attacks at inference time with adversarial samples. While traditional robust learning methods and the recent adversarial training approaches are designed to handle each of the two challenges, to date, no work has been done to develop models that are…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: Accepted by The 15th Asian Conference on Machine Learning (ACML 2023)

  42. REB: Reducing Biases in Representation for Industrial Anomaly Detection

    Authors: Shuai Lyu, Dongmei Mo, Waikeung Wong

    Abstract: Existing representation-based methods usually conduct industrial anomaly detection in two stages: obtain feature representations with a pre-trained model and perform distance measures for anomaly detection. Among them, K-nearest neighbor (KNN) retrieval-based anomaly detection methods show promising results. However, the features are not fully exploited as these methods ignore domain bias of pre-t…

    Submitted 17 May, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 14 pages, 7 figures, 7 tables

  43. arXiv:2308.09611  [pdf, other]

    cs.CV

    Language-guided Human Motion Synthesis with Atomic Actions

    Authors: Yuanhao Zhai, Mingzhen Huang, Tianyu Luan, Lu Dong, Ifeoma Nwogu, Siwei Lyu, David Doermann, Junsong Yuan

    Abstract: Language-guided human motion synthesis has been a challenging task due to the inherent complexity and diversity of human behaviors. Previous methods face limitations in generalization to novel actions, often resulting in unrealistic or incoherent motion sequences. In this paper, we propose ATOM (ATomic mOtion Modeling) to mitigate this problem, by decomposing actions into atomic actions, and emplo…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM MM 2023, code: https://github.com/yhZhai/ATOM

  44. arXiv:2308.08896  [pdf, other]

    cs.LG

    Optimal Resource Allocation for U-Shaped Parallel Split Learning

    Authors: Song Lyu, Zheng Lin, Guanqiao Qu, Xianhao Chen, Xiaoxia Huang, Pan Li

    Abstract: Split learning (SL) has emerged as a promising approach for model training without revealing the raw data samples from the data owners. However, traditional SL inevitably leaks label privacy as the tail model (with the last layers) should be placed on the server. To overcome this limitation, one promising solution is to utilize U-shaped architecture to leave both early layers and last layers on th…

    Submitted 8 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: 6 pages, 6 figures

  45. arXiv:2308.01520  [pdf, other]

    cs.CV

    COMICS: End-to-end Bi-grained Contrastive Learning for Multi-face Forgery Detection

    Authors: Cong Zhang, Honggang Qi, Shuhui Wang, Yuezun Li, Siwei Lyu

    Abstract: DeepFakes have raised serious societal concerns, leading to a great surge in detection-based forensics methods in recent years. Face forgery recognition is a standard detection method that usually follows a two-phase pipeline. While those methods perform well in ideal experimental environment, they face challenges when dealing with DeepFakes in the wild involving complex background and multiple fa…

    Submitted 24 May, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

  46. arXiv:2308.00964  [pdf, other]

    cs.CV

    ForensicsForest Family: A Series of Multi-scale Hierarchical Cascade Forests for Detecting GAN-generated Faces

    Authors: Jiucui Lu, Jiaran Zhou, Junyu Dong, Bin Li, Siwei Lyu, Yuezun Li

    Abstract: The prominent progress in generative models has significantly improved the reality of generated faces, bringing serious concerns to society. Since recent GAN-generated faces are in high realism, the forgery traces have become more imperceptible, increasing the forensics challenge. To combat GAN-generated faces, many countermeasures based on Convolutional Neural Networks (CNNs) have been spawned du…

    Submitted 26 April, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: To Appear in IEEE TIFS 2024

  47. arXiv:2307.14593  [pdf, other]

    cs.CV

    FakeTracer: Catching Face-swap DeepFakes via Implanting Traces in Training

    Authors: Pu Sun, Honggang Qi, Yuezun Li, Siwei Lyu

    Abstract: Face-swap DeepFake is an emerging AI-based face forgery technique that can replace the original face in a video with a generated face of the target identity while retaining consistent facial attributes such as expression and orientation. Due to the high privacy of faces, the misuse of this technique can raise severe social concerns, drawing tremendous attention to defend against DeepFakes recently…

    Submitted 21 April, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

  48. arXiv:2307.14491  [pdf, other]

    cs.MM cs.SD eess.AS

    A Unified Framework for Modality-Agnostic Deepfakes Detection

    Authors: Cai Yu, Peng Chen, Jiahe Tian, Jin Liu, Jiao Dai, Xi Wang, Yesheng Chai, Shan Jia, Siwei Lyu, Jizhong Han

    Abstract: As AI-generated content (AIGC) thrives, deepfakes have expanded from single-modality falsification to cross-modal fake content creation, where either audio or visual components can be manipulated. While using two unimodal detectors can detect audio-visual deepfakes, cross-modal forgery clues could be overlooked. Existing multimodal deepfake detection methods typically establish correspondence betw…

    Submitted 24 October, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  49. arXiv:2307.12392  [pdf, other]

    cs.CV

    Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision

    Authors: Menghao Li, Chunlei Wang, Wenquan Feng, Shuchang Lyu, Guangliang Cheng, Xiangtai Li, Binghao Liu, Qi Zhao

    Abstract: Visual Grounding (VG) aims at localizing target objects from an image based on given expressions and has made significant progress with the development of detection and vision transformer. However, existing VG methods tend to generate false-alarm objects when presented with inaccurate or irrelevant descriptions, which commonly occur in practical applications. Moreover, existing methods fail to cap…

    Submitted 23 July, 2023; originally announced July 2023.

  50. arXiv:2307.01426  [pdf, other]

    cs.CV

    DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection

    Authors: Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, Baoyuan Wu

    Abstract: A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark. This issue leads to unfair performance comparisons and potentially misleading results. Specifically, there is a lack of uniformity in data processing pipelines, resulting in inconsistent data inputs for detection models. Additionally, there are noticeab…

    Submitted 28 October, 2023; v1 submitted 3 July, 2023; originally announced July 2023.