Showing 1–50 of 88 results for author: Cao, G

Searching in archive cs.
  1. arXiv:2407.01579  [pdf, other]

    cs.CV

    Technical Report for CVPR 2024 WeatherProof Dataset Challenge: Semantic Segmentation on Paired Real Data

    Authors: Guojin Cao, Jiaxu Li, Jia He, Ying Min, Yunhao Zhang

    Abstract: This technical report presents the implementation details of 2nd winning for CVPR'24 UG2 WeatherProof Dataset Challenge. This challenge aims at semantic segmentation of images degraded by various degrees of weather from all around the world. We addressed this problem by introducing a pre-trained large-scale vision foundation model: InternImage, and trained it using images with different levels of…

    Submitted 9 June, 2024; originally announced July 2024.

  2. arXiv:2406.17628  [pdf, other]

    cs.CV cs.CR

    Video Inpainting Localization with Contrastive Learning

    Authors: Zijie Lou, Gang Cao, Man Lin

    Abstract: Deep video inpainting is typically used as malicious manipulation to remove important objects for creating fake videos. It is significant to identify the inpainted regions blindly. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). Specifically, a 3D Uniformer encoder is applied to the video noise residual for learning…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.13576

  3. arXiv:2406.16907  [pdf, other]

    eess.SP cs.LG

    RayProNet: A Neural Point Field Framework for Radio Propagation Modeling in 3D Environments

    Authors: Ge Cao, Zhen Peng

    Abstract: The radio wave propagation channel is central to the performance of wireless communication systems. In this paper, we introduce a novel machine learning-empowered methodology for wireless channel modeling. The key ingredients include a point-cloud-based neural network and a Spherical Harmonics encoder with light probes. Our approach offers several significant advantages, including the flexibility…

    Submitted 3 June, 2024; originally announced June 2024.

  4. arXiv:2406.13576  [pdf, other]

    cs.CV cs.CR

    Trusted Video Inpainting Localization via Deep Attentive Noise Learning

    Authors: Zijie Lou, Gang Cao, Man Lin

    Abstract: Digital video inpainting techniques have been substantially improved with deep learning in recent years. Although inpainting is originally designed to repair damaged areas, it can also be used as malicious manipulation to remove important objects for creating false scenes and facts. As such it is significant to identify inpainted regions blindly. In this paper, we present a Trusted Video Inpaintin…

    Submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2406.13565  [pdf, other]

    cs.CV cs.CR

    Exploring Multi-view Pixel Contrast for General and Robust Image Forgery Localization

    Authors: Zijie Lou, Gang Cao, Kun Guo, Haochen Zhu, Lifang Yu

    Abstract: Image forgery localization, which aims to segment tampered regions in an image, is a fundamental yet challenging digital forensic task. While some deep learning-based forensic methods have achieved impressive results, they directly learn pixel-to-label mappings without fully exploiting the relationship between pixels in the feature space. To address such deficiency, we propose a Multi-view Pixel-w…

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.12651  [pdf, other]

    cs.RO cs.AI cs.CL cs.HC

    Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics

    Authors: Huan Xu, Jinlin Wu, Guanglin Cao, Zhen Chen, Zhen Lei, Hongbin Liu

    Abstract: Ultrasonography has revolutionized non-invasive diagnostic methodologies, significantly enhancing patient outcomes across various medical domains. Despite its advancements, integrating ultrasound technology with robotic systems for automated scans presents challenges, including limited command understanding and dynamic execution capabilities. To address these challenges, this paper introduces a no…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: This work has been accepted by MICCAI 2024

  7. arXiv:2405.00461  [pdf, other]

    cs.RO cs.AI cs.CL cs.HC

    Enhancing Surgical Robots with Embodied Intelligence for Autonomous Ultrasound Scanning

    Authors: Huan Xu, Jinlin Wu, Guanglin Cao, Zhen Lei, Zhen Chen, Hongbin Liu

    Abstract: Ultrasound robots are increasingly used in medical diagnostics and early disease screening. However, current ultrasound robots lack the intelligence to understand human intentions and instructions, hindering autonomous ultrasound scanning. To solve this problem, we propose a novel Ultrasound Embodied Intelligence system that equips ultrasound robots with the large language model (LLM) and domain k…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 3 pages, 1 figure, 2 tables

  8. arXiv:2403.20254  [pdf, other]

    cs.CV

    Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions

    Authors: Runhao Zeng, Xiaoyong Chen, Jiaming Liang, Huisi Wu, Guangzhong Cao, Yong Guo

    Abstract: Temporal action detection (TAD) aims to locate action positions and recognize action categories in long-term untrimmed videos. Although many methods have achieved promising results, their robustness has not been thoroughly studied. In practice, we observe that temporal information in videos can be occasionally corrupted, such as missing or blurred frames. Interestingly, existing methods often incu…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  9. arXiv:2403.16638  [pdf, other]

    cs.CV cs.CR

    AI-Generated Video Detection via Spatio-Temporal Anomaly Learning

    Authors: Jianfa Bai, Man Lin, Gang Cao

    Abstract: The advancement of generation models has led to the emergence of highly realistic artificial intelligence (AI)-generated videos. Malicious users can easily create non-existent videos to spread false information. This letter proposes an effective AI-generated video detection (AIGVDet) scheme by capturing the forensic traces with a two-branch spatio-temporal convolutional neural network (CNN). Speci…

    Submitted 25 March, 2024; originally announced March 2024.

  10. arXiv:2403.09143  [pdf, other]

    cs.GR

    A New Split Algorithm for 3D Gaussian Splatting

    Authors: Qiyuan Feng, Gengchen Cao, Haoxiang Chen, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu

    Abstract: 3D Gaussian splatting models, as a novel explicit 3D representation, have been applied in many domains recently, such as explicit geometric editing and geometry generation. Progress has been rapid. However, due to their mixed scales and cluttered shapes, 3D Gaussian splatting models can produce a blurred or needle-like effect near the surface. At the same time, 3D Gaussian splatting models tend to…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 11 pages, 10 figures

  11. arXiv:2402.14049  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    Generative Adversarial Models for Extreme Downscaling of Climate Datasets

    Authors: Guiye Li, Guofeng Cao

    Abstract: Addressing the challenges of climate change requires accurate and high-resolution mapping of climate and weather variables. However, many existing climate datasets, such as the gridded outputs of the state-of-the-art numerical climate models (e.g., general circulation models), are only available at very coarse spatial resolutions due to the model complexity and extremely high computational demand.…

    Submitted 21 February, 2024; originally announced February 2024.

  12. arXiv:2401.07059  [pdf]

    cs.CY

    Classifying Proposals of Decentralized Autonomous Organizations Using Large Language Models

    Authors: Christian Ziegler, Marcos Miranda, Guangye Cao, Gustav Arentoft, Doo Wan Nam

    Abstract: Our study demonstrates the effective use of Large Language Models (LLMs) for automating the classification of complex datasets. We specifically target proposals of Decentralized Autonomous Organizations (DAOs), as the classification of this data requires the understanding of context and, therefore, depends on human expertise, leading to high costs associated with the task. The study applies an it…

    Submitted 3 July, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

    Report number: Dawo/2024/01 ACM Class: H.0

  13. arXiv:2401.06994  [pdf, other]

    cs.CV

    UniVision: A Unified Framework for Vision-Centric 3D Perception

    Authors: Yu Hong, Qian Liu, Huayuan Cheng, Danjiao Ma, Hang Dai, Yu Wang, Guangzhi Cao, Yong Ding

    Abstract: The past few years have witnessed the rapid development of vision-centric 3D perception in autonomous driving. Although the 3D perception models share many structural and conceptual similarities, there still exist gaps in their feature representations, data formats, and objectives, posing challenges for unified and efficient 3D perception framework design. In this paper, we present UniVision, a si…

    Submitted 13 January, 2024; originally announced January 2024.

  14. arXiv:2401.06827  [pdf, other]

    cs.CV cs.AI cs.CL

    APLe: Token-Wise Adaptive for Multi-Modal Prompt Learning

    Authors: Guiming Cao, Kaize Shi, Hong Fu, Huaiwen Zhang, Guandong Xu

    Abstract: Pre-trained Vision-Language (V-L) models set the benchmark for generalization to downstream tasks among the noteworthy contenders. Many characteristics of the V-L model have been explored in existing research including the challenge of the sensitivity to text input and the tuning process across multi-modal prompts. With the advanced utilization of the V-L model like CLIP, recent approaches deploy…

    Submitted 23 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: 7 pages, 3 figures

  15. arXiv:2312.13537  [pdf, other]

    cs.CV

    HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in Image Editing via Hypernetworks

    Authors: Hai Zhang, Chunwei Wu, Guitao Cao, Hailing Wang, Wenming Cao

    Abstract: Editing real images authentically while also achieving cross-domain editing remains a challenge. Recent studies have focused on converting real images into latent codes and accomplishing image editing by manipulating these codes. However, merely manipulating the latent codes would constrain the edited images to the generator's image domain, hindering the attainment of diverse editing goals. In res…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  16. arXiv:2311.14939  [pdf, other]

    cs.CV cs.LG

    OpenNet: Incremental Learning for Autonomous Driving Object Detection with Balanced Loss

    Authors: Zezhou Wang, Guitao Cao, Xidong Xi, Jiangtao Wang

    Abstract: Automated driving object detection has always been a challenging task in computer vision due to environmental uncertainties. These uncertainties include significant differences in object sizes and encountering the class unseen. It may result in poor performance when traditional object detection models are directly applied to automated driving detection. Because they usually presume fixed categorie…

    Submitted 25 November, 2023; originally announced November 2023.

  17. arXiv:2311.13128  [pdf, other]

    cs.CV

    P2RBox: Point Prompt Oriented Object Detection with SAM

    Authors: Guangming Cao, Xuehui Yu, Wenwen Yu, Xumeng Han, Xue Yang, Guorong Li, Jianbin Jiao, Zhenjun Han

    Abstract: Single-point annotation in oriented object detection of remote sensing scenarios is gaining increasing attention due to its cost-effectiveness. However, due to the granularity ambiguity of points, there is a significant performance gap between previous methods and those with fully supervision. In this study, we introduce P2RBox, which employs point prompt to generate rotated box (RBox) annotation…

    Submitted 23 May, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  18. arXiv:2311.08910  [pdf, other]

    cs.CV cs.CR

    Progressive Feedback-Enhanced Transformer for Image Forgery Localization

    Authors: Haochen Zhu, Gang Cao, Xianglin Huang

    Abstract: Blind detection of the forged regions in digital images is an effective authentication means to counter the malicious use of local image editing techniques. Existing encoder-decoder forensic networks overlook the fact that detecting complex and subtle tampered regions typically requires more feedback information. In this paper, we propose a Progressive FeedbACk-enhanced Transformer (ProFact) netwo…

    Submitted 15 November, 2023; originally announced November 2023.

  19. arXiv:2310.19620  [pdf, other]

    cs.RO cs.AI cs.CV

    Large Trajectory Models are Scalable Motion Predictors and Planners

    Authors: Qiao Sun, Shiduo Zhang, Danjiao Ma, Jingzhe Shi, Derun Li, Simian Luo, Yu Wang, Ningyi Xu, Guangzhi Cao, Hang Zhao

    Abstract: Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted to machine learning-based approaches. The challenges include understanding diverse road topologies, reasoning traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. Inspired by the success of large language models…

    Submitted 28 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

  20. arXiv:2309.10243  [pdf]

    cs.CV cs.CR

    Transferable Adversarial Attack on Image Tampering Localization

    Authors: Yuqi Wang, Gang Cao, Zijie Lou, Haochen Zhu

    Abstract: It is significant to evaluate the security of existing digital image tampering localization algorithms in real-world applications. In this paper, we propose an adversarial attack scheme to reveal the reliability of such tampering localizers, which would be fooled and fail to predict altered regions correctly. Specifically, the adversarial examples based on optimization and gradient are implemented…

    Submitted 18 September, 2023; originally announced September 2023.

  21. arXiv:2309.09482  [pdf, other]

    cs.CV cs.CR

    Spatio-temporal Co-attention Fusion Network for Video Splicing Localization

    Authors: Man Lin, Gang Cao, Zijie Lou

    Abstract: Digital video splicing has become easy and ubiquitous. Malicious users copy some regions of a video and paste them to another video for creating realistic forgeries. It is significant to blindly detect such forgery regions in videos. In this paper, a spatio-temporal co-attention fusion network (SCFNet) is proposed for video splicing localization. Specifically, a three-stream network is used as an…

    Submitted 18 September, 2023; originally announced September 2023.

  22. arXiv:2309.09306  [pdf, other]

    cs.CV cs.CR

    Effective Image Tampering Localization via Enhanced Transformer and Co-attention Fusion

    Authors: Kun Guo, Haochen Zhu, Gang Cao

    Abstract: Powerful manipulation techniques have made digital image forgeries be easily created and widespread without leaving visual anomalies. The blind localization of tampered regions becomes quite significant for image forensics. In this paper, we propose an effective image tampering localization network (EITLNet) based on a two-branch enhanced transformer encoder with attention-based feature fusion. Sp…

    Submitted 17 September, 2023; originally announced September 2023.

  23. arXiv:2308.08446  [pdf, other]

    cs.IR cs.LG

    CSPM: A Contrastive Spatiotemporal Preference Model for CTR Prediction in On-Demand Food Delivery Services

    Authors: Guyu Jiang, Xiaoyun Li, Rongrong Jing, Ruoqi Zhao, Xingliang Ni, Guodong Cao, Ning Hu

    Abstract: Click-through rate (CTR) prediction is a crucial task in the context of an online on-demand food delivery (OFD) platform for precisely estimating the probability of a user clicking on food items. Unlike universal e-commerce platforms such as Taobao and Amazon, user behaviors and interests on the OFD platform are more location and time-sensitive due to limited delivery ranges and regional commodity…

    Submitted 10 August, 2023; originally announced August 2023.

  24. arXiv:2307.11458  [pdf, other]

    cs.CV

    Strip-MLP: Efficient Token Interaction for Vision MLP

    Authors: Guiping Cao, Shengda Luo, Wenjian Huang, Xiangyuan Lan, Dongmei Jiang, Yaowei Wang, Jianguo Zhang

    Abstract: Token interaction operation is one of the core modules in MLP-based models to exchange and aggregate information between different spatial locations. However, the power of token interaction on the spatial dimension is highly dependent on the spatial resolution of the feature maps, which limits the model's expressive ability, especially in deep layers where the feature are down-sampled to a small s…

    Submitted 21 July, 2023; originally announced July 2023.

  25. arXiv:2307.07358  [pdf, other]

    cs.RO

    Learn from Incomplete Tactile Data: Tactile Representation Learning with Masked Autoencoders

    Authors: Guanqun Cao, Jiaqi Jiang, Danushka Bollegala, Shan Luo

    Abstract: The missing signal caused by the objects being occluded or an unstable sensor is a common challenge during data collection. Such missing signals will adversely affect the results obtained from the data, and this issue is observed more frequently in robotic tactile perception. In tactile perception, due to the limited working space and the dynamic environment, the contact between the tactile sensor…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: This paper is accepted at IROS 2023

  26. arXiv:2306.12705  [pdf, other]

    cs.RO

    Multimodal Zero-Shot Learning for Tactile Texture Recognition

    Authors: Guanqun Cao, Jiaqi Jiang, Danushka Bollegala, Min Li, Shan Luo

    Abstract: Tactile sensing plays an irreplaceable role in robotic material recognition. It enables robots to distinguish material properties such as their local geometry and textures, especially for materials like textiles. However, most tactile recognition methods can only classify known materials that have been touched and trained with tactile data, yet cannot classify unknown materials that are not traine…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: Under review at Robotics and Autonomous Systems

  27. arXiv:2305.13349  [pdf, other]

    cs.LG stat.ME

    Multiclass classification for multidimensional functional data through deep neural networks

    Authors: Shuoyang Wang, Guanqun Cao

    Abstract: The intrinsically infinite-dimensional features of the functional observations over multidimensional domains render the standard classification methods effectively inapplicable. To address this problem, we introduce a novel multiclass functional deep neural network (mfDNN) classifier as an innovative data mining and classification tool. Specifically, we consider sparse deep neural network architec…

    Submitted 23 May, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    MSC Class: 62G05; 62G08

  28. arXiv:2305.06430  [pdf, other]

    cs.CR cs.AI

    HoneyIoT: Adaptive High-Interaction Honeypot for IoT Devices Through Reinforcement Learning

    Authors: Chongqi Guan, Heting Liu, Guohong Cao, Sencun Zhu, Thomas La Porta

    Abstract: As IoT devices are becoming widely deployed, there exist many threats to IoT-based systems due to their inherent vulnerabilities. One effective approach to improving IoT security is to deploy IoT honeypot systems, which can collect attack information and reveal the methods and strategies used by attackers. However, building high-interaction IoT honeypots is challenging due to the heterogeneity of…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: 11 pages, Wisec 2023

  29. arXiv:2304.03946  [pdf, other]

    cs.DC cs.LG

    FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

    Authors: Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui

    Abstract: With the increasing data volume, there is a trend of using large-scale pre-trained models to store the knowledge into an enormous number of model parameters. The training of these models is composed of lots of dense algebras, requiring a huge amount of hardware resources. Recently, sparsely-gated Mixture-of-Experts (MoEs) are becoming more popular and have demonstrated impressive pretraining scala…

    Submitted 8 April, 2023; originally announced April 2023.

    Comments: Accepted by SIGMOD 2023

    Journal ref: Proc. ACM Manag. Data, Vol. 1, No. 1, Article 110. Publication date: May 2023

  30. arXiv:2304.00157  [pdf, other]

    cs.RO

    Robotic Perception of Transparent Objects: A Review

    Authors: Jiaqi Jiang, Guanqun Cao, Jiankang Deng, Thanh-Toan Do, Shan Luo

    Abstract: Transparent object perception is a rapidly developing research problem in artificial intelligence. The ability to perceive transparent objects enables robots to achieve higher levels of autonomy, unlocking new applications in various industries such as healthcare, services and manufacturing. Despite numerous datasets and perception methods being proposed in recent years, there is still a lack of i…

    Submitted 17 October, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

    Comments: 21 pages, 11 figures, Accepted by IEEE Transactions on Artificial Intelligence

  31. arXiv:2303.06311  [pdf, other]

    hep-ex cs.LG physics.ins-det

    Generative Adversarial Networks for Scintillation Signal Simulation in EXO-200

    Authors: S. Li, I. Ostrovskiy, Z. Li, L. Yang, S. Al Kharusi, G. Anton, I. Badhrees, P. S. Barbeau, D. Beck, V. Belov, T. Bhatta, M. Breidenbach, T. Brunner, G. F. Cao, W. R. Cen, C. Chambers, B. Cleveland, M. Coon, A. Craycraft, T. Daniels, L. Darroch, S. J. Daugherty, J. Davis, S. Delaquis, A. Der Mesrobian-Kabakian, et al. (65 additional authors not shown)

    Abstract: Generative Adversarial Networks trained on samples of simulated or actual events have been proposed as a way of generating large simulated datasets at a reduced computational cost. In this work, a novel approach to perform the simulation of photodetector signals from the time projection chamber of the EXO-200 experiment is demonstrated. The method is based on a Wasserstein Generative Adversarial N…

    Submitted 8 May, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

    Comments: As accepted by JINST

    Journal ref: JINST 18 P06005 2023

  32. arXiv:2301.08413  [pdf, other]

    cs.CV

    Chaos to Order: A Label Propagation Perspective on Source-Free Domain Adaptation

    Authors: Chunwei Wu, Guitao Cao, Yan Li, Xidong Xi, Wenming Cao, Hong Wang

    Abstract: Source-free domain adaptation (SFDA), where only a pre-trained source model is used to adapt to the target distribution, is a more general approach to achieving domain adaptation in the real world. However, it can be challenging to capture the inherent structure of the target features accurately due to the lack of supervised information on the target domain. By analyzing the clustering performance…

    Submitted 14 August, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: Accepted by ACM MM2023

  33. arXiv:2301.06826  [pdf, other]

    cs.RO

    Vis2Hap: Vision-based Haptic Rendering by Cross-modal Generation

    Authors: Guanqun Cao, Jiaqi Jiang, Ningtao Mao, Danushka Bollegala, Min Li, Shan Luo

    Abstract: To assist robots in teleoperation tasks, haptic rendering which allows human operators access a virtual touch feeling has been developed in recent years. Most previous haptic rendering methods strongly rely on data collected by tactile sensors. However, tactile data is not widely available for robots due to their limited reachable space and the restrictions of tactile sensors. To eliminate the nee…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: This paper is accepted at ICRA 2023

  34. arXiv:2212.08272  [pdf, other]

    cs.DC

    Communication-Efficient Federated Learning for Heterogeneous Edge Devices Based on Adaptive Gradient Quantization

    Authors: Heting Liu, Fang He, Guohong Cao

    Abstract: Federated learning (FL) enables geographically dispersed edge devices (i.e., clients) to learn a global model without sharing the local datasets, where each client performs gradient descent with its local data and uploads the gradients to a central server to update the global model. However, FL faces massive communication overhead resulted from uploading the gradients in each training round. To ad…

    Submitted 15 December, 2022; originally announced December 2022.

  35. arXiv:2212.02992  [pdf, other]

    cs.CV

    Sparse Message Passing Network with Feature Integration for Online Multiple Object Tracking

    Authors: Bisheng Wang, Horst Possegger, Horst Bischof, Guo Cao

    Abstract: Existing Multiple Object Tracking (MOT) methods design complex architectures for better tracking performance. However, without a proper organization of input information, they still fail to perform tracking robustly and suffer from frequent identity switches. In this paper, we propose two novel methods together with a simple online Message Passing Network (MPN) to address these limitations. First,…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: 8 pages, 2 figures

  36. arXiv:2212.02220  [pdf]

    cs.CV

    Semi-Supervised Representative Region Texture Extraction of Façade

    Authors: Zhen Ni, Guitao Cao, Ye Duan

    Abstract: Researches of analysis and parsing around façades to enrich the 3D feature of façade models by semantic information raised some attention in the community, whose main idea is to generate higher resolution components with similar shapes and textures to increase the overall resolution at the expense of reconstruction accuracy. While this approach works well for components like windows and doors, the…

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: 13 pages, 10 figures

  37. arXiv:2211.03509  [pdf]

    cs.CV cs.CR

    Black-Box Attack against GAN-Generated Image Detector with Contrastive Perturbation

    Authors: Zijie Lou, Gang Cao, Man Lin

    Abstract: Visually realistic GAN-generated facial images raise obvious concerns on potential misuse. Many effective forensic algorithms have been developed to detect such synthetic images in recent years. It is significant to assess the vulnerability of such forensic detectors against adversarial attacks. In this paper, we propose a new black-box attack method against GAN-generated image detectors. A novel…

    Submitted 7 November, 2022; originally announced November 2022.

  38. arXiv:2209.09427  [pdf, other]

    cs.IR

    Spatiotemporal-Enhanced Network for Click-Through Rate Prediction in Location-based Services

    Authors: Shaochuan Lin, Yicong Yu, Xiyu Ji, Taotao Zhou, Hengxu He, Zisen Sang, Jia Jia, Guodong Cao, Ning Hu

    Abstract: In Location-Based Services (LBS), user behavior naturally has a strong dependence on the spatiotemporal information, i.e., in different geographical locations and at different times, user click behavior will change significantly. Appropriate spatiotemporal enhancement modeling of user click behavior and large-scale sparse attributes is key to building an LBS model. Although most of existing methods…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: accepted by CIKM workshop 2022

  39. arXiv:2208.13739  [pdf]

    cs.CV

    Effective Image Tampering Localization with Multi-Scale ConvNeXt Feature Fusion

    Authors: Haochen Zhu, Gang Cao, Mo Zhao

    Abstract: With the widespread use of powerful image editing tools, image tampering becomes easy and realistic. Existing image forensic methods still face challenges of low generalization performance and robustness. In this letter, we propose an effective image tampering localization scheme based on ConvNeXt network and multi-scale feature fusion. Stacked ConvNeXt blocks are used as an encoder to capture hie…

    Submitted 16 January, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

  40. arXiv:2208.09743  [pdf, other]

    cs.RO

    Where Shall I Touch? Vision-Guided Tactile Poking for Transparent Object Grasping

    Authors: Jiaqi Jiang, Guanqun Cao, Aaron Butterworth, Thanh-Toan Do, Shan Luo

    Abstract: Picking up transparent objects is still a challenging task for robots. The visual properties of transparent objects such as reflection and refraction make the current grasping methods that rely on camera sensing fail to detect and localise them. However, humans can handle the transparent object well by first observing its coarse profile and then poking an area of interest to get a fine profile for…

    Submitted 20 August, 2022; originally announced August 2022.

    Comments: 11 pages, 11 figures, accepted by T-Mech

  41. arXiv:2207.06754  [pdf, other]

    cs.CV

    E2-AEN: End-to-End Incremental Learning with Adaptively Expandable Network

    Authors: Guimei Cao, Zhanzhan Cheng, Yunlu Xu, Duo Li, Shiliang Pu, Yi Niu, Fei Wu

    Abstract: Expandable networks have demonstrated their advantages in dealing with catastrophic forgetting problem in incremental learning. Considering that different tasks may need different structures, recent methods design dynamic structures adapted to different tasks via sophisticated skills. Their routine is to search expandable structures first and then train on the new tasks, which, however, breaks tas…

    Submitted 14 July, 2022; originally announced July 2022.

  42. arXiv:2207.04907  [pdf, other]

    cs.RO

    A4T: Hierarchical Affordance Detection for Transparent Objects Depth Reconstruction and Manipulation

    Authors: Jiaqi Jiang, Guanqun Cao, Thanh-Toan Do, Shan Luo

    Abstract: Transparent objects are widely used in our daily lives and therefore robots need to be able to handle them. However, transparent objects suffer from light reflection and refraction, which makes it challenging to obtain the accurate depth maps required to perform handling tasks. In this paper, we propose a novel affordance-based framework for depth reconstruction and manipulation of transparent obj…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: 8 pages, 9 figures, Accepted by RAL-CASE2022

  43. arXiv:2206.02158  [pdf, other]

    cs.CV cs.CR cs.LG

    Vanilla Feature Distillation for Improving the Accuracy-Robustness Trade-Off in Adversarial Training

    Authors: Guodong Cao, Zhibo Wang, Xiaowei Dong, Zhifei Zhang, Hengchang Guo, Zhan Qin, Kui Ren

    Abstract: Adversarial training has been widely explored for mitigating attacks against deep models. However, most existing works are still trapped in the dilemma between higher accuracy and stronger robustness since they tend to fit a model towards robust features (not easily tampered with by adversaries) while ignoring those non-robust but highly predictive features. To achieve a better robustness-accuracy…

    Submitted 5 June, 2022; originally announced June 2022.

    Comments: 12 pages

  44. arXiv:2205.08592  [pdf, other]

    stat.ML cs.LG stat.ME

    Deep Neural Network Classifier for Multi-dimensional Functional Data

    Authors: Shuoyang Wang, Guanqun Cao, Zuofeng Shang

    Abstract: We propose a new approach, called the functional deep neural network (FDNN), for classifying multi-dimensional functional data. Specifically, a deep neural network is trained based on the principal components of the training data which shall be used to predict the class label of a future data function. Unlike the popular functional discriminant analysis approaches which rely on Gaussian assumption,…

    Submitted 17 May, 2022; originally announced May 2022.

  45. arXiv:2201.11853  [pdf, other]

    cs.LG cs.AI

    Prediction of GPU Failures Under Deep Learning Workloads

    Authors: Heting Liu, Zhichao Li, Cheng Tan, Rongqiu Yang, Guohong Cao, Zherui Liu, Chuanxiong Guo

    Abstract: Graphics processing units (GPUs) are the de facto standard for processing deep learning (DL) tasks. Meanwhile, GPU failures, which are inevitable, cause severe consequences in DL tasks: they disrupt distributed trainings, crash inference services, and result in service level agreement violations. To mitigate the problem caused by GPU failures, we propose to predict failures by using ML models. Thi…

    Submitted 27 January, 2022; originally announced January 2022.

  46. arXiv:2201.04924  [pdf, other]

    cs.CV

    Technical Report for ICCV 2021 Challenge SSLAD-Track3B: Transformers Are Better Continual Learners

    Authors: Duo Li, Guimei Cao, Yunlu Xu, Zhanzhan Cheng, Yi Niu

    Abstract: In the SSLAD-Track 3B challenge on continual learning, we propose the method of COntinual Learning with Transformer (COLT). We find that transformers suffer less from catastrophic forgetting compared to convolutional neural networks. The major principle of our method is to equip the transformer based feature extractor with old knowledge distillation and head expanding strategies to compete catastro…

    Submitted 13 January, 2022; originally announced January 2022.

    Comments: Rank 1st on ICCV2021 SSLAD-Track 3B

  47. arXiv:2112.14298  [pdf, ps, other]

    cs.CV cs.RO

    Multimodal perception for dexterous manipulation

    Authors: Guanqun Cao, Shan Luo

    Abstract: Humans usually perceive the world in a multimodal way in which vision, touch, and sound are utilised to understand surroundings from various dimensions. These senses are combined to achieve a synergistic effect where learning is more effective than using each sense separately. For robotics, vision and touch are two key senses for dexterous manipulation. Vision usually gives us apparent…

    Submitted 28 December, 2021; originally announced December 2021.

    Comments: 19 pages, 10 figures

  48. arXiv:2112.09216  [pdf, other]

    eess.IV cs.CV

    A Deep-Learning Framework for Improving COVID-19 CT Image Quality and Diagnostic Accuracy

    Authors: Garvit Goel, Jingyuan Qi, Wu-chun Feng, Guohua Cao

    Abstract: We present a deep-learning based computing framework for fast-and-accurate CT (DL-FACT) testing of COVID-19. Our CT-based DL framework was developed to improve the testing speed and accuracy of COVID-19 (plus its variants) via a DL-based approach for CT image enhancement and classification. The image enhancement network is adapted from DDnet, short for DenseNet and Deconvolution based network. To…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: 10 pages

  49. arXiv:2112.02439  [pdf, other]

    cs.NI

    Deep Learning on Mobile Devices Through Neural Processing Units and Edge Computing

    Authors: Tianxiang Tan, Guohong Cao

    Abstract: Deep Neural Networks (DNNs) are increasingly adopted for video analytics on mobile devices. To reduce the delay of running DNNs, many mobile devices are equipped with Neural Processing Units (NPU). However, due to the resource limitations of NPU, these DNNs have to be compressed to increase the processing speed at the cost of accuracy. To address the low accuracy problem, we propose a Confidence Based Of…

    Submitted 4 December, 2021; originally announced December 2021.

  50. arXiv:2109.00590  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    WebQA: Multihop and Multimodal QA

    Authors: Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, Yonatan Bisk

    Abstract: Scaling Visual Question Answering (VQA) to the open-domain and multi-hop nature of web searches requires fundamental advances in visual representation learning, knowledge aggregation, and language generation. In this work, we introduce WebQA, a challenging new benchmark that proves difficult for large-scale state-of-the-art models which lack language groundable visual representations for novel ob…

    Submitted 27 March, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: CVPR Camera ready