Showing 1–50 of 70 results for author: Yi, R

  1. arXiv:2406.16710  [pdf, other]

    cs.CV

    Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image

    Authors: Jinkun Hao, Junshu Tang, Jiangning Zhang, Ran Yi, Yijia Hong, Moran Li, Weijian Cao, Yating Wang, Lizhuang Ma

    Abstract: While recent works have achieved great success on one-shot 3D common object generation, high quality and fidelity 3D head generation from a single image remains a great challenge. Previous text-based methods for generating 3D heads were limited by text descriptions and image-based methods struggled to produce high-quality head geometry. To handle this challenging problem, we propose a novel framew…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: https://jinkun-hao.github.io/Portrait3D/

  2. arXiv:2406.14806  [pdf, other]

    cs.CV cs.GR

    Relighting Scenes with Object Insertions in Neural Radiance Fields

    Authors: Xuening Zhu, Renjiao Yi, Xin Wen, Chenyang Zhu, Kai Xu

    Abstract: The insertion of objects into a scene and relighting are commonly utilized applications in augmented reality (AR). Previous methods focused on inserting virtual objects using CAD models or real objects from single-view images, resulting in highly limited AR application scenarios. We propose a novel NeRF-based pipeline for inserting object NeRFs into scene NeRFs, enabling novel view synthesis and r…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 14 pages

  3. arXiv:2406.09794  [pdf, other]

    cs.CV

    SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis

    Authors: Teng Hu, Ran Yi, Baihong Qian, Jiangning Zhang, Paul L. Rosin, Yu-Kun Lai

    Abstract: SVG (Scalable Vector Graphics) is a widely used graphics format that possesses excellent scalability and editability. Image vectorization, which aims to convert raster images to SVGs, is an important yet challenging problem in computer vision and graphics. Existing image vectorization methods either suffer from low reconstruction accuracy for complex images or require long computation time. To add…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  4. arXiv:2406.02263  [pdf, other]

    cs.CV

    M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising

    Authors: Chengjie Wang, Haokun Zhu, Jinlong Peng, Yue Wang, Ran Yi, Yunsheng Wu, Lizhuang Ma, Jiangning Zhang

    Abstract: Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images. Yet, both RGB and 3D data are crucial for anomaly detection, and the datasets are seldom completely clean in practical scenarios. To address the above challenges, this paper initially delves into the RGB-3D multi-modal noisy anomaly detection, proposing a novel noise-resistant M3DM-NR…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2405.15763  [pdf, other]

    cs.CV

    FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis

    Authors: Ke Fan, Junshu Tang, Weijian Cao, Ran Yi, Moran Li, Jingyu Gong, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Lizhuang Ma

    Abstract: Text-to-motion synthesis is a crucial task in computer vision. Existing methods are limited in their universality, as they are tailored for single-person or two-person scenarios and cannot be applied to generate motions for more individuals. To achieve number-free motion synthesis, this paper reconsiders motion generation and proposes to unify the single and multi-person motion by the conditi…

    Submitted 24 May, 2024; originally announced May 2024.

  6. arXiv:2405.05175  [pdf, other]

    cs.CR cs.CL cs.LG

    Air Gap: Protecting Privacy-Conscious Conversational Agents

    Authors: Eugene Bagdasaryan, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, Daniel Ramage

    Abstract: The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into re…

    Submitted 8 May, 2024; originally announced May 2024.

  7. arXiv:2405.00507  [pdf, other]

    cs.CV

    NeRF-Guided Unsupervised Learning of RGB-D Registration

    Authors: Zhinan Yu, Zheng Qin, Yijie Tang, Yongjun Wang, Renjiao Yi, Chenyang Zhu, Kai Xu

    Abstract: This paper focuses on training a robust RGB-D registration model without ground-truth pose supervision. Existing methods usually adopt a pairwise training strategy based on differentiable rendering, which enforces the photometric and the geometric consistency between the two registered frames as supervision. However, this frame-to-frame framework suffers from poor multi-view consistency due to fac…

    Submitted 20 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  8. arXiv:2404.19040  [pdf, other]

    cs.CV

    GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting

    Authors: Bo Chen, Shoukang Hu, Qi Chen, Chenpeng Du, Ran Yi, Yanmin Qian, Xie Chen

    Abstract: We present GSTalker, a 3D audio-driven talking face generation model with Gaussian Splatting for both fast training (40 minutes) and real-time rendering (125 FPS) with a 3 to 5 minute video for training material, in comparison with previous 2D and 3D NeRF-based modeling frameworks which require hours of training and seconds of rendering per frame. Specifically, GSTalker learns an audio-driven Ga…

    Submitted 29 April, 2024; originally announced April 2024.

  9. arXiv:2404.15789  [pdf, other]

    cs.CV

    MotionMaster: Training-free Camera Motion Transfer For Video Generation

    Authors: Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma

    Abstract: The emergence of diffusion models has greatly propelled the progress in image and video generation. Recently, some efforts have been made in controllable video generation, including text-to-video generation and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module, and necessitate sub…

    Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  10. arXiv:2404.05606  [pdf, other]

    cs.CV

    Learning Topology Uniformed Face Mesh by Volume Rendering for Multi-view Reconstruction

    Authors: Yating Wang, Ran Yi, Ke Fan, Jinkun Hao, Jiangbo Lu, Lizhuang Ma

    Abstract: Face meshes in consistent topology serve as the foundation for many face-related applications, such as 3DMM constrained face reconstruction and expression retargeting. Traditional methods commonly acquire topology uniformed face meshes by two separate steps: multi-view stereo (MVS) to reconstruct shapes followed by non-rigid registration to align topology, but they struggle with handling noise and non…

    Submitted 8 April, 2024; originally announced April 2024.

  11. arXiv:2404.03518  [pdf, other]

    cs.CV

    SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation

    Authors: Sichen Chen, Yingyi Zhang, Siming Huang, Ran Yi, Ke Fan, Ruixin Zhang, Peixian Chen, Jun Wang, Shouhong Ding, Lizhuang Ma

    Abstract: Recently, transformer-based methods have achieved state-of-the-art prediction quality on human pose estimation (HPE). Nonetheless, most of these top-performing transformer-based models are too computation-consuming and storage-demanding to deploy on edge computing platforms. Those transformer-based models that require fewer resources are prone to under-fitting due to their smaller scale and thus pe…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  12. arXiv:2401.09146  [pdf, other]

    cs.CV

    Continuous Piecewise-Affine Based Motion Model for Image Animation

    Authors: Hexiang Wang, Fengqi Liu, Qianyu Zhou, Ran Yi, Xin Tan, Lizhuang Ma

    Abstract: Image animation aims to bring static images to life according to driving videos and create engaging visual content that can be used for various purposes such as animation, entertainment, and education. Recent unsupervised methods utilize affine and thin-plate spline transformations based on keypoints to transfer the motion in driving frames to the source image. However, limited by the expressive p…

    Submitted 17 January, 2024; originally announced January 2024.

  13. arXiv:2401.08092  [pdf, other]

    cs.LG cs.AI cs.DC

    A Survey of Resource-efficient LLM and Multimodal Foundation Models

    Authors: Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu

    Abstract: Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources. To support the growth of the…

    Submitted 15 January, 2024; originally announced January 2024.

  14. arXiv:2401.02032  [pdf, other]

    cs.CV

    DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection

    Authors: Yunfan Ye, Kai Xu, Yuhang Huang, Renjiao Yi, Zhiping Cai

    Abstract: Limited by the encoder-decoder architecture, learning-based edge detectors usually have difficulty predicting edge maps that satisfy both correctness and crispness. With the recent success of the diffusion probabilistic model (DPM), we found it is especially suitable for accurate and crisp edge detection since the denoising process is directly applied to the original image size. Therefore, we prop…

    Submitted 9 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: AAAI 2024

  15. arXiv:2312.15139  [pdf, other]

    cs.CV

    Automatic Tooth Arrangement with Joint Features of Point and Mesh Representations via Diffusion Probabilistic Models

    Authors: Changsong Lei, Mengfei Xia, Shaofeng Wang, Yaqian Liang, Ran Yi, Yuhui Wen, Yongjin Liu

    Abstract: Tooth arrangement is a crucial step in orthodontic treatment, in which aligning teeth could improve overall well-being, enhance facial aesthetics, and boost self-confidence. To improve the efficiency of tooth arrangement and minimize errors associated with unreasonable designs by inexperienced practitioners, some deep learning-based tooth arrangement methods have been proposed. Currently, most ex…

    Submitted 22 December, 2023; originally announced December 2023.

  16. arXiv:2312.10111  [pdf, other]

    cs.CV

    Plasticine3D: Non-rigid 3D editting with text guidance

    Authors: Yige Chen, Ang Chen, Siyuan Chen, Ran Yi

    Abstract: With the help of Score Distillation Sampling (SDS) and the rapid development of various trainable 3D representations, Text-to-Image (T2I) diffusion models have been applied to 3D generation tasks and achieved considerable results. There are also some attempts toward the task of editing 3D objects leveraging this Text-to-3D pipeline. However, most methods currently focus on adding additional geometri…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 8 pages,7 figures

  17. arXiv:2312.05767  [pdf, other]

    cs.CV

    AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model

    Authors: Teng Hu, Jiangning Zhang, Ran Yi, Yuzhen Du, Xu Chen, Liang Liu, Yabiao Wang, Chengjie Wang

    Abstract: Anomaly inspection plays an important role in industrial manufacture. Existing anomaly inspection methods are limited in their performance due to insufficient anomaly data. Although anomaly generation methods have been proposed to augment the anomaly data, they either suffer from poor generation authenticity or inaccurate alignment between the generated anomalies and masks. To address the above pr…

    Submitted 21 February, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  18. arXiv:2311.18208  [pdf, other]

    cs.LG cs.CV

    SMaRt: Improving GANs with Score Matching Regularity

    Authors: Mengfei Xia, Yujun Shen, Ceyuan Yang, Ran Yi, Wenping Wang, Yong-jin Liu

    Abstract: Generative adversarial networks (GANs) usually struggle in learning from highly diverse data, whose underlying manifold is complex. In this work, we revisit the mathematical foundations of GANs, and theoretically reveal that the native adversarial loss for GAN training is insufficient to fix the problem of subsets with positive Lebesgue measure of the generated data manifold lying out of the real…

    Submitted 7 February, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  19. arXiv:2311.05276  [pdf, other]

    cs.CV

    SAMVG: A Multi-stage Image Vectorization Model with the Segment-Anything Model

    Authors: Haokun Zhu, Juang Ian Chong, Teng Hu, Ran Yi, Yu-Kun Lai, Paul L. Rosin

    Abstract: Vector graphics are widely used in graphical designs and have received more and more attention. However, unlike raster images which can be easily obtained, acquiring high-quality vector graphics, typically by automatically converting from raster images, remains a significant challenge, especially for more complex images such as photos or artworks. In this paper, we propose SAMVG, a multi-stage…

    Submitted 25 December, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: Accepted by ICASSP 2024

  20. arXiv:2310.09469  [pdf, other]

    cs.CV

    Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner

    Authors: Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Ran Yi, Deli Zhao, Wenping Wang, Yong-jin Liu

    Abstract: A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed. Existing acceleration algorithms simplify the sampling by skipping most steps yet exhibit considerable performance degradation. By viewing the generation of diffusion models as a discretized integrating process, we argue that the quality drop is partly caused…

    Submitted 13 October, 2023; originally announced October 2023.

  21. arXiv:2310.02977  [pdf, other]

    cs.CV cs.CL cs.LG

    T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation

    Authors: Yuze He, Yushi Bai, Matthieu Lin, Wang Zhao, Yubin Hu, Jenny Sheng, Ran Yi, Juanzi Li, Yong-Jin Liu

    Abstract: Recent methods in text-to-3D leverage powerful pretrained diffusion models to optimize NeRF. Notably, these methods are able to produce high-quality 3D scenes without training on 3D data. Due to the open-ended nature of the task, most studies evaluate their results with subjective case studies and user experiments, thereby presenting a challenge in quantitatively addressing the question: How has c…

    Submitted 17 April, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Under review

  22. arXiv:2310.00249  [pdf, other]

    cs.CV

    MMPI: a Flexible Radiance Field Representation by Multiple Multi-plane Images Blending

    Authors: Yuze He, Peng Wang, Yubin Hu, Wang Zhao, Ran Yi, Yong-Jin Liu, Wenping Wang

    Abstract: This paper presents a flexible representation of neural radiance fields based on multi-plane images (MPI), for high-quality view synthesis of complex scenes. MPI with Normalized Device Coordinate (NDC) parameterization is widely used in NeRF learning for its simple definition, easy calculation, and powerful ability to represent unbounded scenes. However, existing NeRF works that adopt MPI represen…

    Submitted 30 September, 2023; originally announced October 2023.

  23. arXiv:2309.11132  [pdf, other]

    cs.CV cs.AI cs.LG

    Contrastive Pseudo Learning for Open-World DeepFake Attribution

    Authors: Zhimin Sun, Shen Chen, Taiping Yao, Bangjie Yin, Ran Yi, Shouhong Ding, Lizhuang Ma

    Abstract: The challenge in sourcing attribution for forgery faces has gained widespread attention due to the rapid development of generative techniques. While many recent works have taken essential steps on GAN-generated faces, more threatening attacks related to identity swapping or expression transferring are still overlooked. And the forgery traces hidden in unknown attacks from the open-world unlabeled…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 16 pages, 7 figures, ICCV 2023

  24. arXiv:2309.03729  [pdf, other]

    cs.CV

    Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption

    Authors: Teng Hu, Jiangning Zhang, Liang Liu, Ran Yi, Siqi Kou, Haokun Zhu, Xu Chen, Yabiao Wang, Chengjie Wang, Lizhuang Ma

    Abstract: Training a generative model with a limited number of samples is a challenging task. Current methods primarily rely on few-shot model adaption to train the network. However, in scenarios where data is extremely limited (fewer than 10), the generative network tends to overfit and suffers from content degradation. To address these problems, we propose a novel phasic content fusing few-shot diffusion mod…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  25. Toward High Quality Facial Representation Learning

    Authors: Yue Wang, Jinlong Peng, Jiangning Zhang, Ran Yi, Liang Liu, Yabiao Wang, Chengjie Wang

    Abstract: Face analysis tasks have a wide range of applications, but the universal facial representation has only been explored in a few works. In this paper, we explore high-performance pre-training methods to boost the face analysis tasks such as face alignment and face parsing. We propose a self-supervised pre-training framework, called Mask Contrastive Face (MCF), with mask image modeling a…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: ACM MM 2023

  26. Stroke-based Neural Painting and Stylization with Dynamically Predicted Painting Region

    Authors: Teng Hu, Ran Yi, Haokun Zhu, Liang Liu, Jinlong Peng, Yabiao Wang, Chengjie Wang, Lizhuang Ma

    Abstract: Stroke-based rendering aims to recreate an image with a set of strokes. Most existing methods render complex images using a uniform-block-dividing strategy, which leads to boundary inconsistency artifacts. To solve the problem, we propose Compositional Neural Painter, a novel stroke-based rendering framework which dynamically predicts the next painting region based on the current canvas, instead…

    Submitted 10 October, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: ACM MM 2023

  27. arXiv:2308.14352  [pdf, other]

    cs.LG cs.AI cs.CL

    EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models

    Authors: Rongjie Yi, Liwei Guo, Shiyun Wei, Ao Zhou, Shangguang Wang, Mengwei Xu

    Abstract: Large Language Models (LLMs) such as GPTs and LLaMa have ushered in a revolution in machine intelligence, owing to their exceptional capabilities in a wide range of machine learning tasks. However, the transition of LLMs from data centers to edge devices presents a set of challenges and opportunities. While this shift can enhance privacy and availability, it is hampered by the enormous parameter s…

    Submitted 28 August, 2023; originally announced August 2023.

  28. arXiv:2308.05667  [pdf, other]

    cs.CV

    2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds

    Authors: Minhao Li, Zheng Qin, Zhirui Gao, Renjiao Yi, Chenyang Zhu, Yulan Guo, Kai Xu

    Abstract: The commonly adopted detect-then-match approach to registration finds difficulties in the cross-modality cases due to the incompatible keypoint detection and inconsistent feature description. We propose 2D3D-MATR, a detection-free method for accurate and robust registration between images and point clouds. Our method adopts a coarse-to-fine pipeline where it first computes coarse correspondences…

    Submitted 14 August, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  29. arXiv:2308.01686  [pdf, other]

    cs.CV cs.AI

    LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment

    Authors: Zhiwei Zhang, Zhizhong Zhang, Qian Yu, Ran Yi, Yuan Xie, Lizhuang Ma

    Abstract: 3D panoptic segmentation is a challenging perception task that requires both semantic segmentation and instance segmentation. In this task, we notice that images could provide rich texture, color, and discriminative information, which can complement LiDAR data for evident performance improvement, but their fusion remains a challenging problem. To this end, we propose LCPS, the first LiDAR-Camera P…

    Submitted 11 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: Accepted as ICCV 2023 paper

  30. arXiv:2307.06099  [pdf, other]

    cs.CV

    RFENet: Towards Reciprocal Feature Evolution for Glass Segmentation

    Authors: Ke Fan, Changan Wang, Yabiao Wang, Chengjie Wang, Ran Yi, Lizhuang Ma

    Abstract: Glass-like objects are widespread in daily life but remain intractable to be segmented for most existing methods. The transparent property makes it difficult to be distinguished from background, while the tiny separation boundary further impedes the acquisition of their exact contour. In this paper, by revealing the key co-evolution demand of semantic and boundary learning, we propose a Selective…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: Accepted by 2023 International Joint Conference on Artificial Intelligence (IJCAI2023)

  31. arXiv:2306.15989  [pdf, other]

    cs.GR cs.AI

    Tensorformer: Normalized Matrix Attention Transformer for High-quality Point Cloud Reconstruction

    Authors: Hui Tian, Zheng Qin, Renjiao Yi, Chenyang Zhu, Kai Xu

    Abstract: Surface reconstruction from raw point clouds has been studied for decades in the computer graphics community, which is highly demanded by modeling and rendering applications nowadays. Classic solutions, such as Poisson surface reconstruction, require point normals as extra input to perform reasonable results. Modern transformer-based methods can work without normals, while the results are less fin…

    Submitted 10 October, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

  32. Delving into Crispness: Guided Label Refinement for Crisp Edge Detection

    Authors: Yunfan Ye, Renjiao Yi, Zhirui Gao, Zhiping Cai, Kai Xu

    Abstract: Learning-based edge detection usually suffers from predicting thick edges. Through extensive quantitative study with a new edge crispness measure, we find that noisy human-labeled edges are the main cause of thick predictions. Based on this observation, we advocate that more attention should be paid to label quality than to model design to achieve crisp edge detection. To this end, we propose an e…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Accepted by TIP

  33. arXiv:2305.11191  [pdf, other]

    cs.CR cs.CV

    Towards Generalizable Data Protection With Transferable Unlearnable Examples

    Authors: Bin Fang, Bo Li, Shuang Wu, Tianyi Zheng, Shouhong Ding, Ran Yi, Lizhuang Ma

    Abstract: Artificial Intelligence (AI) is making a profound impact in almost every domain. One of the crucial factors contributing to this success has been the access to an abundance of high-quality data for constructing machine learning models. Lately, as the role of data in artificial intelligence has been significantly magnified, concerns have arisen regarding the secure utilization of data, particularly…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2305.10691

  34. arXiv:2305.10691  [pdf, other]

    cs.CR cs.CV

    Re-thinking Data Availablity Attacks Against Deep Neural Networks

    Authors: Bin Fang, Bo Li, Shuang Wu, Ran Yi, Shouhong Ding, Lizhuang Ma

    Abstract: The unauthorized use of personal data for commercial purposes and the clandestine acquisition of private data for training machine learning models continue to raise concerns. In response to these issues, researchers have proposed availability attacks that aim to render data unexploitable. However, many current attack methods are rendered ineffective by adversarial training. In this paper, we re-ex…

    Submitted 18 May, 2023; originally announced May 2023.

  35. Instance-Aware Domain Generalization for Face Anti-Spoofing

    Authors: Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Ran Yi, Shouhong Ding, Lizhuang Ma

    Abstract: Face anti-spoofing (FAS) based on domain generalization (DG) has been recently studied to improve the generalization on unseen scenarios. Previous methods typically rely on domain labels to align the distribution of each domain for learning domain-invariant representations. However, artificial domain labels are coarse-grained and subjective, which cannot reflect real domain distributions accuratel…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  36. arXiv:2303.15166  [pdf, other]

    cs.CV

    Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method

    Authors: Ran Yi, Haoyuan Tian, Zhihao Gu, Yu-Kun Lai, Paul L. Rosin

    Abstract: Image aesthetics assessment (IAA) is a challenging task due to its highly subjective nature. Most of the current studies rely on large-scale datasets (e.g., AVA and AADB) to learn a general model for all kinds of photography images. However, little light has been shed on measuring the aesthetic quality of artistic images, and the existing datasets only contain relatively few artworks. Such a defec…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  37. arXiv:2303.14184  [pdf, other]

    cs.CV

    Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior

    Authors: Junshu Tang, Tengfei Wang, Bo Zhang, Ting Zhang, Ran Yi, Lizhuang Ma, Dong Chen

    Abstract: In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. This is inherently challenging: it essentially involves estimating the underlying 3D geometry while simultaneously hallucinating unseen textures. To address this challenge, we leverage prior knowledge from a well-trained 2D diffusion model to act as 3D-aware supervision for 3D creation. Our appr…

    Submitted 3 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: 17 pages, 18 figures, Project page: https://make-it-3d.github.io/

  38. arXiv:2303.13852  [pdf, other]

    cs.CV

    Weakly-supervised Single-view Image Relighting

    Authors: Renjiao Yi, Chenyang Zhu, Kai Xu

    Abstract: We present a learning-based approach to relight a single image of Lambertian and low-frequency specular objects. Our method enables inserting objects from photographs into new scenes and relighting them under the new environment lighting, which is essential for AR applications. To relight the object, we solve both inverse rendering and re-rendering. To resolve the ill-posed inverse rendering, we p…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: 21 pages, with supplementary material

  39. Learning Accurate Template Matching with Differentiable Coarse-to-Fine Correspondence Refinement

    Authors: Zhirui Gao, Renjiao Yi, Zheng Qin, Yunfan Ye, Chenyang Zhu, Kai Xu

    Abstract: Template matching is a fundamental task in computer vision and has been studied for decades. It plays an essential role in the manufacturing industry for estimating the poses of different parts, facilitating downstream tasks such as robotic grasping. Existing methods fail when the template and source images have different modalities, cluttered backgrounds or weak textures. They also rarely consider ge…

    Submitted 15 March, 2023; originally announced March 2023.

    Journal ref: Computational Visual Media 2023

  40. arXiv:2303.07653  [pdf, other]

    cs.CV

    NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images

    Authors: Yunfan Ye, Renjiao Yi, Zhirui Gao, Chenyang Zhu, Zhiping Cai, Kai Xu

    Abstract: We study the problem of reconstructing 3D feature curves of an object from a set of calibrated multi-view images. To do so, we learn a neural implicit field representing the density distribution of 3D edges which we refer to as Neural Edge Field (NEF). Inspired by NeRF, NEF is optimized with a view-based rendering loss where a 2D edge map is rendered at a given view and is compared to the ground-t…

    Submitted 16 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  41. arXiv:2303.00601  [pdf, other]

    cs.CV

    Multimodal Industrial Anomaly Detection via Hybrid Fusion

    Authors: Yue Wang, Jinlong Peng, Jiangning Zhang, Ran Yi, Yabiao Wang, Chengjie Wang

    Abstract: 2D-based Industrial Anomaly Detection has been widely discussed; however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched fields. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we pr…

    Submitted 7 September, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  42. arXiv:2212.01538  [pdf, other]

    cs.CV

    Multi-resolution Monocular Depth Map Fusion by Self-supervised Gradient-based Composition

    Authors: Yaqiao Dai, Renjiao Yi, Chenyang Zhu, Hongjun He, Kai Xu

    Abstract: Monocular depth estimation is a challenging problem on which deep neural networks have demonstrated great potential. However, depth maps predicted by existing deep models usually lack fine-grained details due to the convolution operations and the down-samplings in networks. We find that increasing input resolution is helpful to preserve more local details while the estimation at low resolution is…

    Submitted 3 December, 2022; originally announced December 2022.

    Comments: 19 pages (with supplementary material)

  43. arXiv:2209.08266  [pdf, other]

    cs.CV

    6DOF Pose Estimation of a 3D Rigid Object based on Edge-enhanced Point Pair Features

    Authors: Chenyi Liu, Fei Chen, Lu Deng, Renjiao Yi, Lintao Zheng, Chenyang Zhu, Jia Wang, Kai Xu

    Abstract: The point pair feature (PPF) is widely used for 6D pose estimation. In this paper, we propose an efficient 6D pose estimation method based on the PPF framework. We introduce a well-targeted down-sampling strategy that focuses more on edge area for efficient feature extraction of complex geometry. A pose hypothesis validation approach is proposed to resolve the symmetric ambiguity by calculating ed…

    Submitted 17 September, 2022; originally announced September 2022.

    Comments: 16 pages, 20 figures

  44. Generative Domain Adaptation for Face Anti-Spoofing

    Authors: Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Ran Yi, Kekai Sheng, Shouhong Ding, Lizhuang Ma

    Abstract: Face anti-spoofing (FAS) approaches based on unsupervised domain adaption (UDA) have drawn growing attention due to promising performances for target scenarios. Most existing UDA FAS methods typically fit the trained models to the target domain via aligning the distribution of semantic high-level features. However, insufficient supervision of unlabeled target domains and neglect of low-level featu…

    Submitted 11 September, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2022

  45. Adaptive Mixture of Experts Learning for Generalizable Face Anti-Spoofing

    Authors: Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Ran Yi, Shouhong Ding, Lizhuang Ma

    Abstract: With various face presentation attacks emerging continually, face anti-spoofing (FAS) approaches based on domain generalization (DG) have drawn growing attention. Existing DG-based FAS approaches always capture the domain-invariant features for generalizing on the various unseen domains. However, they neglect individual source domains' discriminative characteristics and diverse domain-specific inf…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to ACM MM 2022

  46. arXiv:2206.07446  [pdf]

    cs.LG

    Boosting DNN Cold Inference on Edge Devices

    Authors: Rongjie Yi, Ting Cao, Ao Zhou, Xiao Ma, Shangguang Wang, Mengwei Xu

    Abstract: DNNs are ubiquitous on edge devices nowadays. With its increasing importance and use cases, it's not likely to pack all DNNs into device memory and expect that each inference has been warmed up. Therefore, cold inference, the process to read, initialize, and execute a DNN model, is becoming commonplace and its performance is urgently demanded to be optimized. To this end, we present NNV12, the fir…

    Submitted 26 August, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

  47. Real-time Controllable Motion Transition for Characters

    Authors: Xiangjun Tang, He Wang, Bo Hu, Xu Gong, Ruifan Yi, Qilong Kou, Xiaogang Jin

    Abstract: Real-time in-between motion generation is universally required in games and highly desirable in existing animation pipelines. Its core challenge lies in the need to satisfy three critical conditions simultaneously: quality, controllability and speed, which renders any methods that need offline computation (or post-processing) or cannot incorporate (often unpredictable) user control undesirable. To…

    Submitted 5 May, 2022; originally announced May 2022.

    Journal ref: ACM Transactions on Graphics (Proc. Siggraph 2022), 2022, 41(4)

  48. arXiv:2204.06180  [pdf, other]

    cs.CV cs.GR cs.MM

    Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions

    Authors: Zipeng Ye, Zhiyao Sun, Yu-Hui Wen, Yanan Sun, Tian Lv, Ran Yi, Yong-Jin Liu

    Abstract: Recently, talking-face video generation has received considerable attention. So far most methods generate results with neutral expressions or expressions that are implicitly determined by neural networks in an uncontrollable way. In this paper, we propose a method to generate talking-face videos with continuously controllable expressions in real-time. Our method is based on an important observatio…

    Submitted 13 April, 2022; originally announced April 2022.

  49. arXiv:2203.16771  [pdf, other]

    cs.CV

    LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints

    Authors: Junshu Tang, Zhijun Gong, Ran Yi, Yuan Xie, Lizhuang Ma

    Abstract: Point cloud completion aims at completing geometric and topological shapes from a partial observation. However, some topology of the original shape is missing, and existing methods directly predict the location of complete points, without predicting structured and topological information of the complete shape, which leads to inferior performance. To better tackle the missing topology part, we propose…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: 10 pages, 8 figures

  50. arXiv:2203.08612  [pdf, other]

    cs.CV cs.GR

    CtlGAN: Few-shot Artistic Portraits Generation with Contrastive Transfer Learning

    Authors: Yue Wang, Ran Yi, Luying Li, Ying Tai, Chengjie Wang, Lizhuang Ma

    Abstract: Generating artistic portraits is a challenging problem in computer vision. Existing portrait stylization models that generate good quality results are based on Image-to-Image Translation and require abundant data from both source and target domains. However, without enough data, these methods would result in overfitting. In this work, we propose CtlGAN, a new few-shot artistic portraits generation…

    Submitted 8 March, 2024; v1 submitted 16 March, 2022; originally announced March 2022.