Author: Liu, Jingtuo : Search

Article

ReSyncer: Rewiring Style-Based Generator for Unified Audio-Visually Synced Facial Performer

Computer Vision – ECCV 2024, Pages 348–367, https://doi.org/10.1007/978-3-031-72940-9_20

Abstract

Lip-syncing videos with given audio is the foundation for various applications including the creation of virtual presenters or performers. While recent studies explore high-fidelity lip-sync with different techniques, their task-orientated models ...

research-article

Efficient Video Portrait Reenactment via Grid-based Codebook

SIGGRAPH '23: ACM SIGGRAPH 2023 Conference Proceedings, Article No.: 66, Pages 1–9, https://doi.org/10.1145/3588432.3591509

While progress has been made in the field of portrait reenactment, the problem of how to efficiently produce high-fidelity and accurate videos remains. Recent studies build direct mappings between driving signals and their predictions, leading to ...

research-article

Robust video portrait reenactment via personalized representation quantization

AAAI'23/IAAI'23/EAAI'23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, Article No.: 285, Pages 2564–2572, https://doi.org/10.1609/aaai.v37i2.25354

While progress has been made in the field of portrait reenactment, the problem of how to produce high-fidelity and robust videos remains. Recent studies normally find it challenging to handle rarely seen target poses due to the limitation of source data. ...

research-article

Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers

SA '22: SIGGRAPH Asia 2022 Conference Papers, Article No.: 17, Pages 1–9, https://doi.org/10.1145/3550469.3555393

Previous studies have explored generating accurately lip-synced talking faces for arbitrary targets given audio conditions. However, most of them deform or generate the whole facial area, leading to non-realistic results. In this work, we delve into the ...

Article

UFO: Unified Feature Optimization

Computer Vision – ECCV 2022, Pages 472–488, https://doi.org/10.1007/978-3-031-19809-0_27

Abstract

This paper proposes a novel Unified Feature Optimization (UFO) paradigm for training and deploying deep models under real-world and large-scale scenarios, which requires a collection of multiple AI functions. UFO aims to benefit each single task ...

Article

StyleSwap: Style-Based Generator Empowers Robust Face Swapping

Computer Vision – ECCV 2022, Pages 661–677, https://doi.org/10.1007/978-3-031-19781-9_38

Abstract

Numerous attempts have been made to the task of person-agnostic face swapping given its wide applications. While existing methods mostly rely on tedious network and loss designs, they still struggle in the information balancing between the source ...

research-article

StrucTexT: Structured Text Understanding with Multi-Modal Transformers

MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 1912–1920, https://doi.org/10.1145/3474085.3475345

Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence. Due to the complexity of content and layout in VRDs, structured text understanding has been a challenging task. Most existing studies decoupled ...

research-article

AutoDet: Pyramid Network Architecture Search for Object Detection

International Journal of Computer Vision (IJCV), Volume 129, Issue 4, Pages 1087–1105, https://doi.org/10.1007/s11263-020-01415-x

Abstract

Feature pyramids have delivered significant improvement in object detection. However, building effective feature pyramids heavily relies on expert knowledge, and also requires strenuous efforts to balance effectiveness and efficiency. Automatic ...

research-article

Learning Global Structure Consistency for Robust Object Tracking

MM '20: Proceedings of the 28th ACM International Conference on Multimedia, Pages 229–237, https://doi.org/10.1145/3394171.3413644

Fast appearance variations and the distractions of similar objects are two of the most challenging problems in visual object tracking. Unlike many existing trackers that focus on modeling only the target, in this work, we consider the transient ...

Article

Real Image Super Resolution via Heterogeneous Model Ensemble Using GP-NAS

Computer Vision – ECCV 2020 Workshops, Pages 423–436, https://doi.org/10.1007/978-3-030-67070-2_25

Abstract

With advancement in deep neural network (DNN), recent state-of-the-art (SOTA) image super-resolution (SR) methods have achieved impressive performance using deep residual network with dense skip connections. While these models perform well on ...

Article

AIM 2020 Challenge on Real Image Super-Resolution: Methods and Results

Computer Vision – ECCV 2020 Workshops, Pages 392–422, https://doi.org/10.1007/978-3-030-67070-2_24

Abstract

This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020. This challenge involves three tracks to super-resolve an input image for $\times$ ...

research-article

Progressively Refined Face Detection Through Semantics-Enriched Representation Learning

IEEE Transactions on Information Forensics and Security (TIFS), Volume 15, Pages 1394–1406, https://doi.org/10.1109/TIFS.2019.2941800

Feature pyramids aim to learn multi-scale representations for detecting faces over various scales. However, they often lack adequate context over different scales, especially when there are many tiny faces in the wild. In this paper, we propose an ...

research-article

Open Access

A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning

MM '19: Proceedings of the 27th ACM International Conference on Multimedia, Pages 1277–1285, https://doi.org/10.1145/3343031.3350988

Detecting scene text of arbitrary shapes has been a challenging task over the past years. In this paper, we propose a novel segmentation-based text detector, namely SAST, which employs a context attended multi-task learning framework based on a Fully ...

research-article

Editing Text in the Wild

MM '19: Proceedings of the 27th ACM International Conference on Multimedia, Pages 1500–1508, https://doi.org/10.1145/3343031.3350929

In this paper, we are interested in editing text in natural images, which aims to replace or modify a word in the source image with another one while maintaining its realistic look. This task is challenging, as the styles of both background and text ...

Article

PyramidBox: A Context-Assisted Single Shot Face Detector

Computer Vision – ECCV 2018, Pages 812–828, https://doi.org/10.1007/978-3-030-01240-3_49

Abstract

Face detection has been well studied for many years and one of remaining challenges is to detect small, blurred and partially occluded faces in uncontrolled environment. This paper proposes a novel context-assisted single shot face detector, named ...


