Decomposition, Compression, and Synthesis Based Video Coding: A Neural Approach Through Reference-Based Super Resolution

Lu, Ming; Chen, Tong; Dai, Zhenyu; Wang, Dong; Ding, Dandan; Ma, Zhan

arXiv:2012.00650v2 (cs)

[Submitted on 1 Dec 2020 (v1), revised 25 Apr 2021 (this version, v2), latest version 15 Jan 2024 (v5)]

Abstract:In pursuit of higher compression efficiency, one potential solution is Down-Sampling based Video Coding (DSVC), in which an input video is first downscaled for encoding at a lower resolution and the decoded frames are then super-resolved through deep neural networks (DNNs). However, the coding gains of existing DSVC methods are often bounded, either by the severe loss of high-frequency components induced by uniform resolution sampling or by insufficient information aggregation across non-uniformly sampled frames. To address this, we propose to first decompose the input video into spatial texture frames (STFs) at its native spatial resolution, which preserve the rich spatial details, and temporal motion frames (TMFs) at a lower spatial resolution, which retain the motion smoothness; then compress them together using any popular video coder; and finally synthesize the decoded STFs and TMFs for high-fidelity video reconstruction at the same resolution as the native input. This work simply applies bicubic sampling in decomposition and a Versatile Video Coding (VVC) compliant codec in compression, and focuses on the synthesis part. Such cross-resolution synthesis can be facilitated by Reference-based Super-Resolution (RefSR). Specifically, a motion compensation network (MCN) is devised on TMFs to efficiently align and aggregate temporal motion features, which are then jointly processed with the corresponding STFs using a texture transfer network (TTN) to better augment spatial details. In this way, the compression and resolution re-sampling noise can be effectively alleviated, yielding better rate-distortion (R-D) efficiency.
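
The decomposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the frame selection rule (every `gop`-th frame is kept as an STF, all other frames become TMFs), the names `decompose`, `gop`, and `factor`, the use of the Keys cubic kernel with a = -0.5 as the "bicubic" filter, and the absence of an anti-aliasing prefilter. Frames are plain nested lists of luma samples so that the example runs without third-party libraries.

```python
import math


def keys_kernel(x, a=-0.5):
    """Keys cubic convolution kernel, a common basis for bicubic resampling."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def resample_row(row, out_len):
    """Resample a 1-D list of samples to out_len samples, clamping at the edges."""
    n = len(row)
    scale = n / out_len
    out = []
    for i in range(out_len):
        # Centre of output sample i, expressed in input coordinates.
        src = (i + 0.5) * scale - 0.5
        base = math.floor(src)
        acc = 0.0
        # Four-tap support around the source position.
        for k in range(base - 1, base + 3):
            acc += keys_kernel(src - k) * row[min(max(k, 0), n - 1)]
        out.append(acc)
    return out


def bicubic_resize(frame, out_h, out_w):
    """Separable bicubic resize: rows first, then columns."""
    rows = [resample_row(r, out_w) for r in frame]
    cols = [resample_row(list(c), out_h) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def decompose(frames, gop=8, factor=2):
    """Split frames into native-resolution STFs and downscaled TMFs.

    Assumption (not from the source): frame idx is an STF when
    idx % gop == 0; every other frame is downscaled into a TMF.
    """
    stfs, tmfs = [], []
    for idx, frame in enumerate(frames):
        if idx % gop == 0:
            stfs.append((idx, frame))
        else:
            h, w = len(frame), len(frame[0])
            tmfs.append((idx, bicubic_resize(frame, h // factor, w // factor)))
    return stfs, tmfs


if __name__ == "__main__":
    # Four flat 4x4 frames; with gop=2, frames 0 and 2 stay at native resolution.
    video = [[[float(v)] * 4 for _ in range(4)] for v in (10, 20, 30, 40)]
    stfs, tmfs = decompose(video, gop=2, factor=2)
    print([i for i, _ in stfs], [i for i, _ in tmfs])
```

Under these assumptions, both streams would then be passed to the video coder, and the synthesis networks (MCN and TTN) would operate on the decoded outputs; neither of those stages is modelled here.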

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
Cite as:	arXiv:2012.00650 [cs.CV]
	(or arXiv:2012.00650v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2012.00650

Submission history

From: Ming Lu
[v1] Tue, 1 Dec 2020 17:23:53 UTC (1,826 KB)
[v2] Sun, 25 Apr 2021 06:16:32 UTC (5,300 KB)
[v3] Wed, 30 Jun 2021 05:20:27 UTC (2,808 KB)
[v4] Sat, 18 Dec 2021 05:12:12 UTC (12,499 KB)
[v5] Mon, 15 Jan 2024 13:17:58 UTC (1,633 KB)
