Towards End-to-End Image Compression and Analysis with Transformers

Bai, Yuanchao; Yang, Xu; Liu, Xianming; Jiang, Junjun; Wang, Yaowei; Ji, Xiangyang; Gao, Wen

Computer Science > Computer Vision and Pattern Recognition

arXiv:2112.09300 (cs)

[Submitted on 17 Dec 2021]

Abstract: We propose an end-to-end image compression and analysis model with Transformers, targeting cloud-based image classification applications. Instead of placing an existing Transformer-based image classification model directly after an image codec, we redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and to facilitate image compression with long-range information from the Transformer. Specifically, we first replace the patchify stem (i.e., image splitting and embedding) of the ViT model with a lightweight image encoder modelled by a convolutional neural network. The compressed features generated by the image encoder carry a convolutional inductive bias and are fed to the Transformer for image classification, bypassing image reconstruction. Meanwhile, we propose a feature aggregation module that fuses the compressed features with selected intermediate features of the Transformer and feeds the aggregated features to a deconvolutional neural network for image reconstruction. The aggregated features thereby acquire long-range information from the self-attention mechanism of the Transformer, which improves compression performance. Finally, the rate-distortion-accuracy optimization problem is solved with a two-step training strategy. Experimental results demonstrate the effectiveness of the proposed model on both image compression and image classification tasks.
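The data flow described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. Four pieces are shown. First, the compressed (C, H, W) feature map from the CNN encoder is flattened into H*W tokens, which stand in for ViT patches. Second, that mapping is inverted so aggregated tokens can go to the deconvolutional decoder. Third, `aggregate` is a hypothetical fusion step that concatenates channels and then applies a linear projection; the abstract does not say how the feature aggregation module actually fuses features. Fourth, `rda_loss` is an assumed rate + distortion + accuracy objective with hypothetical weights `lam_d` and `lam_a`. The abstract does not specify how the two training steps split these terms. The channel count (192) and total encoder stride (16) in the usage lines are also assumptions.

```python
import numpy as np


def latent_to_tokens(latent: np.ndarray) -> np.ndarray:
    """Flatten a (C, H, W) compressed feature map into (H*W, C) tokens.

    Each spatial position becomes one Transformer token, the role that
    image patches play in a standard ViT patchify stem.
    """
    c, h, w = latent.shape
    return latent.reshape(c, h * w).T


def tokens_to_latent(tokens: np.ndarray, h: int, w: int) -> np.ndarray:
    """Inverse of latent_to_tokens: (H*W, C) tokens back to a (C, H, W) map."""
    return tokens.T.reshape(-1, h, w)


def aggregate(latent_tokens, transformer_feats, weight):
    """Hypothetical fusion: concatenate the compressed tokens with selected
    intermediate Transformer features along the channel axis, then apply a
    linear projection (weight has shape (C_total, C_out))."""
    fused = np.concatenate([latent_tokens, *transformer_feats], axis=1)
    return fused @ weight


def cross_entropy(logits: np.ndarray, label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


def rda_loss(likelihoods, num_pixels, x, x_hat, logits, label,
             lam_d=0.01, lam_a=1.0):
    """Assumed objective: rate + lam_d * distortion + lam_a * classification.

    rate: estimated bits per pixel, -sum(log2 p(y_hat)) / num_pixels
    distortion: mean squared error between input x and reconstruction x_hat
    classification: cross-entropy of the classifier logits
    """
    rate = float(-np.log2(likelihoods).sum() / num_pixels)
    distortion = float(np.mean((x - x_hat) ** 2))
    return rate + lam_d * distortion + lam_a * cross_entropy(logits, label)


# Usage: a 224x224 image through an encoder with an assumed total stride
# of 16 gives a 14x14 latent, i.e. 196 tokens (192 channels is hypothetical).
latent = np.random.default_rng(0).normal(size=(192, 14, 14))
tokens = latent_to_tokens(latent)
print(tokens.shape)  # (196, 192)
```

Keeping the token mapping a pure reshape means the same latent feeds both branches. The Transformer reads it as a sequence, and after aggregation it is folded back into a spatial map for the decoder, so classification never requires reconstructing the image first.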

Comments:	Accepted by AAAI 2022; Code: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Cite as:	arXiv:2112.09300 [cs.CV]
	(or arXiv:2112.09300v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2112.09300

Submission history

From: Yuanchao Bai [view email]
[v1] Fri, 17 Dec 2021 03:28:14 UTC (1,956 KB)
