Sequential Attention GAN for Interactive Image Editing

Cheng, Yu; Gan, Zhe; Li, Yitong; Liu, Jingjing; Gao, Jianfeng

arXiv:1812.08352 (cs)

[Submitted on 20 Dec 2018 (v1), last revised 5 Aug 2020 (this version, v4)]

Abstract: Most existing text-to-image synthesis tasks are static, single-turn generation based on pre-defined textual descriptions of images. To explore more practical and interactive real-life applications, we introduce a new task, Interactive Image Editing, in which users guide an agent to edit images via multi-turn textual commands on the fly. In each session, the agent takes a natural language description from the user as input and modifies the image generated in the previous turn into a new design that follows the user's description. The main challenges in this sequential and interactive image generation task are two-fold: 1) contextual consistency between a generated image and the provided textual description; 2) step-by-step region-level modification to maintain visual consistency across the generated image sequence in each session. To address these challenges, we propose a novel Sequential Attention Generative Adversarial Network (SeqAttnGAN), which applies a neural state tracker to encode the previous image and the textual description in each turn of the sequence, and uses a GAN framework to generate a modified version of the image that is consistent with the preceding images and coherent with the description. To achieve better region-specific refinement, we also introduce a sequential attention mechanism into the model. To benchmark the new task, we introduce two new datasets, Zap-Seq and DeepFashion-Seq, which contain multi-turn sessions with image-description sequences in the fashion domain. Experiments on both datasets show that the proposed SeqAttnGAN model outperforms state-of-the-art approaches on the interactive image editing task across all evaluation metrics, including visual quality, image sequence coherence, and text-image consistency.
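
The multi-turn loop described in the abstract (track a state across turns, attend over the current description, produce an edited image) can be illustrated with a toy sketch. This is not the SeqAttnGAN implementation: images and words are stand-in feature vectors, the "state tracker" is a simple blend rather than a learned recurrent unit, the "generator" is a fixed interpolation rather than a GAN, and every function name and parameter below is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, word_feats: List[Vector]) -> Vector:
    """Dot-product attention: weight each word vector by its similarity
    to the query and return the weighted sum (the context vector)."""
    weights = softmax([dot(query, w) for w in word_feats])
    dim = len(word_feats[0])
    return [sum(wt * w[i] for wt, w in zip(weights, word_feats))
            for i in range(dim)]


def update_state(state: Vector, image_feat: Vector, text_feat: Vector,
                 decay: float = 0.5) -> Vector:
    """Toy state tracker (assumption): blend the previous state with the
    mean of the current image and text features."""
    return [decay * s + (1.0 - decay) * 0.5 * (i + t)
            for s, i, t in zip(state, image_feat, text_feat)]


def edit_session(initial_image: Vector,
                 turns: List[List[Vector]]) -> List[Vector]:
    """Run a multi-turn session. Each turn is a list of word vectors for
    one user description; returns the image features after every turn."""
    state = [0.0] * len(initial_image)
    image = initial_image
    history = [image]
    for word_feats in turns:
        # Query the description with the image from the previous turn.
        context = attend(image, word_feats)
        state = update_state(state, image, context)
        # Toy "generator" (assumption): move the image toward the state.
        image = [0.5 * p + 0.5 * s for p, s in zip(image, state)]
        history.append(image)
    return history


if __name__ == "__main__":
    session = edit_session(
        [1.0, 0.0],
        [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]],
    )
    for turn, feats in enumerate(session):
        print(turn, feats)
```

Each turn reuses the image from the previous turn as the attention query, which is the point of the sketch: the edit at turn t depends on both the new description and everything generated so far.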

Comments:	ACM MM 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1812.08352 [cs.CV]
	(or arXiv:1812.08352v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1812.08352

Submission history

From: Yu Cheng
[v1] Thu, 20 Dec 2018 03:55:33 UTC (5,059 KB)
[v2] Wed, 3 Apr 2019 00:32:27 UTC (1,551 KB)
[v3] Sun, 8 Sep 2019 19:06:18 UTC (986 KB)
[v4] Wed, 5 Aug 2020 22:13:20 UTC (1,764 KB)
