Image Captioning Based on a Hierarchical Attention Mechanism and Policy Gradient Optimization

Yan, Shiyang; Xie, Yuan; Wu, Fangyu; Smith, Jeremy S.; Lu, Wenjin; Zhang, Bailing

Computer Science > Computer Vision and Pattern Recognition

arXiv:1811.05253 (cs)

[Submitted on 13 Nov 2018 (v1), last revised 11 Jan 2019 (this version, v2)]

Abstract: Automatically generating a description of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence that bridges computer vision and natural language processing. Building on successful deep learning models, especially CNNs and Long Short-Term Memory networks (LSTMs) with attention mechanisms, we propose a hierarchical attention model that uses both global CNN features and local object features for more effective feature representation and reasoning in image captioning. A generative adversarial network (GAN), together with a reinforcement learning (RL) algorithm, is applied to address the exposure bias problem in RNN-based supervised training for language problems. In addition, through RL optimization and the discriminator's automatic measurement of the consistency between the generated caption and the image content, we make the final generated sentences more accurate and natural. Comprehensive experiments show the improved performance of the hierarchical attention mechanism and the effectiveness of our RL-based optimization method. Our model achieves state-of-the-art results on several important metrics on the MSCOCO dataset, using only greedy inference.
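
The abstract names policy gradient optimization as the remedy for exposure bias but gives no formula. As a rough illustration only, the sketch below shows the generic REINFORCE update for a single word choice: the model samples a token from its own distribution (rather than being fed the ground-truth word) and the logits are nudged according to a scalar reward. This is not the authors' implementation. The function names `softmax`, `sample_token`, and `reinforce_grad`, the `baseline` parameter, and the plain-number reward are all assumptions made for the example; in the paper the reward would come from the GAN discriminator.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(probs, rng):
    """Sample a token index from the model's own distribution."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def reinforce_grad(logits, token, reward, baseline=0.0):
    """Gradient of the loss -(reward - baseline) * log p(token) w.r.t. logits.

    Uses d log p(token) / d logit_i = 1[i == token] - p_i.
    `reward` is a hypothetical scalar standing in for a learned score.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        advantage * (p - (1.0 if i == token else 0.0))
        for i, p in enumerate(probs)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 0.0, 0.0]          # toy vocabulary of three words
    token = sample_token(softmax(logits), rng)
    grad = reinforce_grad(logits, token, reward=1.0)
    # One gradient-descent step raises the sampled token's probability
    # when the advantage is positive.
    updated = [x - 0.5 * g for x, g in zip(logits, grad)]
    print(token, softmax(updated))
```

Subtracting a baseline from the reward leaves the expected gradient unchanged while reducing its variance, which is why the sketch exposes it as a parameter.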

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1811.05253 [cs.CV]
	(or arXiv:1811.05253v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1811.05253

Submission history

From: Shiyang Yan
[v1] Tue, 13 Nov 2018 12:31:26 UTC (4,799 KB)
[v2] Fri, 11 Jan 2019 03:31:31 UTC (4,722 KB)
