The novel object captioning task aims to describe objects that are absent from the training data. Because novel objects are scarce, it is challenging to exploit external data to improve a model's reasoning ability. While previous methods follow a purely deep learning approach, we boost novel object captioning by combining explicit reasoning with a traditional deep learning framework. We construct a manual from dictionaries that provides our model with sufficient and accurate external information about novel objects. We propose the Manual-guided Context-aware Novel Object Captioning model (MC-NOC), which exploits image and caption context to generate novel object captions. It contains a Manual-Guided Novel Object Reasoning module, which reasons about novel objects based on the other objects in a given image, and a Caption Reconstruction module, which incorporates novel objects into the generated captions according to caption context. MC-NOC achieves state-of-the-art performance on the challenging Held-out COCO and Nocaps datasets, leading their leaderboards; in particular, it improves the CIDEr metric by 6.4 points on the Held-out COCO dataset. Comprehensive experiments demonstrate our model's reasoning capability and the quality of the generated captions.
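To make the idea of a dictionary-derived "manual" concrete, here is a minimal illustrative sketch, not the authors' implementation: a mapping from novel object names to short dictionary-style definitions that a captioning model could consult as external knowledge. All entries, function names, and definitions below are hypothetical.

```python
from typing import Dict, List


def build_manual(entries: Dict[str, str]) -> Dict[str, str]:
    """Normalize object names to lowercase keys for case-insensitive lookup."""
    return {name.lower().strip(): definition for name, definition in entries.items()}


def lookup_novel_objects(manual: Dict[str, str], detected: List[str]) -> Dict[str, str]:
    """Return manual definitions for the detected objects that have an entry."""
    return {obj: manual[obj.lower()] for obj in detected if obj.lower() in manual}


# Hypothetical manual entries for two novel objects.
manual = build_manual({
    "Zebra": "a wild African animal resembling a horse, with black-and-white stripes",
    "Racket": "an oval frame strung with cord, used to strike the ball in tennis",
})

# Objects detected in an image; only those with manual entries are resolved.
print(lookup_novel_objects(manual, ["zebra", "dog"]))
# prints {'zebra': 'a wild African animal resembling a horse, with black-and-white stripes'}
```

In this sketch the lookup simply filters the detector's output against the manual; the paper's reasoning module would additionally use the other objects in the image and the caption context to decide how a retrieved definition informs the generated caption.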