Paraphrasing Is All You Need for Novel Object Captioning

Yang, Cheng-Fu; Tsai, Yao-Hung Hubert; Fan, Wan-Cyuan; Salakhutdinov, Ruslan; Morency, Louis-Philippe; Wang, Yu-Chiang Frank

arXiv:2209.12343v1 (cs)

[Submitted on 25 Sep 2022]

Abstract: Novel object captioning (NOC) aims to describe images containing objects whose ground truth captions are not observed during training. Because caption annotations are absent, captioning models cannot be directly optimized via sequence-to-sequence training or CIDEr optimization. We therefore present Paraphrasing-to-Captioning (P2C), a two-stage learning framework for NOC that heuristically optimizes the output captions via paraphrasing. With P2C, the captioning model first learns paraphrasing from a language model pre-trained on a text-only corpus, which expands its word bank and improves linguistic fluency. To further ensure that the output caption sufficiently describes the visual content of the input image, we perform self-paraphrasing for the captioning model with fidelity and adequacy objectives. Since no ground truth captions are available for novel object images during training, P2C leverages cross-modality (image-text) association modules to ensure that these caption characteristics are properly preserved. In the experiments, we not only show that P2C achieves state-of-the-art performance on the nocaps and COCO Caption datasets, but also verify the effectiveness and flexibility of our learning framework by replacing the language and cross-modality association models for NOC. Implementation details and code are available in the supplementary materials.

Comments:	Accepted at NeurIPS 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2209.12343 [cs.CV]
	(or arXiv:2209.12343v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2209.12343

Submission history

From: Cheng-Fu Yang
[v1] Sun, 25 Sep 2022 22:56:04 UTC (17,530 KB)
