DOI: 10.1145/3474085.3475492
Research Article

From Superficial to Deep: Language Bias driven Curriculum Learning for Visual Question Answering

Published: 17 October 2021

Abstract

Most Visual Question Answering (VQA) models suffer from language bias when learning to answer a given question, and consequently fail to exploit visual and textual knowledge together. Based on the observation that VQA samples with different levels of language bias contribute differently to answer prediction, this paper overcomes the language prior problem with a novel Language Bias driven Curriculum Learning (LBCL) approach, which employs an easy-to-hard learning strategy guided by a new difficulty metric, the Visual Sensitive Coefficient (VSC). Specifically, in the initial training stage the VQA model mainly learns the superficial textual correlations between questions and answers (the easy concept) from more-biased examples, and in subsequent stages it progressively focuses on multimodal reasoning (the hard concept) from less-biased examples. The curriculum selection of examples at each stage is governed by the proposed VSC, which evaluates how strongly language bias drives each VQA sample. Furthermore, to avoid catastrophic forgetting of previously learned concepts during the multi-stage procedure, we integrate knowledge distillation into the curriculum learning framework. Extensive experiments show that LBCL can be applied to common VQA baseline models, and achieves remarkably better performance on the VQA-CP v1 and v2 datasets, with an overall 20% accuracy boost over baseline models.
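
To make the training recipe concrete, the following minimal PyTorch-style sketch illustrates the three ingredients the abstract describes: a per-sample difficulty score in the spirit of the Visual Sensitive Coefficient, an easy-to-hard multi-stage curriculum, and a knowledge-distillation loss against the previous stage's model. The exact VSC formula is defined in the paper, not in this abstract; here it is approximated by how much the ground-truth answer's probability changes when the image features are blanked out, and the `model(image_feats, question)` signature, stage count, and loss weights are all illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def visual_sensitive_coefficient(model, image_feats, question, answer_idx):
    """Illustrative proxy for the Visual Sensitive Coefficient (VSC).

    Measures how much the ground-truth answer's probability depends on the
    image: a low score means the question alone suffices (a more-biased,
    "easy" sample); a high score means visual evidence is needed (a
    less-biased, "hard" sample). `model` is assumed to map
    (image_feats, question) to a 1-D vector of answer logits.
    """
    with torch.no_grad():
        p_full = F.softmax(model(image_feats, question), dim=-1)[answer_idx]
        # Blank the image to approximate a question-only prediction.
        blank = torch.zeros_like(image_feats)
        p_blind = F.softmax(model(blank, question), dim=-1)[answer_idx]
    return (p_full - p_blind).item()

def curriculum_stages(samples, vsc_scores, num_stages=3):
    """Easy-to-hard schedule: stage 1 trains on the most-biased (lowest
    VSC) samples so the model first absorbs superficial question-answer
    correlations; later stages cumulatively add less-biased samples to
    force multimodal reasoning."""
    order = sorted(range(len(samples)), key=lambda i: vsc_scores[i])
    per_stage = len(order) // num_stages
    for s in range(1, num_stages + 1):
        yield [samples[i] for i in order[: s * per_stage]]

def stage_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style knowledge distillation against a frozen copy of the
    previous stage's model, so concepts learned earlier in the curriculum
    are not catastrophically forgotten."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```

In this sketch each stage continues from the previous stage's weights while distilling from a frozen copy of them; how LBCL actually weights the hard and soft terms, how many stages it uses, and how it partitions samples by VSC are details specified in the paper itself.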

Supplementary Material

ZIP File (mfp1779aux.zip)
A PDF containing a supplemental experiment on small-scale datasets and a case study of the proposed Language Bias driven Curriculum Learning for Visual Question Answering.


Cited By

  • (2024) Universal Relocalizer for Weakly Supervised Referring Expression Grounding. ACM Transactions on Multimedia Computing, Communications, and Applications 20(7), 1-23. https://doi.org/10.1145/3656045. Online publication date: 16-May-2024.
  • (2024) Robust Visual Question Answering: Datasets, Methods, and Future Challenges. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(8), 5575-5594. https://doi.org/10.1109/TPAMI.2024.3366154. Online publication date: Aug-2024.
  • (2023) Overcoming Language Bias in Remote Sensing Visual Question Answering Via Adversarial Training. IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, 2235-2238. https://doi.org/10.1109/IGARSS52108.2023.10282946. Online publication date: 16-Jul-2023.
  • (2023) Distance Metric Learning-optimized Attention Mechanism for Visual Question Answering. 2023 9th International Conference on Information, Cybernetics, and Computational Social Systems (ICCSS), 91-96. https://doi.org/10.1109/ICCSS58421.2023.10270653. Online publication date: 2-Jun-2023.


Information

    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN:9781450386517
    DOI:10.1145/3474085
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 October 2021


    Author Tags

    1. curriculum learning
    2. language bias
    3. visual question answering

    Qualifiers

    • Research-article

    Funding Sources

    • Natural Science Foundation of Hunan Province
    • National Natural Science Foundation of China

    Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 995 of 4,171 submissions, 24%


    Article Metrics

    • Downloads (Last 12 months)78
    • Downloads (Last 6 weeks)5
    Reflects downloads up to 12 Sep 2024
