Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks (Apr 13, 2020)

Abstract. Large-scale pre-training methods that learn cross-modal representations on image-text pairs are becoming popular for vision-language tasks. We propose Oscar (Object-Semantics Aligned Pre-training), a new cross-modal pre-training method that uses object tags detected in images as anchor points to significantly ease the learning of image-text alignments, explicitly modeling correlations between images and text through salient objects. We pre-train an Oscar model on a public corpus of 6.5 million text-image pairs and fine-tune it on downstream tasks, creating new state-of-the-art results on six vision-language tasks.
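The anchor-point idea above can be sketched as a data-preparation step: Oscar represents each sample as a (caption words, object tags, region features) triple, and the text side is flattened into one BERT-style token sequence so that the tags act as a shared vocabulary between modalities. The names, helper function, and feature dimensions below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of an Oscar-style (word, tag, region) input triple.
# All identifiers and dimensions here are assumptions for illustration.
from dataclasses import dataclass
from typing import List


@dataclass
class OscarInput:
    caption_tokens: List[str]           # w: tokenized caption text
    object_tags: List[str]              # q: tags from an object detector
                                        #    (e.g. Faster R-CNN)
    region_features: List[List[float]]  # v: one feature vector per region


def build_text_sequence(x: OscarInput) -> List[str]:
    """Flatten the text side of the triple into one BERT-style token
    sequence; region features would be fed alongside as embeddings."""
    return ["[CLS]"] + x.caption_tokens + ["[SEP]"] + x.object_tags + ["[SEP]"]


example = OscarInput(
    caption_tokens=["a", "dog", "on", "a", "couch"],
    object_tags=["dog", "couch"],
    region_features=[[0.1] * 4, [0.2] * 4],  # toy 4-dim features
)
print(build_text_sequence(example))
```

Because the detector's tag (e.g. "dog") often also appears in the caption, the shared token gives the model an explicit alignment signal between the image region and the text.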
Oscar generates more detailed descriptions of images than the baseline, owing to the accurate and diverse object tags detected by Faster R-CNN.