
Showing 1–10 of 10 results for author: Baccouche, M

  1. arXiv:2202.06858  [pdf, other]

    cs.CV

    An experimental study of the vision-bottleneck in VQA

    Authors: Pierre Marza, Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf

    Abstract: As in many tasks combining vision and language, both modalities play a crucial role in Visual Question Answering (VQA). To properly solve the task, a given model should both understand the content of the proposed image and the nature of the question. While the fusion between modalities, which is another obviously important part of the problem, has been highly studied, the vision part has received…

    Submitted 14 February, 2022; originally announced February 2022.

  2. arXiv:2106.05597  [pdf, other]

    cs.CV cs.LG

    Supervising the Transfer of Reasoning Patterns in VQA

    Authors: Corentin Kervadec, Christian Wolf, Grigory Antipov, Moez Baccouche, Madiha Nadri

    Abstract: Methods for Visual Question Answering (VQA) are notorious for leveraging dataset biases rather than performing reasoning, hindering generalization. It has been recently shown that better reasoning patterns emerge in attention layers of a state-of-the-art VQA model when they are trained on perfect (oracle) visual inputs. This provides evidence that deep neural networks can learn to reason when train…

    Submitted 10 June, 2021; originally announced June 2021.

  3. arXiv:2104.03656  [pdf, other]

    cs.CV

    How Transferable are Reasoning Patterns in VQA?

    Authors: Corentin Kervadec, Theo Jaunet, Grigory Antipov, Moez Baccouche, Romain Vuillemot, Christian Wolf

    Abstract: Since its inception, Visual Question Answering (VQA) has been notoriously known as a task where models are prone to exploit biases in datasets to find shortcuts instead of performing high-level reasoning. Classical methods address this by removing biases from training data, or adding branches to models to detect and remove biases. In this paper, we argue that uncertainty in vision is a dominating facto…

    Submitted 8 April, 2021; originally announced April 2021.

  4. arXiv:2104.00926  [pdf, other]

    cs.CV cs.HC

    VisQA: X-raying Vision and Language Reasoning in Transformers

    Authors: Theo Jaunet, Corentin Kervadec, Romain Vuillemot, Grigory Antipov, Moez Baccouche, Christian Wolf

    Abstract: Visual Question Answering systems target answering open-ended textual questions given input images. They are a testbed for learning high-level reasoning with a primary use in HCI, for instance assistance for the visually impaired. Recent research has shown that state-of-the-art models tend to produce answers exploiting biases and shortcuts in the training data, and sometimes do not even look at th…

    Submitted 20 July, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  5. arXiv:2006.05726  [pdf, other]

    cs.CV cs.CL

    Estimating semantic structure for the VQA answer space

    Authors: Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf

    Abstract: Since its appearance, Visual Question Answering (VQA, i.e. answering a question posed over an image), has always been treated as a classification problem over a set of predefined answers. Despite its convenience, this classification approach poorly reflects the semantics of the problem limiting the answering to a choice between independent proposals, without taking into account the similarity betw…

    Submitted 8 April, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: [WARNING] We want to notify the reader that additional experiments (not in the paper) have shown that using a 'random' semantic space performs as well as the proposed semantic loss. This additional result questions the effectiveness of our method

  6. arXiv:2006.05121  [pdf, other]

    cs.CV

    Roses Are Red, Violets Are Blue... but Should VQA Expect Them To?

    Authors: Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf

    Abstract: Models for Visual Question Answering (VQA) are notorious for their tendency to rely on dataset biases, as the large and unbalanced diversity of questions and concepts involved tends to prevent models from learning to reason, leading them to perform educated guesses instead. In this paper, we claim that the standard evaluation metric, which consists in measuring the overall in-domain accuracy,…

    Submitted 7 April, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  7. arXiv:1912.03063  [pdf, other]

    cs.CV cs.CL cs.LG cs.NE

    Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks

    Authors: Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf

    Abstract: The large adoption of the self-attention (i.e. transformer model) and BERT-like training principles has recently resulted in a number of high performing models on a large panoply of vision-and-language problems (such as Visual Question Answering (VQA), image retrieval, etc.). In this paper we claim that these State-Of-The-Art (SOTA) approaches perform reasonably well in structuring information ins…

    Submitted 6 December, 2019; originally announced December 2019.

  8. arXiv:1903.06496  [pdf, other]

    cs.LG cs.CV cs.NE

    MFAS: Multimodal Fusion Architecture Search

    Authors: Juan-Manuel Pérez-Rúa, Valentin Vielzeuf, Stéphane Pateux, Moez Baccouche, Frédéric Jurie

    Abstract: We tackle the problem of finding good architectures for multimodal classification problems. We propose a novel and generic search space that spans a large number of possible fusion architectures. In order to find an optimal architecture for a given dataset in the proposed search space, we leverage an efficient sequential model-based exploration approach that is tailored for the problem. We demonst…

    Submitted 15 March, 2019; originally announced March 2019.

    Comments: CVPR 2019, Jun 2019, Long Beach, United States http://cvpr2019.thecvf.com/

  9. arXiv:1808.00391  [pdf, other]

    cs.CV

    Efficient Progressive Neural Architecture Search

    Authors: Juan-Manuel Perez-Rua, Moez Baccouche, Stephane Pateux

    Abstract: This paper addresses the difficult problem of finding an optimal neural architecture design for a given image classification task. We propose a method that aggregates two main results of the previous state-of-the-art in neural architecture search. These are, appealing to the strong sampling efficiency of a search scheme based on sequential model-based optimization (SMBO), and increasing training e…

    Submitted 1 August, 2018; originally announced August 2018.

    Comments: Accepted for publication by the BMVA (BMVC 2018)

  10. arXiv:1702.01983  [pdf, other]

    cs.CV

    Face Aging With Conditional Generative Adversarial Networks

    Authors: Grigory Antipov, Moez Baccouche, Jean-Luc Dugelay

    Abstract: It has been recently shown that Generative Adversarial Networks (GANs) can produce synthetic images of exceptional visual fidelity. In this work, we propose a GAN-based method for automatic face aging. Contrary to previous works employing GANs for altering facial attributes, we place particular emphasis on preserving the original person's identity in the aged version of his/her face. To thi…

    Submitted 30 May, 2017; v1 submitted 7 February, 2017; originally announced February 2017.

    Comments: 5 pages, 3 figures, accepted at ICIP 2017. With respect to v1: (1) changed the abbreviation of the main model from "acGAN" to "Age-cGAN" in order to avoid confusion with "Auxiliary Classifier Generative Adversarial Networks" introduced by Odena et al.; (2) corrected a typo in Formula 1