Search | arXiv e-print repository

Retinotopic Mapping Enhances the Robustness of Convolutional Neural Networks

Authors: Jean-Nicolas Jérémie, Emmanuel Daucé, Laurent U Perrinet

Abstract: Foveated vision, a trait shared by many animals, including humans, has not been fully utilized in machine learning applications, despite its significant contributions to biological visual function. This study investigates whether retinotopic mapping, a critical component of foveated vision, can enhance image categorization and localization performance when integrated into deep convolutional neural… ▽ More Foveated vision, a trait shared by many animals, including humans, has not been fully utilized in machine learning applications, despite its significant contributions to biological visual function. This study investigates whether retinotopic mapping, a critical component of foveated vision, can enhance image categorization and localization performance when integrated into deep convolutional neural networks (CNNs). Retinotopic mapping was integrated into the inputs of standard off-the-shelf convolutional neural networks (CNNs), which were then retrained on the ImageNet task. As expected, the logarithmic-polar mapping improved the network's ability to handle arbitrary image zooms and rotations, particularly for isolated objects. Surprisingly, the retinotopically mapped network achieved comparable performance in classification. Furthermore, the network demonstrated improved classification localization when the foveated center of the transform was shifted. This replicates a crucial ability of the human visual system that is absent in typical convolutional neural networks (CNNs). These findings suggest that retinotopic mapping may be fundamental to significant preattentive visual processes. △ Less

Submitted 9 August, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

arXiv:2312.14685 [pdf, other]

Kernel Heterogeneity Improves Sparseness of Natural Images Representations

Authors: Hugo J. Ladret, Christian Casanova, Laurent Udo Perrinet

Abstract: Both biological and artificial neural networks inherently balance their performance with their operational cost, which balances their computational abilities. Typically, an efficient neuromorphic neural network is one that learns representations that reduce the redundancies and dimensionality of its input. This is for instance achieved in sparse coding, and sparse representations derived from natu… ▽ More Both biological and artificial neural networks inherently balance their performance with their operational cost, which balances their computational abilities. Typically, an efficient neuromorphic neural network is one that learns representations that reduce the redundancies and dimensionality of its input. This is for instance achieved in sparse coding, and sparse representations derived from natural images yield representations that are heterogeneous, both in their sampling of input features and in the variance of those features. Here, we investigated the connection between natural images' structure, particularly oriented features, and their corresponding sparse codes. We showed that representations of input features scattered across multiple levels of variance substantially improve the sparseness and resilience of sparse codes, at the cost of reconstruction performance. This echoes the structure of the model's input, allowing to account for the heterogeneously aleatoric structures of natural images. We demonstrate that learning kernel from natural images produces heterogeneity by balancing between approximate and dense representations, which improves all reconstruction metrics. Using a parametrized control of the kernels' heterogeneity used by a convolutional sparse coding algorithm, we show that heterogeneity emphasizes sparseness, while homogeneity improves representation granularity. In a broader context, these encoding strategy can serve as inputs to deep convolutional neural networks. We prove that such variance-encoded sparse image datasets enhance computational efficiency, emphasizing the benefits of kernel heterogeneity to leverage naturalistic and variant input structures and possible applications to improve the throughput of neuromorphic hardware. △ Less

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2205.03635 [pdf, other]

doi 10.3390/vision7020029

Ultrafast Image Categorization in Biology and Neural Models

Authors: Jean-Nicolas Jérémie, Laurent U Perrinet

Abstract: Humans are able to categorize images very efficiently, in particular to detect the presence of an animal very quickly. Recently, deep learning algorithms based on convolutional neural networks (CNNs) have achieved higher than human accuracy for a wide range of visual categorization tasks. However, the tasks on which these artificial networks are typically trained and evaluated tend to be highly sp… ▽ More Humans are able to categorize images very efficiently, in particular to detect the presence of an animal very quickly. Recently, deep learning algorithms based on convolutional neural networks (CNNs) have achieved higher than human accuracy for a wide range of visual categorization tasks. However, the tasks on which these artificial networks are typically trained and evaluated tend to be highly specialized and do not generalize well, e.g., accuracy drops after image rotation. In this respect, biological visual systems are more flexible and efficient than artificial systems for more general tasks, such as recognizing an animal. To further the comparison between biological and artificial neural networks, we re-trained the standard VGG 16 CNN on two independent tasks that are ecologically relevant to humans: detecting the presence of an animal or an artifact. We show that re-training the network achieves a human-like level of performance, comparable to that reported in psychophysical tasks. In addition, we show that the categorization is better when the outputs of the models are combined. Indeed, animals (e.g., lions) tend to be less present in photographs that contain artifacts (e.g., buildings). Furthermore, these re-trained models were able to reproduce some unexpected behavioral observations from human psychophysics, such as robustness to rotation (e.g., an upside-down or tilted image) or to a grayscale transformation. Finally, we quantified the number of CNN layers required to achieve such performance and showed that good accuracy for ultrafast image categorization can be achieved with only a few layers, challenging the belief that image recognition requires deep sequential analysis of visual objects. △ Less

Submitted 31 May, 2023; v1 submitted 7 May, 2022; originally announced May 2022.

arXiv:1812.01335 [pdf, other]

From biological vision to unsupervised hierarchical sparse coding

Authors: Victor Boutin, Angelo Franciosini, Franck Ruffier, Laurent. U Perrinet

Abstract: The formation of connections between neural cells is emerging essentially from an unsupervised learning process. For instance, during the development of the primary visual cortex of mammals (V1), we observe the emergence of cells selective to localized and oriented features. This leads to the development of a rough contour-based representation of the retinal image in area V1. We propose a biologic… ▽ More The formation of connections between neural cells is emerging essentially from an unsupervised learning process. For instance, during the development of the primary visual cortex of mammals (V1), we observe the emergence of cells selective to localized and oriented features. This leads to the development of a rough contour-based representation of the retinal image in area V1. We propose a biological model of the formation of this representation along the thalamo-cortical pathway. To achieve this goal, we replicated the Multi-Layer Convolutional Sparse Coding (ML-CSC) algorithm developed by Michael Elad's group. △ Less

Submitted 4 December, 2018; originally announced December 2018.

Comments: in Proceedings of iTWIST'18, Paper-ID: 17, Marseille, France, November, 21-23, 2018

arXiv:1611.01390 [pdf, other]

Bayesian Modeling of Motion Perception using Dynamical Stochastic Textures

Authors: Jonathan Vacher, Andrew Isaac Meso, Laurent U. Perrinet, Gabriel Peyré

Abstract: A common practice to account for psychophysical biases in vision is to frame them as consequences of a dynamic process relying on optimal inference with respect to a generative model. The present study details the complete formulation of such a generative model intended to probe visual motion perception with a dynamic texture model. It is first derived in a set of axiomatic steps constrained by bi… ▽ More A common practice to account for psychophysical biases in vision is to frame them as consequences of a dynamic process relying on optimal inference with respect to a generative model. The present study details the complete formulation of such a generative model intended to probe visual motion perception with a dynamic texture model. It is first derived in a set of axiomatic steps constrained by biological plausibility. We extend previous contributions by detailing three equivalent formulations of this texture model. First, the composite dynamic textures are constructed by the random aggregation of warped patterns, which can be viewed as 3D Gaussian fields. Secondly, these textures are cast as solutions to a stochastic partial differential equation (sPDE). This essential step enables real time, on-the-fly texture synthesis using time-discretized auto-regressive processes. It also allows for the derivation of a local motion-energy model, which corresponds to the log-likelihood of the probability density. The log-likelihoods are essential for the construction of a Bayesian inference framework. We use the dynamic texture model to psychophysically probe speed perception in humans using zoom-like changes in the spatial frequency content of the stimulus. The human data replicates previous findings showing perceived speed to be positively biased by spatial frequency increments. A Bayesian observer who combines a Gaussian likelihood centered at the true speed and a spatial frequency dependent width with a "slow speed prior" successfully accounts for the perceptual bias. More precisely, the bias arises from a decrease in the observer's likelihood width estimated from the experiments as the spatial frequency increases. Such a trend is compatible with the trend of the dynamic texture likelihood width. △ Less

Submitted 21 August, 2018; v1 submitted 2 November, 2016; originally announced November 2016.

Comments: article+supplementary, 34+5 pages, 10+1 figures, accepted to Neural Computation. arXiv admin note: text overlap with arXiv:1511.02705

arXiv:1511.02705 [pdf, other]

Biologically Inspired Dynamic Textures for Probing Motion Perception

Authors: Jonathan Vacher, Andrew Meso, Laurent U Perrinet, Gabriel Peyré

Abstract: Perception is often described as a predictive process based on an optimal inference with respect to a generative model. We study here the principled construction of a generative model specifically crafted to probe motion perception. In that context, we first provide an axiomatic, biologically-driven derivation of the model. This model synthesizes random dynamic textures which are defined by statio… ▽ More Perception is often described as a predictive process based on an optimal inference with respect to a generative model. We study here the principled construction of a generative model specifically crafted to probe motion perception. In that context, we first provide an axiomatic, biologically-driven derivation of the model. This model synthesizes random dynamic textures which are defined by stationary Gaussian distributions obtained by the random aggregation of warped patterns. Importantly, we show that this model can equivalently be described as a stochastic partial differential equation. Using this characterization of motion in images, it allows us to recast motion-energy models into a principled Bayesian inference framework. Finally, we apply these textures in order to psychophysically probe speed perception in humans. In this framework, while the likelihood is derived from the generative model, the prior is estimated from the observed results and accounts for the perceptual bias in a principled fashion. △ Less

Submitted 9 November, 2015; originally announced November 2015.

Comments: Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS), Dec 2015, Montreal, Canada

Showing 1–6 of 6 results for author: Perrinet, L U