Abstract
In neuroscience, visual encoding based on functional magnetic resonance imaging (fMRI) has attracted much attention, especially with the recent development of deep learning. A visual encoding model aims to predict a subject's brain activity in response to presented image stimuli. Current visual encoding models first extract image features with a pre-trained convolutional neural network (CNN) and then learn a linear mapping from the extracted CNN features to each voxel. However, this two-step approach cannot guarantee that the extracted features are well matched to the fMRI voxels under a linear mapping, which limits the final encoding performance. Drawing an analogy with the development of computer vision, we introduce the end-to-end approach to the visual encoding domain. In this study, we designed an end-to-end convolution regression model (ETECRM) with selective optimization in a region-of-interest (ROI)-wise manner to achieve more effective and efficient visual encoding. Trained end to end, the model automatically learns to extract features that are better matched to the voxels, improving encoding performance. Operating ROI-wise, it directly encodes an entire visual ROI containing many voxels, improving encoding efficiency; selective optimization prevents ineffective voxels within the same ROI from interfering with training. Experimental results demonstrated that ETECRM achieves better encoding performance and efficiency than previous two-step models. Comparative analysis further suggests that the end-to-end approach and large volumes of fMRI data hold promise for the visual encoding domain.
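The two-step baseline the abstract contrasts against can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the paper's code): random features stand in for pre-trained CNN activations, a ridge regression maps them to all voxels of an ROI at once, and per-voxel Pearson correlation serves as the encoding-performance metric. The "selective" step, masking poorly predicted voxels, is an illustrative analogue of the paper's selective optimization; the threshold and all shapes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: training images, feature dimensions, ROI voxels.
n_train, n_feat, n_vox = 200, 50, 30
feats = rng.standard_normal((n_train, n_feat))   # stand-in for CNN features
true_w = rng.standard_normal((n_feat, n_vox))    # synthetic ground-truth mapping
voxels = feats @ true_w + 0.1 * rng.standard_normal((n_train, n_vox))

# Step 2 of the two-step pipeline: a linear (ridge) map to every ROI voxel.
lam = 1.0
w_hat = np.linalg.solve(feats.T @ feats + lam * np.eye(n_feat), feats.T @ voxels)
pred = feats @ w_hat

def pearson_per_voxel(a, b):
    """Pearson correlation between columns of a and b (one value per voxel)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / (
        np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    )

r = pearson_per_voxel(pred, voxels)

# Analogue of selective optimization: ignore poorly predicted voxels so they
# do not dominate an ROI-wise objective (0.2 is an arbitrary cutoff).
effective = r > 0.2
print(f"mean r = {r.mean():.3f}, effective voxels: {effective.sum()}/{n_vox}")
```

An end-to-end model would instead backpropagate a regression loss through the feature extractor itself, so the features are adapted to the voxels rather than fixed in advance.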
© 2021 Springer Nature Singapore Pte Ltd.
Qiao, K., Zhang, C., Chen, J., Wang, L., Tong, L., Yan, B. (2021). Effective and Efficient ROI-wise Visual Encoding Using an End-to-End CNN Regression Model and Selective Optimization. In: Wang, Y. (eds) Human Brain and Artificial Intelligence. HBAI 2021. Communications in Computer and Information Science, vol 1369. Springer, Singapore. https://doi.org/10.1007/978-981-16-1288-6_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1287-9
Online ISBN: 978-981-16-1288-6