Abstract
In neuroscience, visual encoding based on functional magnetic resonance imaging (fMRI) has attracted much attention, especially with the recent development of deep learning. A visual encoding model aims to predict a subject's brain activity in response to presented image stimuli. Current visual encoding models first extract image features with a pre-trained convolutional neural network (CNN) and then learn a linear mapping from the extracted CNN features to each voxel. However, this two-step approach cannot guarantee that the extracted features are well matched to the fMRI voxels under a linear mapping, which limits the final encoding performance. Drawing an analogy with the development of computer vision, we introduce the end-to-end approach to the visual encoding domain. In this study, we designed an end-to-end convolution regression model (ETECRM) with selective optimization in a region-of-interest (ROI)-wise manner to achieve more effective and efficient visual encoding. Trained end to end, the model automatically learns to extract features that are better matched to the voxels, improving encoding performance. Operating ROI-wise, it directly encodes an entire visual ROI containing many voxels, improving encoding efficiency; selective optimization prevents ineffective voxels within the same ROI from interfering with training. Experimental results demonstrated that ETECRM achieves better encoding performance and efficiency than previous two-step models. Comparative analysis further suggests that the end-to-end approach and large volumes of fMRI data hold promise for the visual encoding domain.
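The two-step baseline the abstract contrasts against can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the paper's code): random features stand in for pre-trained CNN activations, a ridge regression maps them to all voxels of an ROI at once, and per-voxel Pearson correlation serves as the encoding-performance metric. The "selective" step, masking poorly predicted voxels, is an illustrative analogue of the paper's selective optimization; the threshold and all shapes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: training images, feature dimensions, ROI voxels.
n_train, n_feat, n_vox = 200, 50, 30
feats = rng.standard_normal((n_train, n_feat))   # stand-in for CNN features
true_w = rng.standard_normal((n_feat, n_vox))    # synthetic ground-truth mapping
voxels = feats @ true_w + 0.1 * rng.standard_normal((n_train, n_vox))

# Step 2 of the two-step pipeline: a linear (ridge) map to every ROI voxel.
lam = 1.0
w_hat = np.linalg.solve(feats.T @ feats + lam * np.eye(n_feat), feats.T @ voxels)
pred = feats @ w_hat

def pearson_per_voxel(a, b):
    """Pearson correlation between columns of a and b (one value per voxel)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / (
        np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    )

r = pearson_per_voxel(pred, voxels)

# Analogue of selective optimization: ignore poorly predicted voxels so they
# do not dominate an ROI-wise objective (0.2 is an arbitrary cutoff).
effective = r > 0.2
print(f"mean r = {r.mean():.3f}, effective voxels: {effective.sum()}/{n_vox}")
```

An end-to-end model would instead backpropagate a regression loss through the feature extractor itself, so the features are adapted to the voxels rather than fixed in advance.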
© 2021 Springer Nature Singapore Pte Ltd.
Qiao, K., Zhang, C., Chen, J., Wang, L., Tong, L., Yan, B. (2021). Effective and Efficient ROI-wise Visual Encoding Using an End-to-End CNN Regression Model and Selective Optimization. In: Wang, Y. (eds) Human Brain and Artificial Intelligence. HBAI 2021. Communications in Computer and Information Science, vol 1369. Springer, Singapore. https://doi.org/10.1007/978-981-16-1288-6_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1287-9
Online ISBN: 978-981-16-1288-6