DOI: 10.1145/3664647.3681229
Research Article

Lite-Mind: Towards Efficient and Robust Brain Representation Learning

Published: 28 October 2024

Abstract

The limited availability of data and the low signal-to-noise ratio of fMRI signals make fMRI-to-image retrieval a challenging task. The state-of-the-art MindEye markedly improves fMRI-to-image retrieval performance by leveraging a large model, i.e., a 996M-parameter MLP backbone per subject, to align fMRI embeddings to the final hidden layer of CLIP's Vision Transformer (ViT). However, significant individual variation exists among subjects, even under identical experimental setups, mandating the training of large subject-specific models. The resulting parameter counts pose significant challenges for deploying fMRI decoding on practical devices. To this end, we propose Lite-Mind, a lightweight, efficient, and robust brain representation learning paradigm based on the Discrete Fourier Transform (DFT), which efficiently aligns fMRI voxels to fine-grained information in CLIP. We carefully design a DFT backbone with Spectrum Compression and Frequency Projector modules to learn informative and robust voxel embeddings. Our experiments demonstrate that Lite-Mind achieves an impressive 94.6% fMRI-to-image retrieval accuracy on the NSD dataset for Subject 1, with 98.7% fewer parameters than MindEye. Lite-Mind is also shown to transfer to smaller fMRI datasets, establishing a new state of the art for zero-shot classification on the GOD dataset.
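To make the idea concrete, the pipeline the abstract describes (compress fMRI voxels in the frequency domain, project them to CLIP's embedding space, then retrieve images by similarity) can be sketched as follows. This is a minimal illustrative sketch, not the paper's code: the module names follow the abstract, but the internals (keeping the lowest frequency bins, a single linear projection over real and imaginary parts, cosine-similarity retrieval) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FrequencyProjector(nn.Module):
    """Hypothetical sketch of a DFT-based voxel encoder: compress the voxel
    spectrum, then project to the CLIP embedding dimension. Internals are
    illustrative assumptions, not Lite-Mind's actual architecture."""

    def __init__(self, n_voxels: int, n_freq: int, clip_dim: int):
        super().__init__()
        self.n_freq = n_freq  # spectrum compression: keep lowest n_freq bins
        # complex spectrum -> real feature vector (real and imaginary parts)
        self.proj = nn.Linear(2 * n_freq, clip_dim)

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # real FFT over the voxel axis; output is complex, (B, n_voxels//2 + 1)
        spec = torch.fft.rfft(voxels, dim=-1)
        spec = spec[..., : self.n_freq]               # spectrum compression
        feats = torch.cat([spec.real, spec.imag], dim=-1)
        emb = self.proj(feats)
        return nn.functional.normalize(emb, dim=-1)   # unit norm for retrieval

# Toy usage: retrieve, for each fMRI sample, the nearest CLIP image embedding.
model = FrequencyProjector(n_voxels=1024, n_freq=128, clip_dim=768)
fmri = torch.randn(4, 1024)                           # 4 fake fMRI samples
clip_embs = nn.functional.normalize(torch.randn(10, 768), dim=-1)
scores = model(fmri) @ clip_embs.T                    # cosine similarities
best = scores.argmax(dim=-1)                          # retrieved image indices
```

Because both embeddings are L2-normalized, the matrix product yields cosine similarities, and retrieval reduces to an argmax per row; in practice such a model would be trained with a contrastive loss against ground-truth CLIP embeddings.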


Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. brain-computer interface (bci)
      2. cross-modal retrieval
      3. fmri


      Conference

MM '24: The 32nd ACM International Conference on Multimedia
      October 28 - November 1, 2024
      Melbourne VIC, Australia

      Acceptance Rates

MM '24 paper acceptance rate: 1,150 of 4,385 submissions (26%);
overall acceptance rate: 2,145 of 8,556 submissions (25%)
