Abstract
Magnetoencephalography (MEG) has a high temporal resolution well-suited for studying perceptual learning. However, to identify where learning happens in the brain, one needs to apply source localization techniques to project MEG sensor data into brain space. Previous source localization methods, such as the short-time Fourier transform (STFT) method by Gramfort et al. [6], produced intriguing results, but they were not designed to incorporate trial-by-trial learning effects. Here we modify the approach in [6] to produce an STFT-based source localization method (STFT-R) that includes an additional regression of the STFT components on covariates such as the behavioral learning curve. We also exploit a hierarchical \(L_{21}\) penalty to induce structured sparsity of the STFT components and to emphasize signals from regions of interest (ROIs) that are selected according to prior knowledge. In reconstructing ROI source signals from simulated data, STFT-R achieved smaller errors than a two-step method based on the popular minimum-norm estimate (MNE), and in a real-world human learning experiment, STFT-R yielded more interpretable results about which time-frequency components of the ROI signals were correlated with learning.
Notes
- 1. Note that in this case, the variance of the sensor noise in each trial was proportional to the source signals. This violated the i.i.d. sensor-noise assumption of both STFT-R and MNE-R. We compared how well the two methods tolerated such heteroskedasticity.
References
Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. CoRR abs/1108.0775 (2011). http://arXiv.org/abs/1108.0775
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Dale, A.M., Liu, A.K., Fischl, B.R., Buckner, R.L., Belliveau, J.W., Lewine, J.D., Halgren, E.: Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron 26(1), 55–67 (2000)
Galka, A., Yamashita, O., Ozaki, T., Biscay, R., Valdes-Sosa, P.: A solution to the dynamical inverse problem of EEG generation using spatiotemporal Kalman filtering. NeuroImage 23, 435–453 (2004)
Gauthier, I., Tarr, M.J., Moylan, J., Skudlarski, P., Gore, J.C., Anderson, A.W.: The fusiform face area is part of a network that processes faces at the individual level. J. Cogn. Neurosci. 12(3), 495–504 (2000)
Gramfort, A., Strohmeier, D., Haueisen, J., Hamalainen, M., Kowalski, M.: Time-frequency mixed-norm estimates: sparse M/EEG imaging with non-stationary source activations. NeuroImage 70, 410–422 (2013)
Gramfort, A., Luessi, M., Larson, E., Engemann, D.A., Strohmeier, D., Brodbeck, C., Parkkonen, L., Hamalainen, M.S.: MNE software for processing MEG and EEG data. NeuroImage 86, 446–460 (2014)
Hamalainen, M., Ilmoniemi, R.: Interpreting magnetic fields of the brain: minimum norm estimates. Med. Biol. Eng. Comput. 32, 35–42 (1994)
Hamalainen, M., Hari, R., Ilmoniemi, R.J., Knuutila, J., Lounasmaa, O.V.: Magnetoencephalography - theory, instrumentation, and applications to noninvasive studies of the working human brain. Rev. Mod. Phys. 65, 413–497 (1993)
Henson, R.N., Wakeman, D.G., Litvak, V., Friston, K.J.: A parametric empirical bayesian framework for the EEG/MEG inverse problem: generative models for multi-subject and multi-modal integration. Front. Hum. Neurosci. 5, 76 (2011)
Jenatton, R., Mairal, J., Obozinski, G., Bach, F.: Proximal methods for hierarchical sparse coding. J. Mach. Learn. Res. 12, 2297–2334 (2011)
Kanwisher, N., McDermott, J., Chun, M.M.: The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17(11), 4302–4311 (1997)
Lamus, C., Hamalainen, M.S., Temereanca, S., Brown, E.N., Purdon, P.L.: A spatiotemporal dynamic distributed solution to the MEG inverse problem. NeuroImage 63, 894–909 (2012)
Mattout, J., Phillips, C., Penny, W.D., Rugg, M.D., Friston, K.J.: MEG source localization under multiple constraints: an extended Bayesian framework. NeuroImage 30(3), 753–767 (2006)
Pascual-Marqui, R.: Standardized low resolution brain electromagnetic tomography (sLORETA): technical details. Methods Find. Exp. Clin. Pharmacol. 24, 5–12 (2002)
Pitcher, D., Walsh, V., Duchaine, B.: The role of the occipital face area in the cortical face perception network. Exp. Brain Res. 209(4), 481–493 (2011)
Stine, R.A.: Bootstrap prediction intervals for regression. J. Am. Stat. Assoc. 80, 1026–1031 (1985)
Tanaka, J.W., Curran, T., Porterfield, A.L., Collins, D.: Activation of preexisting and acquired face representations: the N250 event-related potential as an index of face familiarity. J. Cogn. Neurosci. 18(9), 1488–1497 (2006)
Xu, Y.: Cortical spatiotemporal plasticity in visual category learning. Doctoral dissertation (2013)
Acknowledgements
This work was funded by the Multi-Modal Neuroimaging Training Program (MNTP) fellowship from the NIH (5R90DA023420-08, 5R90DA023420-09) and the Richard King Mellon Foundation. We also thank Yang Xu and the MNE-python user group for their help.
Appendix
1.1 Appendix 1
Short-Time Fourier Transform (STFT). Our approach builds on the STFT implemented by Gramfort et al. in [6]. Given a time series \( \varvec{U} = \{U(t), t = 1,\cdots ,T\}\), a time step \(\tau _0\) and a window size \(T_0\), we define the STFT as
\[ V(\tau , \omega _h) = \sum _{t=1}^{T} U(t)\, K(t-\tau )\, e^{-i \omega _h t} \]
for \(\omega _h = 2\pi h/T_0, h = 0,1,\cdots , T_0/2\) and \(\tau = \tau _0, 2\tau _0, \cdots , n_0 \tau _0 \), where \(K(t-\tau )\) is a window function centered at \(\tau \), and \(n_0 = T/\tau _0\). We concatenate the STFT components at all time points and frequencies into a single vector \( \varvec{V} \in \mathbb {C}^{s}\), where \(s = (T_0/2+1) \times n_0\). Following the notation in [6], we also call the \(K(t-\tau ) e^{-i \omega _h t}\) terms STFT dictionary functions, and use a matrix's Hermitian transpose \(\varvec{\varPhi ^H}\) to denote them, i.e. \( (\varvec{U}^T)_{1\times T} = ({\varvec{V}}^T)_{1\times s} (\varvec{\varPhi ^H})_{s \times T}\).
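To make the construction concrete, the following is a minimal numpy sketch of such a dictionary. The Hann window, the code layout, and the least-squares computation of the coefficients are our own assumptions for illustration; the definition above only requires some window \(K\) centered at \(\tau \).

```python
import numpy as np

def stft_dictionary(T, T0, tau0):
    """Rows are STFT dictionary functions K(t - tau) * exp(-1j * omega_h * t),
    forming an (s x T) matrix playing the role of Phi^H. A Hann window is
    assumed for K; any window centered at tau would do."""
    n0 = T // tau0
    omegas = 2 * np.pi * np.arange(T0 // 2 + 1) / T0     # omega_h, h = 0..T0/2
    taus = tau0 * np.arange(1, n0 + 1)                   # tau = tau0, ..., n0 * tau0
    t = np.arange(1, T + 1, dtype=float)
    atoms = []
    for tau in taus:
        # Hann window centered at tau, zero outside (tau - T0/2, tau + T0/2)
        K = np.where(np.abs(t - tau) < T0 / 2,
                     0.5 * (1.0 + np.cos(2 * np.pi * (t - tau) / T0)),
                     0.0)
        for w in omegas:
            atoms.append(K * np.exp(-1j * w * t))
    return np.asarray(atoms)                             # s = (T0/2 + 1) * n0 rows

T, T0, tau0 = 160, 16, 4
PhiH = stft_dictionary(T, T0, tau0)
U = np.random.randn(T)
# One way to obtain coefficients V satisfying U^T = V^T Phi^H: a least-squares
# fit (the dictionary is overcomplete, so this picks the minimum-norm V).
V = np.linalg.lstsq(PhiH.T, U.astype(complex), rcond=None)[0]
err = np.linalg.norm((V[None, :] @ PhiH).real.ravel() - U)
print(f"reconstruction error: {err:.2e}")  # near zero when the atoms span R^T
```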
1.2 Appendix 2
The Karush-Kuhn-Tucker Conditions. Here we derive the Karush-Kuhn-Tucker (KKT) conditions for the hierarchical \(L_{21}\) problem. Since the term \( f(\varvec{z}) = \frac{1}{2} \sum _{r=1}^q ||\varvec{M} ^{(r)} - \varvec{G}( \sum _{k=1}^p X_k^{(r)} \varvec{Z}_k ) \varvec{\varPhi ^H}||_F^2\) is essentially a sum of squared errors of a linear problem, we can re-write it as \( f(\varvec{z}) = \frac{1}{2} || \varvec{b} - \varvec{A} \varvec{z} ||^2\), where \(\varvec{z}\) again is the vector obtained by concatenating the entries of \(\varvec{Z}\), \(\varvec{b}\) is the vector obtained by concatenating \(\varvec{M}^{(1)}, \cdots , \varvec{M}^{(q)}\), and \( \varvec{A}\) is the linear operator such that \(\varvec{A} \varvec{z}\) is the concatenation of \(\varvec{G}( \sum _{k=1}^p X_k^{(r)} \varvec{Z}_k ) \varvec{\varPhi ^H}, r = 1,\cdots , q\). Note that although \(\varvec{z}\) is a complex vector, we can further reduce the problem to a real-valued one by rearranging the real and imaginary parts of \(\varvec{z}\) and \( \varvec{A}\). Here, for simplicity, we only derive the KKT conditions for the real case. Again we use \(\{g_1, \cdots , g_h, \cdots , g_N \}\) to denote our ordered hierarchical group set, and \(\lambda _h\) to denote the corresponding penalty for group \(g_h\). We also define diagonal matrices \(\varvec{D}_h\) such that
\[ (\varvec{D}_h)_{jj} = \begin{cases} 1 & \text{if } j \in g_h, \\ 0 & \text{otherwise,} \end{cases} \]
so that the non-zero elements of \(\varvec{D}_h \varvec{z}\) are exactly \(\varvec{z} |_{g_h}\). With this simplified notation, we re-cast the original problem into a standard formulation:
\[ \min _{\varvec{z}} \; \frac{1}{2} \Vert \varvec{b} - \varvec{A} \varvec{z}\Vert _2^2 + \sum _{h=1}^{N} \lambda _h \Vert \varvec{D}_h \varvec{z}\Vert _2. \qquad (7) \]
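Problem (7) can be tackled with proximal gradient methods such as FISTA [2], which require the proximal operator of the hierarchical penalty \(\sum _h \lambda _h \Vert \varvec{D}_h \varvec{z}\Vert _2\). Below is a minimal numpy sketch of that operator, using the composition result of Jenatton et al. [11] for tree-structured groups; the assumed group ordering (each group visited before any group containing it) and the array layout are our own illustration.

```python
import numpy as np

def prox_hierarchical_l21(z, groups, lambdas):
    """Proximal operator of sum_h lambda_h * ||z|_{g_h}||_2 for a
    tree-structured group set: by Jenatton et al. [11], composing the
    group-wise shrinkage operators, visiting each group before any group
    that contains it, yields the exact prox. `groups` is assumed ordered
    this way; each entry is an integer index array."""
    z = z.copy()
    for g, lam in zip(groups, lambdas):
        norm = np.linalg.norm(z[g])
        # group soft-thresholding: zero the group or shrink its L2 norm by lam
        z[g] = 0.0 if norm <= lam else z[g] * (1.0 - lam / norm)
    return z

# toy example: a leaf group nested inside a root group
z = np.array([3.0, -1.0, 0.5, 2.0])
groups = [np.array([2, 3]), np.array([0, 1, 2, 3])]   # leaf first, then root
print(prox_hierarchical_l21(z, groups, lambdas=[0.5, 1.0]))
```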
To better describe the KKT conditions, we introduce the auxiliary variables \(\varvec{u} = \varvec{A} \varvec{z}\) and \(\varvec{v}_h = \varvec{D}_h \varvec{z}\). Then (7) is equivalent to
\[ \min _{\varvec{z}, \varvec{u}, \varvec{v}_h} \; \frac{1}{2} \Vert \varvec{b} - \varvec{u}\Vert _2^2 + \sum _{h=1}^{N} \lambda _h \Vert \varvec{v}_h\Vert _2 \quad \text{s.t. } \varvec{u} = \varvec{A}\varvec{z}, \; \varvec{v}_h = \varvec{D}_h \varvec{z}, \; h = 1, \cdots , N. \]
The corresponding Lagrange function is
\[ L(\varvec{z}, \varvec{u}, \{\varvec{v}_h\}, \varvec{\mu }, \{\varvec{\xi }_h\}) = \frac{1}{2} \Vert \varvec{b} - \varvec{u}\Vert _2^2 + \sum _{h=1}^{N} \lambda _h \Vert \varvec{v}_h\Vert _2 + \varvec{\mu }^T (\varvec{A}\varvec{z} - \varvec{u}) + \sum _{h=1}^{N} \varvec{\xi }_h^T (\varvec{D}_h \varvec{z} - \varvec{v}_h), \]
where \(\varvec{\mu }\) and the \(\varvec{\xi }_h\)'s are Lagrange multipliers. At the optimum, the following KKT conditions hold:
\[ \varvec{u} - \varvec{b} - \varvec{\mu } = \varvec{0}, \qquad (8) \]
\[ \varvec{A}^T \varvec{\mu } + \sum _h \varvec{D}_h \varvec{\xi }_h = \varvec{0}, \qquad (9) \]
\[ \varvec{\xi }_h \in \lambda _h \, \partial \Vert \varvec{v}_h\Vert _2, \quad h = 1, \cdots , N, \qquad (10) \]
where \(\partial { \Vert \cdot \Vert _2}\) is the subgradient of the \(L_2\) norm. From (8) we have \( \varvec{\mu } = \varvec{u} - \varvec{b}\), so (9) becomes \( \varvec{A}^T (\varvec{u} - \varvec{b}) + \sum _h \varvec{D}_h \varvec{\xi }_h = 0 \). Plugging in \(\varvec{u}= \varvec{Az}\), we see that the first term \( \varvec{A}^T ( \varvec{u} - \varvec{b}) = \varvec{A}^T (\varvec{A} \varvec{z} - \varvec{b})\) is the gradient of \( f(\varvec{z}) = \frac{1}{2} \Vert \varvec{b} - \varvec{A} \varvec{z}\Vert _2^2\). For a solution \(\varvec{z}_0\), once we plug in \( \varvec{v}_h = \varvec{D}_h \varvec{z}_0 \), the KKT conditions become
\[ \nabla f(\varvec{z})|_{\varvec{z} = \varvec{z}_0} + \sum _h \varvec{D}_h \varvec{\xi }_h = \varvec{0}, \qquad (11) \]
\[ \varvec{\xi }_h \in \lambda _h \, \partial \Vert \varvec{D}_h \varvec{z}_0\Vert _2, \quad h = 1, \cdots , N. \qquad (12) \]
In (12), according to the definition of the subgradient of the \(L_2\) norm, we have
\[ \varvec{\xi }_h = \lambda _h \frac{\varvec{D}_h \varvec{z}_0}{\Vert \varvec{D}_h \varvec{z}_0\Vert _2} \quad \text{if } \varvec{D}_h \varvec{z}_0 \ne \varvec{0}, \qquad \Vert \varvec{\xi }_h\Vert _2 \le \lambda _h \quad \text{if } \varvec{D}_h \varvec{z}_0 = \varvec{0}. \]
Therefore we can determine whether (11) and (12) hold by solving the following problem:
\[ \min _{\{\varvec{\xi }_h\}} \; \frac{1}{2} \Big \Vert \nabla f(\varvec{z})|_{\varvec{z} = \varvec{z}_0} + \sum _h \varvec{D}_h \varvec{\xi }_h \Big \Vert _2^2 \quad \text{s.t. } \Vert \varvec{\xi }_h\Vert _2 \le \lambda _h \text{ for } \varvec{D}_h \varvec{z}_0 = \varvec{0}, \; \varvec{\xi }_h = \lambda _h \frac{\varvec{D}_h \varvec{z}_0}{\Vert \varvec{D}_h \varvec{z}_0\Vert _2} \text{ otherwise}, \]
which is a standard group lasso problem with no overlap, and can be solved by coordinate descent. We define the value of \(\frac{1}{2} \Vert \nabla f(\varvec{z})|_{\varvec{z} = \varvec{z}_0} + \sum _h \varvec{D}_h \varvec{\xi }_h\Vert _2^2\) at the optimum as a measure of violation of the KKT conditions.
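A small numpy sketch of this check follows. The block coordinate descent over the \(\varvec{\xi }_h\)'s and all variable names are our own illustration of the procedure just described, not the paper's code.

```python
import numpy as np

def kkt_violation(grad_f, z0, groups, lambdas, n_sweeps=200):
    """Block coordinate descent on the xi_h to evaluate the KKT check:
    minimize (1/2)||grad_f + sum_h D_h xi_h||_2^2 subject to
    ||xi_h||_2 <= lambda_h for groups with z0|_{g_h} = 0, with xi_h fixed
    at lambda_h * z0|_{g_h} / ||z0|_{g_h}||_2 for the remaining groups.
    Returns the optimal objective, the measure of KKT violation."""
    residual = grad_f.astype(float)             # tracks grad_f + sum_h D_h xi_h
    xis = [np.zeros(len(g)) for g in groups]
    free = []                                   # groups whose xi_h is a variable
    for h, (g, lam) in enumerate(zip(groups, lambdas)):
        norm_z = np.linalg.norm(z0[g])
        if norm_z > 0:                          # xi_h fixed by the equality in (12)
            xis[h] = lam * z0[g] / norm_z
            residual[g] += xis[h]
        else:
            free.append(h)
    for _ in range(n_sweeps):
        for h in free:
            g, lam = groups[h], lambdas[h]
            residual[g] -= xis[h]               # remove the current contribution
            r = residual[g]
            norm_r = np.linalg.norm(r)
            # best xi_h projects -r onto the L2 ball of radius lambda_h
            xis[h] = -r if norm_r <= lam else -(lam / norm_r) * r
            residual[g] += xis[h]
    return 0.5 * np.linalg.norm(residual) ** 2
```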
Let \(f_{J}\) be the function \(f\) constrained to an index set \(J\). Because the gradient of \(f\) is linear, if \(\varvec{z}_0\) only has non-zero entries in \(J\), then the entries of \(\nabla f( \varvec{z})\) in \(J\) are equal to \(\nabla f_{J} (\varvec{z} |_J)\) at \(\varvec{z} = \varvec{z}_0\). In addition, the \(\varvec{\xi }_h\)'s are separable across groups. Therefore, if \(\varvec{z}_0\) is an optimal solution to the problem constrained to \(J\), the KKT conditions are already met for the entries in \(J\) (i.e. \( \left( \nabla f(\varvec{z})|_{\varvec{z} = \varvec{z}_0} + \sum _h \varvec{D}_h \varvec{\xi }_h\right) |_{J} = 0\)); for \(g_h \not \subset J\), we use \(\frac{1}{2}\Vert \left( \nabla f(\varvec{z})|_{\varvec{z} = \varvec{z}_0} + \sum _h \varvec{D}_h \varvec{\xi }_h\right) |_{g_h} \Vert ^2\) at the optimum as a measure of how much the elements in group \(g_h\) violate the KKT conditions; this serves as the criterion when we greedily add groups, as sketched below (see Algorithm 2).
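To illustrate how this criterion drives the active-set strategy, here is a hypothetical skeleton of the greedy loop just described; `solve_restricted` and `group_violation` are placeholder names, and the paper's actual Algorithm 2 is not reproduced here.

```python
def greedy_group_selection(solve_restricted, group_violation, groups,
                           tol=1e-6, n_add=1):
    """Hypothetical skeleton of the greedy strategy described above.
    `solve_restricted(J)` is assumed to return the optimum z0 of the
    problem constrained to the index set J; `group_violation(z0, h)` the
    per-group violation (1/2)||(grad f + sum_h D_h xi_h)|_{g_h}||^2."""
    J = set(groups[0])                          # seed, e.g. the first group
    while True:
        z0 = solve_restricted(J)
        # score only groups not already contained in the active set J
        scores = {h: group_violation(z0, h)
                  for h, g in enumerate(groups) if not set(g) <= J}
        if not scores or max(scores.values()) <= tol:
            return z0, J                        # KKT conditions met up to tol
        # absorb the most violating group(s) into J and re-solve
        for h in sorted(scores, key=scores.get, reverse=True)[:n_add]:
            J |= set(groups[h])
```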
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Yang, Y., Tarr, M.J., Kass, R.E. (2016). Estimating Learning Effects: A Short-Time Fourier Transform Regression Model for MEG Source Localization. In: Rish, I., Langs, G., Wehbe, L., Cecchi, G., Chang, Km., Murphy, B. (eds) Machine Learning and Interpretation in Neuroimaging. MLINI 2013, MLINI 2014. Lecture Notes in Computer Science, vol 9444. Springer, Cham. https://doi.org/10.1007/978-3-319-45174-9_8
DOI: https://doi.org/10.1007/978-3-319-45174-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45173-2
Online ISBN: 978-3-319-45174-9