
Estimating Learning Effects: A Short-Time Fourier Transform Regression Model for MEG Source Localization

  • Conference paper
  • First Online:
Machine Learning and Interpretation in Neuroimaging (MLINI 2013, MLINI 2014)

Abstract

Magnetoencephalography (MEG) has a high temporal resolution well-suited for studying perceptual learning. However, to identify where learning happens in the brain, one needs to apply source localization techniques to project MEG sensor data into brain space. Previous source localization methods, such as the short-time Fourier transform (STFT) method of Gramfort et al. [6], produced intriguing results, but they were not designed to incorporate trial-by-trial learning effects. Here we modify the approach in [6] to produce an STFT-based source localization method (STFT-R) that includes an additional regression of the STFT components on covariates such as the behavioral learning curve. We also exploit a hierarchical \(L_{21}\) penalty to induce structured sparsity of the STFT components and to emphasize signals from regions of interest (ROIs) selected according to prior knowledge. In reconstructing ROI source signals from simulated data, STFT-R achieved smaller errors than a two-step method using the popular minimum-norm estimate (MNE), and in a real-world human learning experiment, STFT-R yielded more interpretable results about which time-frequency components of the ROI signals were correlated with learning.


Notes

  1. Note that in this case, the variance of the sensor noise in each trial was proportional to the source signals, which violates the i.i.d. sensor-noise assumption of both STFT-R and MNE-R. We compared how well the two methods tolerated this heteroskedasticity.

References

  1. Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. CoRR abs/1108.0775 (2011). http://arXiv.org/abs/1108.0775

  2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)


  3. Dale, A.M., Liu, A.K., Fischl, B.R., Buckner, R.L., Belliveau, J.W., Lewine, J.D., Halgren, E.: Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron 26(1), 55–67 (2000)


  4. Galka, A., Yamashita, O., Ozaki, T., Biscay, R., Valdes-Sosa, P.: A solution to the dynamical inverse problem of EEG generation using spatiotemporal Kalman filtering. NeuroImage 23, 435–453 (2004)


  5. Gauthier, I., Tarr, M.J., Moylan, J., Skudlarski, P., Gore, J.C., Anderson, A.W.: The fusiform face area is part of a network that processes faces at the individual level. J. Cogn. Neurosci. 12(3), 495–504 (2000)


  6. Gramfort, A., Strohmeier, D., Haueisen, J., Hamalainen, M., Kowalski, M.: Time-frequency mixed-norm estimates: sparse M/EEG imaging with non-stationary source activations. NeuroImage 70, 410–422 (2013)


  7. Gramfort, A., Luessi, M., Larson, E., Engemann, D.A., Strohmeier, D., Brodbeck, C., Parkkonen, L., Hamalainen, M.S.: MNE software for processing MEG and EEG data. NeuroImage 86, 446–460 (2014)


  8. Hamalainen, M., Ilmoniemi, R.: Interpreting magnetic fields of the brain: minimum norm estimates. Med. Biol. Eng. Comput. 32, 35–42 (1994)


  9. Hamalainen, M., Hari, R., Ilmoniemi, R.J., Knuutila, J., Lounasmaa, O.V.: Magnetoencephalography - theory, instrumentation, and applications to noninvasive studies of the working human brain. Rev. Mod. Phys. 65, 413–497 (1993)


  10. Henson, R.N., Wakeman, D.G., Litvak, V., Friston, K.J.: A parametric empirical Bayesian framework for the EEG/MEG inverse problem: generative models for multi-subject and multi-modal integration. Front. Hum. Neurosci. 5, 76 (2011)


  11. Jenatton, R., Mairal, J., Obozinski, G., Bach, F.: Proximal methods for hierarchical sparse coding. J. Mach. Learn. Res. 12, 2297–2334 (2011)


  12. Kanwisher, N., McDermott, J., Chun, M.M.: The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17(11), 4302–4311 (1997)


  13. Lamus, C., Hamalainen, M.S., Temereanca, S., Brown, E.N., Purdon, P.L.: A spatiotemporal dynamic distributed solution to the MEG inverse problem. NeuroImage 63, 894–909 (2012)


  14. Mattout, J., Phillips, C., Penny, W.D., Rugg, M.D., Friston, K.J.: MEG source localization under multiple constraints: an extended Bayesian framework. NeuroImage 30(3), 753–767 (2006)


  15. Pascual-Marqui, R.: Standardized low resolution brain electromagnetic tomography (sLORETA): technical details. Methods Find. Exp. Clin. Pharmacol. 24, 5–12 (2002)


  16. Pitcher, D., Walsh, V., Duchaine, B.: The role of the occipital face area in the cortical face perception network. Exp. Brain Res. 209(4), 481–493 (2011)


  17. Stine, R.A.: Bootstrap prediction intervals for regression. J. Am. Stat. Assoc. 80, 1026–1031 (1985)


  18. Tanaka, J.W., Curran, T., Porterfield, A.L., Collins, D.: Activation of preexisting and acquired face representations: the N250 event-related potential as an index of face familiarity. J. Cogn. Neurosci. 18(9), 1488–1497 (2006)


  19. Xu, Y.: Cortical spatiotemporal plasticity in visual category learning. Doctoral dissertation (2013)



Acknowledgements

This work was funded by the Multi-Modal Neuroimaging Training Program (MNTP) fellowship from the NIH (5R90DA023420-08, 5R90DA023420-09) and the Richard King Mellon Foundation. We also thank Yang Xu and the MNE-python user group for their help.

Author information


Corresponding author

Correspondence to Ying Yang.


Appendix

1.1 Appendix 1

Short-Time Fourier Transform (STFT). Our approach builds on the STFT implemented by Gramfort et al. in [6]. Given a time series \( \varvec{U} = \{U(t), t = 1,\cdots ,T\}\), a time step \(\tau _0\) and a window size \(T_0\), we define the STFT as

$$\begin{aligned} \varPhi (\{U(t)\},\tau ,\omega _h) = \sum _{t=1}^T U(t) K(t-\tau ) e^{-i \omega _h t} \end{aligned}$$
(6)

for \(\omega _h = 2\pi h/T_0, h = 0,1,\cdots , T_0/2\) and \(\tau = \tau _0, 2\tau _0, \cdots , n_0 \tau _0 \), where \(K(t-\tau )\) is a window function centered at \(\tau \) and \(n_0 = T/\tau _0\). We concatenate the STFT components at all time points and frequencies into a single vector \( \varvec{V} \in \mathbb {C}^{s}\), where \(s = (T_0/2+1) \times n_0\). Following the notation in [6], we also call the \(K(t-\tau ) e^{-i \omega _h t}\) terms STFT dictionary functions and collect them in a matrix written as a Hermitian transpose \(\varvec{\varPhi ^H}\), i.e. \( (\varvec{U}^T)_{1\times T} = ({\varvec{V}}^T)_{1\times s} (\varvec{\varPhi ^H})_{s \times T}\).
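For concreteness, the following minimal NumPy sketch evaluates (6) directly from the definition. The Gaussian window and all names are our own illustrative choices, not the paper's implementation; a practical version would use an FFT per window rather than explicit sums.

```python
import numpy as np

def stft_coefficients(u, tau0, T0):
    """Coefficients Phi({U(t)}, tau, omega_h) of Eq. (6) for a real series u.

    tau0 : step between window centres; T0 : window size.
    Returns a complex array of shape (T0//2 + 1, n0), with n0 = T // tau0.
    """
    T = len(u)
    t = np.arange(1, T + 1, dtype=float)        # t = 1, ..., T as in Eq. (6)
    n0 = T // tau0
    taus = tau0 * np.arange(1, n0 + 1)          # tau = tau0, 2*tau0, ..., n0*tau0
    omegas = 2 * np.pi * np.arange(T0 // 2 + 1) / T0
    V = np.empty((len(omegas), n0), dtype=complex)
    for j, tau in enumerate(taus):
        K = np.exp(-0.5 * ((t - tau) / (T0 / 4.0)) ** 2)  # window K(t - tau), illustrative
        for i, w in enumerate(omegas):
            V[i, j] = np.sum(u * K * np.exp(-1j * w * t))
    return V
```

Flattening the returned array column by column gives the vector \(\varvec{V} \in \mathbb {C}^{s}\) described above.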

1.2 Appendix 2

The Karush-Kuhn-Tucker Conditions. Here we derive the Karush-Kuhn-Tucker (KKT) conditions for the hierarchical \(L_{21}\) problem. Since the term \( f(\varvec{z}) = \frac{1}{2} \sum _{r=1}^q ||\varvec{M} ^{(r)} - \varvec{G}( \sum _{k=1}^p X_k^{(r)} \varvec{Z}_k ) \varvec{\varPhi ^H}||_F^2\) is essentially the sum of squared errors of a linear problem, we can re-write it as \( f(\varvec{z}) = \frac{1}{2} || \varvec{b} - \varvec{A} \varvec{z} ||^2\), where \(\varvec{z}\) is again the vector formed by concatenating the entries of \(\varvec{Z}\), \(\varvec{b}\) is the vector formed by concatenating \(\varvec{M}^{(1)}, \cdots , \varvec{M}^{(q)}\), and \( \varvec{A}\) is the linear operator such that \(\varvec{A} \varvec{z}\) is the concatenation of \(\varvec{G}( \sum _{k=1}^p X_k^{(r)} \varvec{Z}_k ) \varvec{\varPhi ^H}, r = 1,\cdots , q\). Although \(\varvec{z}\) is a complex vector, the problem can be reduced to a real-valued one by rearranging the real and imaginary parts of \(\varvec{z}\) and \( \varvec{A}\); for simplicity, we derive the KKT conditions only for the real case. Again we use \(\{g_1, \cdots , g_h, \cdots , g_N \}\) to denote our ordered hierarchical group set and \(\lambda _h\) to denote the penalty weight for group \(g_h\). We also define diagonal matrices \(\varvec{D}_h\) such that

$$\begin{aligned} \varvec{D}_h(l,l) = \left\{ \begin{array}{l l} 1 &{}\text { if } l\in g_h \\ 0 &{} \text { otherwise } \end{array} \right. \forall h \end{aligned}$$

so the non-zero part of \(\varvec{D}_h \varvec{z}\) equals \(\varvec{z} |_{g_h}\). With this simplified notation, we re-cast the original problem in a standard form:

$$\begin{aligned} \min _{\varvec{z}} (\frac{1}{2}\Vert \varvec{b} - \varvec{A} \varvec{z} \Vert ^2_2 + \sum _h \lambda _h \Vert \varvec{D}_h \varvec{z} \Vert _2) \end{aligned}$$
(7)
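As an illustration of this formulation, the sketch below (ours, with illustrative names) shows one standard rearrangement of a complex least-squares term into the real-valued form assumed here, and evaluates the objective in (7), representing each \(\varvec{D}_h\) implicitly by its index set \(g_h\).

```python
import numpy as np

def realify(A_c, b_c):
    """Rewrite the complex least-squares term ||b_c - A_c z_c||^2 as a real
    one by stacking real and imaginary parts; the real unknown is [Re z; Im z]."""
    A = np.block([[A_c.real, -A_c.imag],
                  [A_c.imag,  A_c.real]])
    b = np.concatenate([b_c.real, b_c.imag])
    return A, b

def objective(z, A, b, groups, lams):
    """Objective of (7): 0.5*||b - A z||_2^2 + sum_h lambda_h ||D_h z||_2.

    groups : list of index arrays g_h (they may overlap, per the hierarchy)
    lams   : the corresponding penalty weights lambda_h
    """
    r = b - A @ z
    penalty = sum(lam * np.linalg.norm(z[g]) for g, lam in zip(groups, lams))
    return 0.5 * (r @ r) + penalty
```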

To state the KKT conditions conveniently, we introduce the auxiliary variables \(\varvec{u} = \varvec{A} \varvec{z}\) and \(\varvec{v}_h = \varvec{D}_h \varvec{z}\). Then (7) is equivalent to

$$\begin{aligned} \min _{\varvec{z}, \varvec{u} , \varvec{v}_h}&(\frac{1}{2}\Vert \varvec{b} - \varvec{u} \Vert ^2_2 + \sum _h \lambda _h \Vert \varvec{v}_h \Vert _2) \\ \text {such that }&\varvec{u} = \varvec{A} \varvec{z}, \quad \varvec{v}_h= \varvec{D}_h \varvec{z}, \forall h \end{aligned}$$

The corresponding Lagrange function is

$$\begin{aligned} L(\varvec{z}, \varvec{u} , \varvec{v}_h,\varvec{\mu },\varvec{\xi }_h ) = \frac{1}{2}\Vert \varvec{b} - \varvec{u} \Vert ^2_2 + \sum _h \lambda _h\Vert \varvec{v}_h \Vert _2 + \varvec{\mu }^T ( \varvec{A} \varvec{z} - \varvec{u}) + \sum _{h} \varvec{\xi }_h^T ( \varvec{D}_h \varvec{z} - \varvec{v}_h ) \end{aligned}$$

where \(\varvec{\mu }\) and the \(\varvec{\xi }_h\) are Lagrange multipliers. At the optimum, the following KKT conditions hold:

$$\begin{aligned} \frac{\partial {L}}{\partial {\varvec{u}}}&= \varvec{u} - \varvec{b} - \varvec{\mu } = 0 \end{aligned}$$
(8)
$$\begin{aligned} \frac{\partial {L}}{\partial {\varvec{z}}}&= \varvec{A}^T \varvec{\mu } + \sum _h \varvec{D}_h \varvec{\xi }_h = 0 \end{aligned}$$
(9)
$$\begin{aligned} \frac{\partial {L}}{\partial {\varvec{v}_h}}&= \lambda _h \partial { \Vert \varvec{v}_h \Vert _2} - \varvec{\xi }_h \ni 0, \forall h \end{aligned}$$
(10)

where \(\partial { \Vert \cdot \Vert _2}\) is the subgradient of the \(L_2\) norm. From (8) we have \( \varvec{\mu } = \varvec{u} - \varvec{b}\), so (9) becomes \( \varvec{A}^T (\varvec{u} - \varvec{b}) + \sum _h \varvec{D}_h \varvec{\xi }_h = 0 \). Substituting \(\varvec{u}= \varvec{Az}\), the first term \( \varvec{A}^T ( \varvec{u} - \varvec{b}) = \varvec{A}^T (\varvec{A} \varvec{z} - \varvec{b})\) is the gradient of \( f(\varvec{z}) = \frac{1}{2} \Vert \varvec{b} - \varvec{A} \varvec{z}\Vert _2^2\). For a candidate solution \(\varvec{z}_0\), once we substitute \( \varvec{v}_h = \varvec{D}_h \varvec{z}_0 \), the KKT conditions become

$$\begin{aligned}&\nabla f(\varvec{z})_{\varvec{z} = \varvec{z}_0} + \sum _h \varvec{D}_h \varvec{\xi }_h = 0 \end{aligned}$$
(11)
$$\begin{aligned}&\lambda _h \partial { \Vert \varvec{D}_h \varvec{z}_0 \Vert _2} - \varvec{\xi }_h \ni 0, \forall h \end{aligned}$$
(12)

By the definition of the subgradient, (12) implies

$$\begin{aligned}&\varvec{\xi }_h = \lambda _h \frac{ \varvec{D}_h \varvec{z}_0 }{\Vert \varvec{D}_h \varvec{z}_0 \Vert _2} \text { if } \Vert \varvec{D}_h \varvec{z}_0 \Vert _2 > 0 \\&\Vert \varvec{\xi }_h\Vert _2 \le \lambda _h \text { if } \Vert \varvec{D}_h \varvec{z}_0 \Vert _2 = 0 \end{aligned}$$

Therefore we can determine whether (11) and (12) hold by solving the following problem:

$$\begin{aligned} \min _{\varvec{\xi }_h}&\frac{1}{2} \Vert \nabla f(\varvec{z})_{\varvec{z} = \varvec{z}_0} + \sum _h \varvec{D}_h \varvec{\xi }_h\Vert _2^2 \\ \text {subject to }&\varvec{\xi }_h = \lambda _h \frac{ \varvec{D}_h \varvec{z}_0 }{\Vert \varvec{D}_h \varvec{z}_0 \Vert _2} \text { if } \Vert \varvec{D}_h \varvec{z}_0 \Vert _2 > 0 \\&\Vert \varvec{\xi }_h\Vert _2 \le \lambda _h \text { if } \Vert \varvec{D}_h \varvec{z}_0 \Vert _2 = 0 \end{aligned}$$

which is a standard group lasso problem with non-overlapping groups and can be solved by coordinate descent. We take the value of \(\frac{1}{2} \Vert \nabla f(\varvec{z})_{\varvec{z} = \varvec{z}_0} + \sum _h \varvec{D}_h \varvec{\xi }_h\Vert _2^2\) at the optimum as a measure of violation of the KKT conditions.
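As a sketch of this check (ours, not the paper's code), the problem above can be solved by block coordinate descent: \(\varvec{\xi }_h\) is fixed by the subgradient for active groups, and for inactive groups each block update is a Euclidean projection onto the \(\lambda _h\) ball.

```python
import numpy as np

def kkt_violation(grad_f, z0, groups, lams, n_iter=200, tol=1e-12):
    """Minimize 0.5*||grad_f + sum_h D_h xi_h||_2^2 over the multipliers xi_h.

    grad_f : gradient of f at the candidate solution z0
    groups : list of index arrays g_h; lams : the weights lambda_h
    Returns the violation measure and the residual grad_f + sum_h D_h xi_h.
    """
    xi, free = [], []
    for g, lam in zip(groups, lams):
        nz = np.linalg.norm(z0[g])
        if nz > 0:                       # active group: xi_h fixed by subgradient
            xi.append(lam * z0[g] / nz)
            free.append(False)
        else:                            # inactive group: ||xi_h|| <= lambda_h
            xi.append(np.zeros(len(g)))
            free.append(True)
    r = grad_f.copy()
    for g, x in zip(groups, xi):
        r[g] += x                        # r = grad_f + sum_h D_h xi_h
    prev = np.inf
    for _ in range(n_iter):
        for h, g in enumerate(groups):
            if not free[h]:
                continue
            r[g] -= xi[h]                # residual excluding block h
            ng = np.linalg.norm(r[g])
            # optimal xi_h projects -r|g_h onto the lambda_h ball
            xi[h] = -r[g] if ng <= lams[h] else -lams[h] * r[g] / ng
            r[g] += xi[h]
        cur = 0.5 * (r @ r)
        if prev - cur < tol:
            break
        prev = cur
    return cur, r
```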

Let \(f_{J}\) be the function f restricted to an index set J. Because the gradient of f is linear, if \(\varvec{z}_0\) has non-zero entries only in J, then the entries of \(\nabla f( \varvec{z})\) in J equal \(\nabla f_{J} (\varvec{z} |_J)\) at \(\varvec{z} = \varvec{z}_0\). In addition, the \(\varvec{\xi }_h\) are separable across groups. Therefore, if \(\varvec{z}_0\) is an optimal solution of the problem restricted to J, the KKT conditions are already met for the entries in J (i.e. \( \left( \nabla f(\varvec{z})_{\varvec{z} = \varvec{z}_0} + \sum _h \varvec{D}_h \varvec{\xi }_h\right) |_{J} = 0\)); for \(g_h \not \subset J\), we use \(\frac{1}{2}\Vert \left( \nabla f(\varvec{z})_{\varvec{z} = \varvec{z}_0} + \sum _h \varvec{D}_h \varvec{\xi }_h\right) |_{g_h} \Vert ^2\) at the optimum as a measure of how much the elements in group \(g_h\) violate the KKT conditions; this serves as the criterion when we greedily add groups (see Algorithm 2), as in the sketch below.
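Continuing the sketch above, the per-group scores can be read off the residual returned by kkt_violation; the in_J indicator below is hypothetical bookkeeping for which groups already lie inside J.

```python
def group_scores(residual, groups):
    """Per-group violation 0.5*||(grad_f + sum_h D_h xi_h)|_{g_h}||^2,
    computed from the residual returned by kkt_violation."""
    return [0.5 * np.linalg.norm(residual[g]) ** 2 for g in groups]

# Greedy step: among groups not yet in J, add the worst violator.
# scores = group_scores(residual, groups)
# h_new = max((h for h in range(len(groups)) if not in_J[h]),
#             key=lambda h: scores[h])
```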


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Yang, Y., Tarr, M.J., Kass, R.E. (2016). Estimating Learning Effects: A Short-Time Fourier Transform Regression Model for MEG Source Localization. In: Rish, I., Langs, G., Wehbe, L., Cecchi, G., Chang, K.M., Murphy, B. (eds) Machine Learning and Interpretation in Neuroimaging. MLINI 2013, MLINI 2014. Lecture Notes in Computer Science, vol 9444. Springer, Cham. https://doi.org/10.1007/978-3-319-45174-9_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-45174-9_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45173-2

  • Online ISBN: 978-3-319-45174-9

  • eBook Packages: Computer Science; Computer Science (R0)
