Abstract
This paper presents a new dataset we are collecting that combines three modalities (EEG, video of the face, and audio) recorded during imagined and vocalized phonemic and single-word prompts. We preprocess the EEG data, compute features for all three modalities, and perform binary classification of phonological categories using combinations of these modalities. For example, a deep-belief network achieves accuracies above 90% in identifying consonants, significantly outperforming two baseline support vector machines. These data may be used by the research community to study multimodal relationships and to develop silent-speech and brain-computer interfaces.
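The classification setup described above can be sketched as follows. This is a minimal illustration only, not the paper's actual pipeline: the feature dimensions, the synthetic data, and the nearest-centroid classifier (standing in for the SVM baselines and deep-belief network) are all assumptions made for the sake of a runnable example. Per-trial feature vectors from the three modalities are concatenated and fed to a binary classifier of a phonological category.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-trial feature vectors from each modality;
# the dimensions are illustrative, not the dataset's actual feature sizes.
n_trials = 200
eeg   = rng.normal(size=(n_trials, 62))   # e.g. per-channel EEG statistics
video = rng.normal(size=(n_trials, 20))   # e.g. facial landmark features
audio = rng.normal(size=(n_trials, 13))   # e.g. spectral features from audio
y = rng.integers(0, 2, size=n_trials)     # binary phonological label

# Concatenate modalities; shift class-1 trials so the toy classes are
# separable (real separability would come from the measured signals).
X = np.hstack([eeg, video, audio])
X[y == 1] += 0.5

# Hold out 25% of trials, then fit a nearest-centroid classifier as a
# deliberately simple baseline.
split = int(0.75 * n_trials)
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]
centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
dists = ((X_te[:, None, :] - centroids[None]) ** 2).sum(axis=2)
pred = np.argmin(dists, axis=1)
accuracy = (pred == y_te).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping the nearest-centroid step for an SVM or a deep-belief network, and restricting the column blocks of `X`, would let one compare single-modality against combined-modality performance in the same framework.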
Acknowledgements
This research is funded by the Toronto Rehabilitation Institute, the Natural Sciences and Engineering Research Council of Canada (RGPIN 435874), and a grant from the Nuance Foundation. Data collection was assisted by Selvana Morcos, Aaron Marquis, Chaim Katz, and César Márquez-Chin.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Zhao, S., Rudzicz, F. (2016). Combining Different Modalities in Classifying Phonological Categories. In: Rish, I., Langs, G., Wehbe, L., Cecchi, G., Chang, K.M., Murphy, B. (eds) Machine Learning and Interpretation in Neuroimaging. MLINI 2013, MLINI 2014. Lecture Notes in Computer Science, vol 9444. Springer, Cham. https://doi.org/10.1007/978-3-319-45174-9_5
DOI: https://doi.org/10.1007/978-3-319-45174-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45173-2
Online ISBN: 978-3-319-45174-9
eBook Packages: Computer Science (R0)