My research focuses on the behavioral and neural processes underlying speech and language in human communication. I have a special interest in the multimodal integration of speech and in the domain-general mechanisms required for effective linguistic interactions. Using electrophysiological (EEG) and neuroimaging (fMRI) techniques, I am currently studying the visual processing of orofacial movements associated with speech sounds (lip-reading) and the contribution of the speech motor system to cross-modal feedforward predictions.
Relevant visual information available in speakers’ faces during face-to-face interactions improves speech perception. There is an ongoing debate, however, about how phonemes and their visual counterparts, visemes, are mapped. An influential hypothesis claims that several phonemes can be mapped onto a single visemic category (many-to-one phoneme-viseme mapping). In contrast, recent findings have challenged this view, reporting evidence for sub-visemic syllable discrimination. We aimed to investigate whether Spanish words from the same visemic category can be identified. We designed a lip-reading task in which participants had to identify target words presented in silent video clips among three distractors differing in their visual resemblance to the target. Target words were identified above chance and significantly more often than distractors from the same visemic category. Moreover, the error rate for distractors decreased significantly with decreasing visemic resemblance to the target. These results challenge the many-to-one phoneme-viseme mapping hypothesis.
Multimodal integration is crucial for human interaction, in particular for social communication, which relies on integrating information from various sensory modalities. Recently, a third visual pathway specialized in social perception was proposed, in which the right superior temporal sulcus (STS) plays a key role in processing socially relevant cues and high-level social perception. Importantly, it has also recently been proposed that the left STS contributes to the audiovisual integration of speech processing. In this article, we propose that brain areas along the right STS that support multimodal integration for social perception and cognition can be considered homologs of those in the left, language-dominant hemisphere, which sustain the multimodal integration of speech and semantic concepts fundamental for social communication. Emphasizing the significance of the left STS in multimodal integration and associated processes, such as multimodal attention to socially relevant stimuli, we underscore its potential relevance for understanding neurodevelopmental conditions characterized by challenges in social communication, such as autism spectrum disorder (ASD). Further research into this left lateral processing stream holds the promise of enhancing our understanding of social communication in both typical development and ASD, which may lead to more effective interventions that could improve the quality of life for individuals with atypical neurodevelopment.
Multimodal imitation of actions, gestures and vocal production is a hallmark of the evolution of human communication, as both vocal learning and visual-gestural imitation were crucial factors that facilitated the evolution of speech and singing. Comparative evidence has revealed that humans are an odd case in this respect, as multimodal imitation is barely documented in nonhuman animals. While there is evidence of vocal learning in birds and in mammals like bats, elephants and marine mammals, evidence in both domains, vocal and gestural, exists only for two psittacine birds (budgerigars and grey parrots) and cetaceans. Moreover, this draws attention to the apparent absence of vocal imitation (with just a few cases reported for vocal fold control in an orangutan and a gorilla, and a prolonged development of vocal plasticity in marmosets) and even of imitation of intransitive (non-object-related) actions in monkeys and apes in the wild. Even after training, the evidence for productive or "true" imitation (copying a novel behavior, i.e., one not pre-existing in the observer's behavioral repertoire) in both domains is scarce. Here we review the evidence for multimodal imitation in cetaceans, one of the few living mammalian groups reported to display multimodal imitative learning besides humans, and its role in sociality, communication and group cultures. We propose that cetacean multimodal imitation was acquired in parallel with the evolution and development of behavioral synchrony and the multimodal organization of sensorimotor information, supporting volitional motor control of the vocal system and the integration of audio-echoic-visual voices, body posture and movement.
While influential works since the 1970s have widely assumed that imitation is an innate skill in both human and non-human primate neonates, recent empirical studies and meta-analyses have challenged this view, indicating other forms of reward-based learning as relevant factors in the development of social behavior. The translation of visual input into matching motor output that underlies imitation abilities instead seems to develop along with social interactions and sensorimotor experience during infancy and childhood. Recently, a new visual stream has been identified in both human and non-human primate brains, updating the dual visual stream model. This third pathway is thought to be specialized for dynamic aspects of social perception, such as eye gaze and facial expression, and crucially for the audiovisual integration of speech. Here, we review empirical studies addressing an understudied but crucial aspect of speech and communication, namely the processing of visual orofacial cues (i.e., the perception of a speaker's lip and tongue movements) and their integration with vocal auditory cues. Throughout this review, we offer new insights from our understanding of speech as the product of the evolution and development of a rhythmic and multimodal organization of sensorimotor brain networks, supporting volitional motor control of the upper vocal tract and audiovisual voice-face integration.
In recent years, there have been important additions to the classical model of speech processing as originally depicted by the Broca-Wernicke model, which consists of an anterior, productive region and a posterior, perceptive region, both connected via the arcuate fasciculus. The modern view implies a separation into a dorsal and a ventral pathway conveying different kinds of linguistic information, which parallels the organization of the visual system. Furthermore, this organization is highly conserved in evolution and can be seen as the neural scaffolding from which the speech networks originated. In this chapter we emphasize that the speech networks are embedded in a multimodal system encompassing audio-vocal and visuo-vocal connections, which can be traced to an ancestral audio-visuo-motor pathway present in nonhuman primates. Likewise, we propose a trimodal repertoire for speech processing and acquisition involving auditory, visual and motor representations of the basic elements of speech: phonemes, observation of mouth movements, and articulatory processes. Finally, we discuss this proposal in the context of a scenario for early speech acquisition in infants and in human evolution.
The human brain generates predictions about future events. During face-to-face conversations, visemic information is used to predict upcoming auditory input. Recent studies suggest that the speech motor system plays a role in these cross-modal predictions; however, usually only audiovisual paradigms are employed. Here we tested whether speech sounds can be predicted on the basis of visemic information alone, and to what extent interfering with orofacial articulatory effectors affects these predictions. We recorded EEG and employed the N400 as an index of such predictions. Our results show that N400 amplitude was strongly modulated by visemic salience, consistent with cross-modal speech predictions. Additionally, the N400 ceased to be evoked when syllables' visemes were presented backwards, suggesting that predictions occur only when the observed viseme matches an existing articuleme in the observer's speech motor system (i.e., the articulatory neural sequence required to produce a particular phoneme/viseme). Importantly, we found that interfering with the articulatory motor system strongly disrupted cross-modal predictions. We also observed a late P1000 that was evoked only for syllable-related visual stimuli, but whose amplitude was not modulated by interference with the motor system. The present study provides further evidence of the importance of the speech production system for predicting speech sounds from visemic information at the pre-lexical level. The implications of these results are discussed in the context of a hypothesized trimodal repertoire for speech, in which speech perception is conceived as a highly interactive process that involves not only the ears but also the eyes, lips and tongue.
Studies on bilingual word reading and translation have examined the effects of lexical variables (e.g., concreteness, cognate status) by comparing groups of non-translators with varying levels of L2 proficiency. However, little attention has been paid to another relevant factor: translation expertise (TI). To explore this issue, we administered word reading and translation tasks to two groups of non-translators possessing different levels of informal TI (Experiment 1), and to three groups of bilinguals possessing different levels of translation training (Experiment 2). Reaction-time recordings showed that in all groups reading was faster than translation and unaffected by concreteness and cognate effects. Conversely, in both experiments, all groups translated concrete and cognate words faster than abstract and non-cognate words, respectively. Notably, an advantage of backward over forward translation was observed only for low-proficiency non-translators (Experiment 1). Also, in Experiment 2, the modifications induced by translation expertise were more marked in the early than in the late stages of training and practice. The results suggest that TI contributes to modulating inter-equivalent connections in bilingual memory.
Papers by Maëva Michon