I study auditory perception and memory using electrophysiological and behavioral measures. My main hypothesis is that auditory objects (units of information in auditory processing) are predictive models based on regularities extracted from the acoustic input.
We examine a two-dimensional mixture of single-component fermions and dipolar bosons. We calculate the self-energies of the fermions in the normal state and in the Cooper-pair channel, including the first-order vertex correction, to derive a modified Eliashberg equation. We predict the appearance of superfluids with various non-standard pairing symmetries at experimentally feasible transition temperatures within the strong-coupling limit of the Eliashberg equation. Excitations in these superfluids are anyonic and follow non-Abelian statistics.
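For orientation only (this is not reproduced from the paper, which derives a modified, vertex-corrected version), the standard imaginary-axis Eliashberg equations for boson-mediated s-wave pairing can be written as

Z(i\omega_n)\,\Delta(i\omega_n) = \pi T \sum_m \lambda(\omega_n - \omega_m)\, \frac{\Delta(i\omega_m)}{\sqrt{\omega_m^2 + \Delta^2(i\omega_m)}},

Z(i\omega_n) = 1 + \frac{\pi T}{\omega_n} \sum_m \lambda(\omega_n - \omega_m)\, \frac{\omega_m}{\sqrt{\omega_m^2 + \Delta^2(i\omega_m)}},

where \omega_n = (2n+1)\pi T are the fermionic Matsubara frequencies, \Delta is the gap function, Z is the mass-renormalization function, and \lambda is the effective boson-mediated coupling. The equation studied in the paper additionally incorporates the first-order vertex correction and allows the momentum-dependent, non-standard pairing symmetries mentioned above.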
The auditory environment typically comprises several simultaneously active sound sources. In contrast to the perceptual segregation of two concurrent sounds, the perception of three simultaneous sound objects has not yet been studied systematically. We conducted two experiments in which participants were presented with complex sounds containing sound segregation cues (mistuning, onset asynchrony, differences in frequency or amplitude modulation, or differences in sound location), which were set up to promote the perceptual organization of the tonal elements into one, two, or three concurrent sounds. In Experiment 1, listeners indicated whether they heard one, two, or three concurrent sounds. In Experiment 2, participants watched a silent subtitled movie while EEG was recorded to extract the object-related negativity (ORN) component of the event-related potential. Listeners predominantly reported hearing two sounds when the segregation-promoting manipulations were applied to the same tonal element. When two different tonal elements received manipulations promoting them to be heard as separate auditory objects, participants reported hearing two and three concurrent sound objects with equal probability. The ORN was elicited in most conditions; sounds that included the amplitude- or the frequency-modulation cue generated the smallest ORN amplitudes. Manipulating two different tonal elements yielded numerically, and often significantly, smaller ORNs than the sum of the ORNs elicited when the same cues were applied to a single tonal element. These results suggest that the ORN reflects the presence of multiple concurrent sounds, but not their number. The ORN results are compatible with the horse-race principle of combining different cues of concurrent sound segregation.
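The horse-race principle mentioned above can be formalized as probability summation: each cue triggers an independent segregation-detection process, and segregation is registered whenever at least one of them succeeds. The abstract does not give a formula, so the sketch below is illustrative only, and the single-cue probabilities in it are made-up placeholders.

# Illustrative probability-summation ("horse-race") prediction for
# combining independent concurrent-sound segregation cues.
# The cue detection probabilities below are hypothetical placeholders.
def horse_race(p_cues):
    """Probability that at least one independent cue signals segregation."""
    p_none = 1.0
    for p in p_cues:
        p_none *= (1.0 - p)
    return 1.0 - p_none

p_mistuning, p_onset = 0.6, 0.5            # assumed single-cue probabilities
print(horse_race([p_mistuning]))           # 0.6 (single cue)
print(horse_race([p_mistuning, p_onset]))  # 0.8 (combined, sub-additive)

Under this reading, combining cues raises detection probability sub-additively, which is consistent with a response (such as the ORN) that signals whether segregation occurred rather than how many objects were segregated.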
The last decade has seen an explosion of research in auditory perception and cognition. This growing activity encompasses neurophysiological research in nonhuman species, computational modeling of basic neurophysiological functions, and neuroimaging research in humans. Among the various neuroimaging techniques available, scalp recording of neuroelectric (electroencephalography [EEG]) and neuromagnetic (magnetoencephalography [MEG]) brain activity (see Nagarajan, Gabriel, and Herman, Chapter 5) has proven to be a formidable tool in the arsenal available to cognitive neuroscientists interested in understanding audition. These techniques measure the dynamic pattern of electromagnetic fields at the scalp produced by the coherent activity of large neuronal populations in the brain. In cognitive neuroscience, the measurement of electrical event-related brain potentials (ERPs) or magnetic event-related fields (ERFs) is among the major noninvasive techniques used for investigating sensory and cognitive information processing and for testing specific assumptions of cognitive theories that are not easily amenable to behavioral techniques. After identifying and characterizing the ERP/ERF signals that accompany the basic steps of processing discrete events, scientific interest has gradually shifted toward specifying the complex processing of more realistic stimulus configurations. In the auditory modality, recent years have seen an upsurge of research papers investigating the processes of auditory scene analysis.
Exposure to atypical events during specific intrauterine periods may induce reprogramming of fetal brain development and lead to postnatal problems in behavior, emotions, and cognition (Van den Bergh et al., 2005). We studied the association between prenatal exposure to maternal anxiety and infant auditory information processing. Seventy pregnant women filled out the State-Trait Anxiety Inventory between their 9th and 15th weeks of pregnancy. Two and nine months after birth, we recorded auditory event-related potentials (AERPs) in their infants using an oddball paradigm with a standard tone and three types of deviants: inter-stimulus interval (ISI) deviants, white noise, and novel sounds. AERP amplitudes measured in selected time windows were analyzed by means of repeated-measures ANOVAs, with maternal State Anxiety entered as a continuous predictor variable. A consistent result across both ages was that significant positive associations were found between the level of maternal anxiety and the infants' responses to stimuli with low information content (the standard and the ISI deviant) (p < .05), whereas no associations were found for the rare, more informative stimuli (white noise and novel sounds). Thus, infants of highly anxious mothers appear to over-process sounds carrying little information. This difference between infants of mothers with high and low anxiety during pregnancy may underlie the higher negative reactivity and language problems observed in other studies in children prenatally exposed to high levels of maternal anxiety.
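The abstract specifies repeated-measures ANOVAs with maternal State Anxiety as a continuous predictor. As a rough, non-authoritative sketch of how such an amplitude-by-anxiety analysis could be set up, the snippet below instead fits a linear mixed model, a commonly used substitute for a repeated-measures ANOVA with a continuous covariate; all column names and the input file are hypothetical and not taken from the paper.

# Illustrative only: a mixed-model analogue of a repeated-measures ANOVA
# with a continuous predictor. Column names ("infant", "anxiety",
# "stimulus", "amplitude") and the input file are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("aerp_amplitudes.csv")              # one row per infant x stimulus type
model = smf.mixedlm("amplitude ~ anxiety * stimulus",  # anxiety as a continuous predictor
                    data, groups=data["infant"])       # infant as random intercept
result = model.fit()
print(result.summary())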
We report on the design and the collection of a multi-modal data corpus for cognitive acoustic scene analysis. Sounds are generated by stationary and moving sources (people), that is, by omni-directional speakers mounted on people's heads. One or two subjects walk along predetermined systematic and random paths, in synchrony and out of sync. Sound is captured by multiple microphone systems, including a directional array of four MEMS microphones, two electret microphones situated in the ears of a stuffed gerbil head, and a Head Acoustics head-and-shoulder unit with ICP microphones. Three micro-Doppler units operating at different frequencies were employed to capture the gait and articulatory signatures as well as the locations of the people in the scene. Three ground-vibration sensors recorded the footsteps of the walking people. A 3D MESA camera and a webcam provided 2D and 3D visual data for system calibration and ground truth. Data were collected in three environments: a well-controlled environment (an anechoic chamber), an indoor environment (a large classroom), and a natural environment (an outdoor courtyard). A software tool has been developed for browsing and visualizing the data.
The human auditory system is capable of grouping sounds originating from different sound sources into coherent auditory streams, a process termed auditory stream segregation. Several cues can influence auditory stream segregation, but the full set of cues and the way in which they are integrated is still unknown. In the current study, we tested whether auditory motion can serve as a cue for segregating sequences of tones. Our hypothesis was that, following the principle of common fate, sounds emitted by sources moving together in space along similar trajectories will be more likely to be grouped into a single auditory stream, while sounds emitted by independently moving sources will more often be heard as two streams. Stimuli were derived from sound recordings in which the sound source motion was induced by walking humans. Although the results showed a clear effect of spatial separation, auditory motion had a negligible influence on stream segregation. Hence, auditory motion may not be used as a primitive cue in auditory stream segregation.
Based on results showing that the "deviant-minus-standard" estimate of the mismatch negativity (MMN) amplitude increases with increasing amounts of deviance, it has been suggested that the MMN amplitude reflects the amount of difference between the neural representations of the standard and the deviant sound. However, the deviant-minus-standard waveform also includes an N1 difference. We tested the effects of the magnitude of deviance on the MMN while minimizing this N1 confound. We found no significant magnitude-of-deviance effect on the genuine MMN amplitude. Thus we suggest that the average MMN amplitude does not reflect the difference between neural stimulus representations; rather, it may index the percentage of detected deviants, each of which elicits an MMN response of uniform amplitude. These results are compatible with an explanation suggesting that the MMN is involved in maintaining a neural representation of the auditory environment.
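On the interpretation suggested above, the averaged deviant-minus-standard amplitude would scale with the proportion of deviants that are detected rather than with a graded single-trial response. The toy simulation below (illustrative only; all numbers are made up) shows how averaging trials in which a fixed-amplitude MMN is elicited on only a fraction of deviants reproduces an apparently graded mean amplitude.

# Toy simulation: a uniform single-trial MMN elicited on a varying
# proportion of deviant trials yields a graded *average* amplitude.
import numpy as np

rng = np.random.default_rng(0)
n_trials = 1000
mmn_amplitude = -2.0                       # hypothetical uniform single-trial MMN (microvolts)

for p_detect in (0.3, 0.6, 0.9):           # hypothetical detection probabilities
    detected = rng.random(n_trials) < p_detect
    single_trial = np.where(detected, mmn_amplitude, 0.0)  # no MMN on missed deviants
    print(p_detect, single_trial.mean())   # mean is approximately p_detect * mmn_amplitude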
Though many studies suggest that fine acoustic details fade from memory after 15 s or even less, everyday experience tells us that the voice of a person or the sound of a musical instrument can be recognized long after it was last heard. We set out to determine whether tones leave a lasting memory trace, using an experimental model of implicit recognition to test whether exact pitch information can be retrieved even after 30 s. Event-related brain potentials demonstrated the survival of an accurate representation of tone pitch in the auditory cortex. This result provides a link between short-duration buffering and the permanent storage of acoustic information.
Infrequently omitting a sound from a repetitive sequence elicits the mismatch negativity (MMN) ERP response when the stimulus onset asynchrony (SOA) is less than 200 ms. We contrasted two alternative explanations of the omission MMN. (1) Each sound starts a separate temporal integration process, and omissions violate the constancy of the temporal structure within the integration window. (2) Sounds preceding an omission are perceived to be louder than those followed by a sound within the integration period, because omissions allow the full stimulus aftereffect to be included in the perceived loudness. We varied the SOA randomly between 117 and 217 ms. In this case, the temporal-structure explanation predicts that no MMN will be elicited, whereas the loudness-summation explanation predicts that an MMN will be elicited. MMN was elicited by tone omissions with random SOA, suggesting that loudness summation plays an important role in the elicitation of the omission MMN.
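One way to make the loudness-summation account concrete is to integrate a tone plus its decaying aftereffect over the roughly 200 ms temporal integration window, cutting the integration short whenever the next tone arrives earlier. The sketch below is my own toy formalization under that assumption, not the model used in the paper, and all parameters (tone duration, decay constant, window length) are arbitrary.

# Toy formalization (not from the paper): perceived loudness of a tone is
# approximated by integrating the tone plus an exponentially decaying
# aftereffect, either over the full ~200 ms integration window (tone
# followed by an omission) or only up to the onset of the next tone.
import numpy as np

def integrated_loudness(soa_ms, window_ms=200, tone_dur_ms=50, tau_ms=80):
    limit = min(soa_ms, window_ms)                        # integration ends at next onset or window end
    t = np.arange(0, limit, 1.0)                          # 1 ms steps
    envelope = np.where(t < tone_dur_ms, 1.0,             # tone on
                        np.exp(-(t - tone_dur_ms) / tau_ms))  # aftereffect decay
    return envelope.sum()

print(integrated_loudness(117))            # next tone at 117 ms cuts integration short
print(integrated_loudness(217))            # next tone falls outside the window
print(integrated_loudness(float("inf")))   # omission: the full window contributes

In this toy version, tones followed by an omission (or by a tone arriving after the window closes) accumulate the largest integrated value, which is the loudness difference that the omission MMN is proposed to reflect.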
The brain organizes sound into coherent sequences, termed auditory streams. We asked whether task-irrelevant sounds would be detected as separate auditory streams in a natural listening environment that included three simultaneously active sound sources. Participants watched a movie with sound while street noise and sequences of naturally varying footstep sounds were presented in the background. Occasional deviations in the footstep sequences elicited the mismatch negativity (MMN) event-related potential. The elicitation of the MMN showed that the regular features of the footstep sequences had been registered and their violations detected, which could only occur if the footstep sequence had been detected as a separate auditory stream. Our results demonstrate that sounds are organized into auditory streams irrespective of their relevance to ongoing behavior. NeuroReport 14:2053-2056.
An audiovisual experiment using moving sound sources was designed to investigate whether the analysis of auditory scenes is modulated by the synchronous presentation of visual information. Listeners were presented with an alternating sequence of two pure tones delivered by two separate sound sources. In different conditions, the two sound sources were either stationary or moving along random trajectories around the listener. Both the sounds and the movement trajectories were derived from recordings in which two humans were moving with loudspeakers attached to their heads. Movement trajectories visualized by a computer animation were presented together with the sounds. In the main experiment, behavioral reports on sound organization were collected from young healthy volunteers. The proportion and stability of the different sound organizations were compared between conditions in which the visualized trajectories matched the movement of the sound sources and conditions in which the two were independent of each other. The results corroborate earlier findings that separation of sound sources in space promotes segregation. However, no additional effect of auditory movement per se on the perceptual organization of sounds was obtained. Surprisingly, the presentation of movement-congruent visual cues did not strengthen the effects of spatial separation on segregating auditory streams. Our findings are consistent with the view that bistability in the auditory modality can occur independently of other modalities.
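Proportion and stability measures of this kind are typically derived from continuous perceptual reports. As a minimal, hypothetical sketch (the actual response format used in the study is not specified in the abstract), the snippet below computes the proportion of listening time spent in each reported organization and the mean duration of its perceptual phases from timestamped report events.

# Hypothetical report format: a list of (time_in_s, reported_organization)
# events plus a final timestamp marking the end of the trial.
from collections import defaultdict

reports = [(0.0, "integrated"), (4.2, "segregated"), (9.8, "integrated"),
           (13.1, "segregated")]          # made-up example data
trial_end = 20.0

durations = defaultdict(list)
for (t, org), (t_next, _) in zip(reports, reports[1:] + [(trial_end, None)]):
    durations[org].append(t_next - t)     # duration of each perceptual phase

total = sum(sum(d) for d in durations.values())
for org, phases in durations.items():
    proportion = sum(phases) / total       # proportion of trial time
    stability = sum(phases) / len(phases)  # mean phase duration (s)
    print(org, round(proportion, 2), round(stability, 2))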
Predictive accounts of perception have received increasing attention in the past 20 years. Detecting violations of auditory regularities, as reflected by the mismatch negativity (MMN) auditory event-related potential, is amongst the phenomena seamlessly fitting this approach. Largely based on the MMN literature, we propose a psychological conceptual framework called the Auditory Event Representation System (AERS), which is based on the assumption that the detection of auditory regularity violations and the formation of auditory perceptual objects rely on the same predictive regularity representations. Based on this notion, a computational model of auditory stream segregation, called CHAINS, has been developed. In CHAINS, the auditory sensory event representation of each incoming sound is considered as a possible continuation of likely combinations of the preceding sounds in the sequence, thus providing alternative interpretations of the auditory input. Detecting repeating patterns allows upcoming sound events to be predicted, thus providing a test of, and potential support for, the corresponding interpretation. Alternative interpretations continuously compete for perceptual dominance. In this paper, we briefly describe AERS and deduce some general constraints from this conceptual model. We then go on to illustrate how these constraints are computationally specified in CHAINS. Keywords: Auditory object, auditory scene analysis, deviance detection, predictive modelling, mismatch negativity (MMN). This is one of several papers published together in Brain Topography in the "Special Issue: Mismatch Negativity".
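To make the chaining idea concrete, here is a deliberately simplified toy sketch, not the published CHAINS implementation: each incoming sound either extends existing chains whose prediction it matches or starts a new chain, confirmed predictions add support, and the best-supported chains stand in for the currently dominant interpretation. The repetition-based prediction rule and the tolerance parameter are my own simplifications.

# Toy illustration of the chaining idea (not the published CHAINS model):
# sounds are reduced to single frequency values, chains predict a repetition
# of their last element, and support counts confirmed predictions.
class Chain:
    """One candidate grouping of sounds into a stream."""
    def __init__(self, first_sound):
        self.sounds = [first_sound]
        self.support = 0

    def predict(self):
        return self.sounds[-1]                    # naive prediction: repetition

    def consider(self, sound, tolerance=1.0):
        """Extend the chain if the sound fits its prediction."""
        if abs(sound - self.predict()) <= tolerance:
            self.support += 1                     # prediction confirmed
            self.sounds.append(sound)
            return True
        return False

def process(sequence):
    chains = []
    for sound in sequence:
        extended = [c for c in chains if c.consider(sound)]
        if not extended:
            chains.append(Chain(sound))           # sound starts a new chain
    return sorted(chains, key=lambda c: c.support, reverse=True)

# An alternating low/high tone sequence: two well-supported chains (streams) emerge.
tones = [400, 800, 400, 800, 400, 800]
for chain in process(tones):
    print(chain.sounds, chain.support)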