Speech Communication 41 (2003) 135–149
www.elsevier.com/locate/specom

Temporal coding of the pitch of complex sounds by presumed multipolar cells in the ventral cochlear nucleus

Ian M. Winter a,*, Alan R. Palmer a,b, Lutz Wiegrebe a,c, Roy D. Patterson a

a Department of Physiology, Centre for the Neural Basis of Hearing, Downing Street, Cambridge CB2 3EG, UK
b MRC Institute of Hearing Research, University Park, Nottingham University, Nottingham NG7 2RD, UK
c Zoologisches Institut der Universität, München, Luisenstrasse 14, 80333 München, Germany

Abstract

Extensive studies of the encoding of fundamental frequency (f0) in the auditory nerve indicate that f0 can be represented by either the timing of the neuronal discharges or the mean discharge rate as a function of characteristic frequency. It is therefore of considerable interest to examine what happens to this information at the next level of the auditory pathway, the cochlear nucleus. Both physiologically and anatomically the cochlear nucleus is considerably more heterogenous than the auditory nerve. There are two main cell types in the ventral division of the cochlear nucleus; bushy and multipolar. Bushy cells give rise to primary-like responses whereas multipolar cells may be characterised by either onset or chopper type responses. Physiological studies have suggested that onset and chopper units may be good at representing the f0 of complex sounds in their temporal discharge properties. However, in these studies the pitch-producing sounds were usually characterised by highly modulated envelopes and it was not possible to tell if the units were simply responding to the modulation or the temporal fine structure. In this paper we examine the ability of onset and chopper units to encode the f0 of complex sounds when the modulation cue has been greatly reduced. These stimuli were steady-state vowels in the presence of background noise, and iterated rippled noise (IRN).
The response of onset units to the vowel f0 in the presence of background noise was varied but many still maintained a strong response. In contrast, the majority of chopper units showed a greater reduction in their response to vowel f0 in the presence of background noise. In keeping with the vowel study, the responses of both types of unit to the delay of the IRN were reduced in comparison with their response to more highly modulated stimuli. Increasing anatomical, pharmacological and physiological evidence would seem to argue against onset units playing a direct role in pitch perception. However, some units, identified as sustained choppers, may be able to represent the pitch of complex sounds in their temporal discharges.

© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Fundamental frequency; Autocorrelation; Iterated rippled noise; Onset units; Chopper units

* Corresponding author.
0167-6393/02/$ - see front matter. doi:10.1016/S0167-6393(02)00098-5

1. Introduction

The fundamental frequency (f0) of complex sounds is closely related to their perceived pitch. The f0 may be signaled to the brain (by the auditory nerve) in either a rate-place or temporal-place code and the responses of cells in the cochlear nucleus (an obligatory synapse for the auditory nerve) are of considerable interest in determining which of these codes is important. The cochlear nucleus is classically subdivided into three parts: anteroventral, posteroventral and dorsal. In this paper we concentrate on the responses of single units in the antero- and postero-ventral parts, together known as the ventral cochlear nucleus (VCN). The VCN is characterised by three physiological response types: primary-like (PL), chopper and onset (see Fig. 2 for examples). Each unit type is assumed to represent a separate, parallel processing unit within the VCN.
PL units are named for the similarity of their responses to those of their primary afferent input from the auditory nerve. They are recorded from bushy cells, mainly in the anteroventral part of the nucleus (Rhode et al., 1983). Both chopper and onset units are believed to be recorded from multipolar cells distributed throughout the VCN (e.g. Smith and Rhode, 1989). In keeping with their anatomical heterogeneity, these cell types give rise to different physiological responses.

Saturation does not limit the temporal encoding of pitch at high sound levels in the auditory nerve. Young and Sachs (1979) showed that the temporal encoding of speech sounds remained stable well beyond the sound level where the fiber's rate response saturated. The same is true, to some extent, for the encoding of amplitude modulation in the auditory nerve (Frisina et al., 1996). For a temporal code based on inter-spike intervals, however, the monotonically rising rate-level functions of auditory-nerve fibers represent a further problem. An increase in discharge rate is accompanied by shorter inter-spike intervals, thus interfering with an analysis of these intervals in terms of the periodicity associated with the pitch sensation. One possible way to overcome this problem is the processing of higher-order inter-spike intervals; that is, the intervals not only between successive spikes but also between non-successive spikes, an operation equivalent to autocorrelation of the spike train (Cariani and Delgutte, 1996a,b; Shofner, 1991, 1999). Stimulus periodicity encoded in first-order inter-spike intervals at low stimulus levels may be preserved in higher-order inter-spike intervals at higher sound levels.
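The distinction between first-order and all-order intervals can be illustrated with a short Python sketch. It is not part of the original study: the function names and the toy spike train are assumptions chosen purely for illustration. In the toy train, additional spikes 4 ms after each 10 ms period mimic a raised discharge rate; the 10 ms period then vanishes from the first-order intervals but is still present in the all-order intervals.

```python
import numpy as np

def first_order_intervals(spike_times):
    """Intervals between successive spikes only."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    return np.diff(t)

def all_order_intervals(spike_times, max_interval):
    """Intervals between every pair of spikes (successive or not),
    restricted to intervals no longer than max_interval."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    diffs = t[None, :] - t[:, None]            # diffs[i, j] = t[j] - t[i]
    later = diffs[np.triu_indices(len(t), k=1)]  # keep pairs with j > i
    return later[later <= max_interval]

# Hypothetical spike train (ms): one spike per 10 ms period plus an
# extra spike 4 ms later, as might occur at a higher discharge rate.
spikes_ms = np.array([0.0, 4.0, 10.0, 14.0, 20.0, 24.0, 30.0])

first = first_order_intervals(spikes_ms)
allo = all_order_intervals(spikes_ms, max_interval=12.0)

n_first_at_period = int(np.sum(first == 10.0))  # intervals equal to the period
n_all_at_period = int(np.sum(allo == 10.0))
print(n_first_at_period, n_all_at_period)
```

A histogram of `allo` is, up to normalisation, the positive-lag half of the spike-train autocorrelation, which is why the all-order analysis is described as an autocorrelation.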
This was confirmed experimentally by Cariani and Delgutte (1996a,b) who found a temporal neural correlate of pitch in the cat auditory nerve; they showed that pitch is well represented in an all-order inter-spike interval analysis whereas a first-order analysis was susceptible to changes in sound level. This conclusion cannot readily be extended to the cochlear nucleus.

In the cochlear nucleus, the neural information provided by the auditory nerve is subjected to different types of temporal and spectral processing, the latter through the interaction of units with different best frequencies (BFs). Onset units accentuate the degree of amplitude modulation of an acoustic stimulus (e.g. Kim et al., 1986; Kim and Leonard, 1988; Rhode and Greenberg, 1994). Chopper units also accentuate amplitude modulation, but they do so only for a limited range of modulation frequencies in the vicinity of their chopping frequency. Thus, chopper units have modulation transfer functions with a band-pass characteristic, particularly at high sound levels (Kim et al., 1990; Frisina et al., 1990). This distinguishes chopper units from PL units or auditory-nerve fibers which typically show modulation transfer functions with a low-pass characteristic and a relatively small degree of temporal synchronization to a sinusoidal modulator.

A number of studies suggest that onset units in the cochlear nucleus are able to represent the pitch of a wide range of stimuli in their temporal discharge pattern. The onset–chopper (OC, Rhode and Smith, 1986; Winter and Palmer, 1995) subgroup of onset units is characterised by a dynamic range much greater than that of individual auditory-nerve fibres; the range can be as much as 80–90 dB. However, OC units typically have a bandwidth of three octaves and it may even be as wide as six octaves (Jiang et al., 1996; Palmer et al., 1996).
Thus, although these neurons may be able to represent pitch over a wide range of levels they appear not to have sufficient frequency selectivity to represent the harmonic spectrum.

In this paper we use parametric designs to measure the response of these putative pitch encoding units in a systematic way. In the first set of experiments, we examine the ability of chopper and onset units to encode the f0 of steady-state vowels in background noise. The noise levels used do not affect the perception of the vowel pitch or its identification. We show that while all chopper units show a reduction in their response to the f0, many onset units maintain a good representation of f0 even in the presence of background noise.

In the second set of experiments we examine the ability of onset and chopper units to represent the delay of iterated rippled noise (IRN). For most pitch-producing sounds, such as steady-state vowels, there is a strong correlation between the presence of harmonically spaced peaks in the internal tonotopic representation of the sound, and the presence of peaks in the interval histogram or autocorrelogram at the period of the sound. For such sounds it is difficult to say whether a spectral or temporal auditory mechanism is more likely to be the basis of the pitch. IRN enables the decoupling of pitch from the peaks in the internal tonotopic representation; when highpass filtered, to remove resolved harmonics, the IRN produces a tonotopic rate representation very similar to that of the original noise even though it produces a pitch (Patterson et al., 1996). It has been argued that the pitch arises from the fine-grain temporal regularity which is well represented in the autocorrelogram. Consequently, this stimulus has been used in both perceptual and physiological experiments to study the processing of temporal fine structure in the auditory system (e.g.
Shofner, 1991, 1999; Wiegrebe and Patterson, 1999; Yost, 1996a,b; Yost et al., 1996; Krumbholz et al., 2001).

Rippled noise (RN) is produced from white noise (WN) by delaying a copy of the noise by d ms and adding the delayed noise back to the original. IRN is produced by repeating the delay and add process n times, and it is referred to as IRN(d, n). The delay-and-add process introduces temporal regularity into the fine structure of the noise (Fig. 1A) which is revealed by peaks in the autocorrelation function of the waveform (Fig. 1C). It also introduces a ‘ripple’ into the long-term power spectrum of the waveform (Fig. 1B). Note, however, that the resolution of the spectral analysis performed in the cochlea is inversely related to frequency and so high frequency peaks merge in the internal tonotopic representation. Simulations of the processing of IRN (Griffiths et al., 1998) do not show resolved peaks above about the sixth harmonic and it has been argued (Yost et al., 1996) that the pitch of IRN is best represented by the position of the first peak in the autocorrelation of the waveform (h1, Fig. 1C).

Fig. 1. An illustration of IRN. (A) The IRN waveform when the delay is 8 ms, the gain is unity and the number of iterations is 16. The resulting magnitude spectrum (B) shows peaks at integer multiples of 125 Hz which is the reciprocal of the 8 ms delay used to generate the IRN. Note that the complete stimulus is not shown. The normalized autocorrelation function (C) shows peaks at integer multiples of the 8 ms correlation lag. The position and height of the first peak (apart from the peak at the zero lag) in the autocorrelation function, h1, has been used to model the pitch and pitch strength of IRN. Note that the height of h1 is slightly less than unity reflecting the quasi-periodicity of IRN stimuli.
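The delay-and-add process described above can be sketched in Python as follows. This is a minimal illustration, not the stimulus-generation code used in the study: the function names, the random seed and the choice to discard an initial warm-up segment are assumptions. The parameter values (8 ms delay, unity gain, 16 iterations, 20 kHz sampling rate, 409.6 ms duration) are those quoted in the text and in Fig. 1. Each iteration adds a delayed copy of the current waveform to itself, and h1 is then estimated as the normalized autocorrelation at the delay lag.

```python
import numpy as np

def make_irn(delay_s, n_iter, dur_s, fs, gain=1.0, seed=0):
    """Iterated rippled noise by repeated delay-and-add of white noise."""
    rng = np.random.default_rng(seed)
    d = int(round(delay_s * fs))      # delay in samples
    n_out = int(round(dur_s * fs))    # output length in samples
    warmup = d * n_iter               # samples affected by the start-up transient
    x = rng.standard_normal(n_out + warmup)
    for _ in range(n_iter):
        delayed = np.concatenate((np.zeros(d), x[:-d]))  # x delayed by d samples
        x = x + gain * delayed
    return x[warmup:]                 # keep only the steady-state portion

def normalized_acf(x, lag):
    """Normalized autocorrelation of x at a positive integer lag."""
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

fs = 20_000
irn = make_irn(delay_s=0.008, n_iter=16, dur_s=0.4096, fs=fs)
d = int(round(0.008 * fs))
h1 = normalized_acf(irn, d)           # height of the first autocorrelation peak
print(len(irn), round(h1, 3))
```

For an idealised white-noise input with unity gain, the expected value of h1 after n iterations works out to n/(n + 1), which is consistent with the caption's remark that h1 is slightly less than unity; a finite sample will scatter around that value.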
Analyses of the first-order inter-spike interval histograms (ISIHs) revealed that units often showed a maximal response to a specific IRN delay (best period), but also responded well for a limited range of delays (or pitches) around this best period. We refer to this preference in first-order interval statistics as ‘periodicity tuning’. The range of best periods varied from 3.75 to 13 ms (between 77 and 267 Hz) for OC units and from 2.25 to 10.8 ms (between 93 and 444 Hz) for sustained chopper (CS) units. The range of best periods for transient chopper (CT) units was 1.4–8.8 ms (113–714 Hz). However, increasing anatomical, physiological and pharmacological evidence suggests that if multipolar cells are to directly contribute to the encoding of the pitch of complex sounds then those units classified as CSs are the most promising candidates.

2. Methods

A detailed description of the stimulus presentation and response acquisition procedures for the vowel study can be found in Winter and Palmer (1995) and for the IRN study in Wiegrebe and Winter (2001), and so we describe them only briefly here. All experiments were carried out using anaesthetised, pigmented, post-weaned guinea pigs. The procedures used in this paper were approved under the United Kingdom Home Office Act (1986) by the issue of a project and a personal license to the first and second authors.

2.1. Complex stimuli

The vowel sound /a/ was produced using the Klatt software cascade synthesizer (Klatt, 1980). Details of formant frequencies and bandwidths are shown in Table 1. The stimuli were presented in pairs; each pair consisted of a vowel alone (120 ms duration), a silent interval of 240 ms and then the vowel in WN (3 or 10 dB s/n, 120 ms duration). These vowel pairs were presented at a rate of one per 740 ms at approximately 80 dB SPL. To quantify the responses to f0, we first constructed period histograms from the spike times. The synchronization index (SI) was then obtained by dividing the magnitude of the f0 component of the Fourier transform (100 Hz) of these histograms by the magnitude at 0 Hz.

Table 1
Fundamental frequency (Hz) and frequency (Freq.) and bandwidth (BW) in Hz of formants (F1–F3) of the synthetic vowel /a/

Vowel   f0    F1 Freq.   F1 BW   F2 Freq.   F2 BW   F3 Freq.   F3 BW
/a/     100   730        90      1090       110     2440       170

IRN was generated by delaying and adding WN. In the majority of cases we used the ‘‘add same’’ configuration as defined in Yost (1996a,b) with a gain of 1 and 16 iterations. For convenience, we use f0 to designate the reciprocal of the delay of an IRN, although the waveform is not periodic. The f0 of the IRN ranged from 31.25 to 1000 Hz in half-octave steps. Stimuli were generated digitally using the Tucker–Davis System II DSP Board at a sampling rate of 20 kHz. The stimulus duration was 409.6 ms including 10 ms cos² ramps. Each stimulus was presented 25 times at a rate of one per second. Unless otherwise stated, the stimuli were refreshed for every presentation. Despite the stochastic nature of IRN, different stimulus waveforms share the same form of temporal regularity. The stimulus energy was kept constant irrespective of f0. As a control we also collected responses to WN at the same sampling rate and energy level.

2.2. Analyses

Recordings from single units were made using tungsten-in-glass microelectrodes. Wideband noise was used to search for single units. Upon isolation of a single unit, estimates of BF and threshold were obtained using audio–visual criteria. The spontaneous discharge was measured over a 10-s period. Single units were classified by their discharge regularity (Young et al., 1985), and by the shape of their temporal discharge pattern (see Fig. 2) as revealed by their peri-stimulus time histogram (PSTH) in response to suprathreshold BF tone bursts. To identify a unit as an onset unit, we used the classification scheme of Winter and Palmer (1995). PSTHs were generated using the responses to 250 pure tone bursts; the signal duration was 50 ms and the frequency was the unit's BF. The rise–fall time was 1 ms (cos² gate) and the repetition rate was 4 s⁻¹. The starting phase of each tone burst was varied randomly to reduce the influence of phase-locked discharges on the shape of the PSTH for units with low BFs. Spikes were timed with 1 µs resolution with a Tucker–Davis Technologies event timer (ET1). Typically, two PSTHs were collected at sound levels of 20 dB and either 40 or 50 dB suprathreshold. All single tone PSTHs shown in this paper include a 20 ms delay before the onset of the stimulus.

We have analysed the responses of single units in terms of their first-order and all-order interval statistics. The all-order interval statistic is analogous to the autocorrelation of the spike train. In the IRN study, for the majority of cases, we also presented WN to the unit at the same sampling rate and sound level. A similar discharge rate was elicited for both WN and IRN stimuli except where the BF of the unit was in a spectral dip between low-frequency spectral peaks in the IRN spectrum. The strength of the pitch associated with IRN increases monotonically with the number of iterations (see Fig. 6 of this study and Yost, 1996a).

In order to quantify the relationship between the first-order ISI response to WN and the first-order ISI response to IRN (see Fig. 8), we fitted a gamma distribution (Eq. (1)) to the first-order ISI distribution of the WN (e.g. Fig. 7A)

$$g(t) = a\,t^{n-1}\,e^{-2\pi b t} \tag{1}$$

The best-fitting gamma distribution was then subtracted from the first-order ISI distribution in response to the IRN. For each IRN delay the difference between the number of intervals at the IRN delay in the two distributions was termed the interval enhancement.
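The quantities defined in this paragraph can be sketched in Python as follows. The sketch is illustrative only: the function names and all parameter values are hypothetical, and the least-squares fitting step itself is not shown. It evaluates Eq. (1), locates its peak both numerically and from the condition dg/dt = 0, which gives t = (n − 1)/(2πb) for n > 1 and b > 0, and computes an interval enhancement as the IRN interval count at the delay minus the value of the gamma function fitted to the WN response.

```python
import numpy as np

def gamma_fn(t, a, n, b):
    """Eq. (1): g(t) = a * t**(n - 1) * exp(-2*pi*b*t)."""
    return a * t ** (n - 1) * np.exp(-2.0 * np.pi * b * t)

def gamma_peak(n, b):
    """Location of the maximum of Eq. (1), from dg/dt = 0 (n > 1, b > 0)."""
    return (n - 1) / (2.0 * np.pi * b)

def interval_enhancement(irn_count_at_delay, delay, a, n, b):
    """IRN interval count at the delay minus the WN gamma fit at that delay."""
    return irn_count_at_delay - gamma_fn(delay, a, n, b)

# Hypothetical fit parameters for a WN first-order ISIH (time in ms).
a, n, b = 1.0, 4.0, 0.12
t_ms = np.arange(0.05, 20.0, 0.05)

peak_numeric = float(t_ms[np.argmax(gamma_fn(t_ms, a, n, b))])
peak_analytic = gamma_peak(n, b)

# Hypothetical IRN count exceeding the WN fit by 7 intervals at an 8 ms delay.
delay_ms = 8.0
irn_count = gamma_fn(delay_ms, a, n, b) + 7.0
enhancement = interval_enhancement(irn_count, delay_ms, a, n, b)
print(round(peak_numeric, 2), round(peak_analytic, 2))
```

The closed-form peak is convenient when comparing the WN peak with the peak of the interval enhancement, since both are read off fitted gamma functions rather than from noisy histogram bins.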
The interval enhancement distribution was then itself fitted with a gamma function (see Fig. 7B). The interval enhancement measure emphasises the fact that our reference condition (WN) does not produce a pitch sensation and that any changes in the ISI statistics are due to the quasi-periodicity present in the IRN stimuli.

3. Results

Single units were classified by the shape of their PSTH and their discharge regularity (Bourk, 1976; Young et al., 1985; Blackburn and Sachs, 1989; Rhode and Smith, 1986; Winter and Palmer, 1990). For comparison purposes, in Fig. 2 we show the responses of five unit types commonly recorded from the VCN. Four of these are thought to be recorded from multipolar cells while the fifth, PL, is thought to be recorded from bushy cells. The PL unit has a high probability of discharge just following the stimulus onset followed by an exponential decline to a steady-state rate. The ISIH also has an exponential decline and is multipeaked which is indicative of the unit's ability to phase lock well to low-frequency sounds (Bourk, 1976; Rhode and Smith, 1986; Blackburn and Sachs, 1989; Winter and Palmer, 1990). The regularity analysis reveals that the unit is characterised by irregular firing (coefficient of variation; CV > 0.5). In contrast the CS unit is characterised by a sequence of regularly spaced peaks in the PSTH that are unrelated to stimulus frequency. The ISIH has a narrow peak at the chopping period and the regularity analysis confirms the regularity of the sustained discharge (the mean and standard deviation remain constant over the analysis period). The constant regularity, with a CV less than 0.35, identifies this unit as a CS (Young et al., 1985). When the PSTH is characterised by regular peaks but the CV is greater than 0.35 the unit is classified as a CT as illustrated in the bottom row of Fig. 2. The figure also shows the responses of two onset units to a 50 dB suprathreshold tone burst at the unit's BF (OC and OL).
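The CV criterion used above to separate CS from CT units can be sketched in Python. This is a deliberate simplification and not the analysis used in the study: the regularity analysis of Young et al. (1985) tracks the mean and standard deviation of the intervals over time after stimulus onset, whereas the sketch collapses a set of intervals into a single CV. The function names and the toy interval lists are hypothetical, and the unit is assumed to have already shown a chopping PSTH.

```python
import numpy as np

def coefficient_of_variation(isis):
    """CV = standard deviation of the intervals divided by their mean."""
    isis = np.asarray(isis, dtype=float)
    return float(np.std(isis, ddof=1) / np.mean(isis))

def chopper_subtype(cv, threshold=0.35):
    """Label a unit that already shows a chopping PSTH by its regularity:
    CS when the CV is below the threshold, CT otherwise."""
    return "CS" if cv < threshold else "CT"

# Hypothetical inter-spike intervals (ms) from two chopping units.
regular_isis = [2.0, 2.1, 1.9, 2.0, 2.05, 1.95]
irregular_isis = [1.0, 3.5, 0.8, 2.9, 1.2, 4.0]

cv_regular = coefficient_of_variation(regular_isis)
cv_irregular = coefficient_of_variation(irregular_isis)
label_regular = chopper_subtype(cv_regular)
label_irregular = chopper_subtype(cv_irregular)
print(label_regular, label_irregular)
```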
Onset units are characterised by a very high probability of discharge at stimulus onset, often followed by a pause and then a low level of discharge. The OC unit had a reasonably high rate of steady-state discharge which enabled the calculation of the unit's discharge regularity. The CV was 0.32 which placed it in between the regular dischargers and the irregular dischargers (cf. Young et al., 1985; Winter and Palmer, 1995). The narrow peaks in the temporal adaptation pattern and also in the first-order ISIH pattern identify this unit as an OC. The OL unit is similar in many of its response properties to the OC unit but is characterised by only one peak at response onset (e.g. Rhode and Smith, 1986; Winter and Palmer, 1995).

Fig. 2. Examples of the unit types found in the VCN (see text). All temporal discharge patterns were obtained in response to 50 ms suprathreshold BF tone bursts, 20 dB above threshold at the unit's BF. The solid line beneath the top-left panel indicates the temporal position of the tone burst. The second column shows the first-order ISIHs. The third column shows a measure of a unit's discharge regularity, with the solid lines representing the mean interval and dotted lines the standard deviation of the intervals. Note that a low coefficient of variation (CV < 0.5) indicates a unit with a regular discharge pattern. The PSTHs for the OC and OL units were obtained in response to a 50 dB suprathreshold tone burst.

3.1. Response of multipolar cells to vowels in quiet and in noise

Both onset and chopper responses appear to be characteristic of multipolar cells in the anteroventral and posteroventral cochlear nucleus. However, consistent with their very different responses to pure tones, these response types respond quite differently to vowels and vowels in noise. Fig.
3 shows period histograms (over two periods of the f0) of the responses of an OC unit (left column) and a CS unit (right column) to the vowel /a/. The BFs are different which probably contributes to the different details of the responses. The response of the OC unit (Fig. 3A) is well locked to the f0 in quiet and its locking to f0 is little diminished by the addition of background noise, be it 10 dB s/n (Fig. 3C) or 3 dB s/n (Fig. 3E). In contrast, while the response of the CS unit in quiet is well locked to the f0 (Fig. 3B), it is severely degraded even by the addition of the weaker noise. Further increasing the noise level did not further degrade the locking of the CS unit to the f0. Note that the BF of the CS unit shown in this figure is 3.57 kHz which is well away from the nearest formant of /a/ and in the region where auditory-nerve fibres show good modulation to the f0. The OC unit has a BF of 1.31 kHz, which is close to the first formant (F1) of /a/, where auditory-nerve fibre responses would usually be strongly locked to F1 and hence the response to f0 would be relatively weak.

Fig. 3. Responses from an onset (BF = 1.31 kHz) and chopper unit (BF = 3.57 kHz) to the steady-state vowel /a/ in quiet and two levels of background noise. Signal-to-noise ratios are indicated on the right-hand side of the figure. Two consecutive periods of the f0 are shown in the histogram. Even when the s/n ratio is 3 dB, the onset unit still provides a good representation of the f0 in terms of its synchronized discharges.

Fig. 4 shows the responses of populations of onset and chopper units to the vowels in quiet and in noise. No attempt has been made to subdivide the populations of onset and chopper units. As in the auditory nerve (Miller and Sachs, 1984; Palmer, 1990), the response to a vowel in quiet, in terms of locking to f0, tends to occur in the frequency regions between formants. The addition of 10 dB s/n noise reduces the locking to the f0 of virtually all of the units in our chopper population (as it does in the auditory nerve: Miller and Sachs, 1984), but the locking still remains significant in most cases. For the onset population several features are noteworthy. First, the locking of the onset units is generally stronger (higher SI values reaching close to 1.0) than in the chopper population (or in the nerve fibre population; Miller and Sachs, 1984; Palmer, 1990). Second, the spread of locking in the onset population seems to be more extensive than in the auditory nerve (i.e. not so much restricted by the regions of good response between the formants). Third, the background noise at 10 dB s/n, while reducing the locking to f0, still leaves the vast majority of onset units significantly locked to the f0 (Fig. 4C and D). Note that in background noise, the SI to f0 is always below 0.5 for chopper units, whereas for the onset population, the SI of many units remains above 0.5. It is clear that despite the relatively high levels of background noise both types of multipolar cell can still signal the f0 in the timing of their spikes.

Fig. 4. Degree of modulation of period histograms at f0 as a function of unit BF, as determined by the SI, for the steady-state vowel /a/. The bottom row shows the responses from a population of onset units in quiet and when the s/n was 10 dB. The top row is in the same format for a population of chopper units. Note that the SI for chopper units at 10 dB s/n dropped below 0.5 whereas many onset units had SIs above this value at the same s/n ratio.

3.2. Responses of multipolar cells to iterated rippled noise

The responses of an OC unit to IRN as a function of the number of iterations are shown in Fig. 5. The responses are plotted as first- and all-order ISIHs, in the left and right columns, respectively. The top row shows the response to equal energy WN. The delay of the IRN was 11.2 ms and, as the number of iterations was increased, a clear peak emerged in both the first- and all-order ISI at 11.2 ms. We describe this increase as ‘interval enhancement’ as there is an increase in the number of intervals at the delay in comparison with the response to the WN stimulus. Thus, the IRN delay (and hence the pitch) is well represented in the temporal discharges of this unit. It should be noted that this type of response was delay dependent, with the unit showing a preference for some delays over others; in this respect the unit could be said to be tuned to the delay of IRN. We refer to the delay at the peak of this tuning as a unit's best periodicity (see Fig. 7 for a further example of this delay tuning).

Fig. 5. The effect of increasing number of iterations at a delay near the unit's best periodicity. The unit was an OC. The left-hand column shows the first-order ISIH in response to an IRN with a gain of 1, delay of 11.2 ms. The top row shows the response to equal energy WN. The right column shows the all-order ISIH in response to the same stimuli.

It has been shown psychophysically (Yost, 1996b) that the pitch strength of IRN grows monotonically with the number of iterations and that the pitch strength grows as a function of 10^{h1}. In Fig. 6 we show the growth of the height of h1 as well as the growth of 10^{h1}. Also shown in Fig. 6 is the growth in the number of intervals at the delay of IRN (11.2 ms) as a function of the number of iterations for both the first- and all-order analyses.

Fig. 6. A comparison of the responses of the OC unit shown in Fig. 5 with the growth of h1 (■) and 10^{h1}. It is claimed that the pitch strength of IRN follows this latter function (●).
The number of intervals at the delay (11.2 ms) is plotted as a function of number of iterations for both the first-order and the all-order (▲) ISIs. The ordinate represents the magnitude of h1 or 10^{h1} measured from the autocorrelation of the stimulus for the lower two plots while for the upper plots the ordinate represents the number of intervals per presentation.

Fig. 7. Estimation of the peak in the first-order ISIH for WN and the peak in the interval enhancement. (A) An example of a WN first-order ISIH and the fitted gamma function for an OC unit. The BF was 3.1 kHz. The interval enhancement is shown in (B) by the solid line and asterisks. The best-fitting gamma function for the interval enhancement is shown by the line. The peaks, as estimated from the maximum height in the gamma functions, are shown on the figures.

Consistent with the psychophysics, we also observe monotonic growth in the number of intervals at the IRN delay.

3.3. Comparison of the responses to broadband noise and IRN

To examine the relationship between the response to WN and the response to IRN we have chosen to fit the first-order ISIH in response to WN with a gamma function and then compare this fit with a gamma function fitted to the interval enhancement plot (see Section 2). An example of the results of the fitting procedure is shown in Fig. 7 for a unit classified as OC. The positions of the peak fitted to the noise ISIH distribution and the peak fitted to the IRN interval enhancement plot are, in this example, similar. Fig. 8 shows the relationship between the peak of the gamma fit to the IRN and the peak of the gamma fit to the WN for a population of units believed to be multipolar cells. Units classified as onset exhibit peaks between 3 and 13 ms. Of these 23 onset units, 16 were classified as OC. Units classified as CS show preferences for delays between 3 and 10.8 ms. Units classified as CT show, on average, interval enhancement at shorter delays than CS units. It should be noted (Table 2) that there was a significant difference between the peak in response to WN and the peak in response to IRN for CT units. There was a tendency for the points to lie above the line of unity slope indicating that the maximum enhancement to the IRN stimulus is at lower frequencies (longer delays) than the peak in the first-order ISIH. There was no significant difference between the peak in the response to WN and the peak in response to IRN for the CS and OC unit types. The correlation between the WN and IRN peak responses for the three unit types is shown in Table 2. There is a weak correlation for CT units and there are strong correlations for the CS and OC unit types.

Fig. 8. Relationship between the peak of first-order ISIH to WN and the peak of the interval enhancement plot for single units. The peak of the first-order ISIH in response to WN is plotted on the abscissa. The peak was estimated from a gamma fit to the ISIH. The peak of the gamma fit to the interval enhancement plot in response to IRN is plotted on the ordinate. The dotted line indicates equality for the two values. The correlation between the two measures is greatest for OC and CS units (see Table 2).

Table 2
Mean and standard deviation of the peak in the first-order ISIH to WN and the peak in the interval enhancement plot

Unit type   White noise (ms)   Interval enhancement (ms)   Correlation
CT          2.42 ± 1.08        4.88 ± 2.1 *                0.44
CS          3.53 ± 2.02        6.55 ± 2.36                 0.72
Onset       4.45 ± 1.83        7.05 ± 2.61                 0.81

* Significant difference between peak in interval enhancement and peak in WN (one-tailed Student's t-test) at p < 0.05.

4. Discussion

In this study we have shown that units in the VCN, presumed to be multipolar cells, respond in several ways that are consistent with a role in the representation of the pitch of complex sounds.
Onset-responding multipolar cells appear to signal the f0 of vowel sounds better than choppers, and can do so even in moderate levels of background noise. Such levels of background noise do not affect the identification or the timbre of the vowel sounds. However, it should also be noted that the best responses to the f0 were obtained from units with BFs above the third formant frequency, a frequency where the harmonics would be unresolved. It is unlikely that this frequency region is crucial in conveying the pitch of steady-state vowels (Houtsma and Smurzynski, 1990). Both types of multipolar cell can enhance the representation of the delay of iterated rippled noise in their ISIHs; the enhancement was dependent on the magnitude of the delay. Transient chopper units seemed to enhance the short delays more; sustained and onset choppers seemed to enhance the longer delays more. The implications of these findings for the neural encoding of the pitch of complex sounds are discussed below.

4.1. Onset units and pitch

Onset units have been subdivided into at least three physiological response types. However, we have only been able to record from two of these in this study. The discussion that follows relates primarily to OC units but might also apply to OL units. The third type, onset-I (OI), has also been implicated in the coding of pitch (see below for a brief discussion). OC units encode the pitch of voiced speech sounds with remarkable fidelity and may respond to the ambiguous pitches of inharmonic complexes (Kim et al., 1986; Palmer and Winter, 1992, 1993; Rhode, 1994, 1995). They also respond to amplitude modulated noise in a manner that is similar to their response to 200% amplitude modulated tones (Rhode, 1994). As we have shown here, their responses to f0 survive moderate levels of background noise.
In order to explain the remarkable precision of spike timing in these units several authors have speculated that a form of across-frequency coincidence detection must be employed by these units (Rhode and Smith, 1986; Kim et al., 1986; Palmer and Winter, 1996). The wide bandwidth of OC units is certainly beneficial for the encoding of envelope periodicity (e.g. steady-state vowels; Kim et al., 1986).

The responses of two onset units to IRN were shown by Shofner (1999). These units responded with discharges locked to the f0 of sine or cosine phase harmonic (CPH) complexes but responded very poorly to random phase harmonic (RPH) complexes and IRN with identical f0s and sound level. We have not observed such an extreme difference in response to these three stimuli when recording from OC units. Nevertheless, we have observed that the response to the f0 of IRN is much weaker than that observed to RPH or CPH complexes (e.g. Figs. 4 and 5 in Winter et al., 2001). The complete absence of a response to IRN or RPH stimuli, as observed by Shofner (1999), may be due to either stimulus presentation level or type of onset unit. Interestingly, Evans and Zhao (1998) have shown that units classified as OI respond poorly to RPH complexes.

OC units have been recorded from large multipolar cells within the VCN (Smith and Rhode, 1989). These are probably the same type of cell that Doucet and Ryugo (1997) have observed to contact cells in the fusiform layer of the dorsal cochlear nucleus. The connection of these cells in the cochlear nucleus is likely to be inhibitory, as their terminals stain positively for glycine and are characterised by pleomorphic vesicles in their synaptic endings (Smith and Rhode, 1989). In further studies, Doucet et al. (1999) have shown that the same large multipolar cells project to the contralateral cochlear nucleus.
Therefore, despite their ability to precisely encode the f0 of many complex sounds, the current anatomical information about OC units would seem to argue against them playing a role in the encoding of the pitch of complex sounds and suggests that other unit types (e.g. PL and choppers) may play a more pivotal role.

4.2. Chopper units and pitch

Chopper units can enhance the representation of sinusoidal amplitude modulated (SAM) tones in comparison to their auditory-nerve fibre input. At moderate to high sound levels, some chopper units show a band-pass modulation transfer function (MTF; Frisina et al., 1990; Kim et al., 1990; Rhode and Greenberg, 1994). The units most likely to show a band-pass MTF were identified as CS units. This observation was used by Hewitt and Meddis (1994) to model the transformation from temporal MTFs to rate-based MTFs in single units in the IC. It has since been proposed (Wiegrebe and Winter, 2001) that the regular discharge patterns seen in the responses of CS units may serve as a stage of temporal processing that converts the all-order ISI representation of the pitch at the level of the auditory nerve into a first-order ISI code and hence may also be useful for encoding the pitch of complex sounds. This hypothesis requires an array of chopper units with a range of best periodicities in iso-frequency laminae of the cochlear nucleus. The range of intrinsic oscillation frequencies seen in the unanaesthetised cat is very similar to the range seen in the anaesthetised guinea pig. However, while the range of best periodicities may, in part, be attributable to differences in stimulus presentation level or output discharge rate of a unit, in neither animal is this range sufficient to encompass the range of pitch perception in humans (see Krumbholz et al., 2000). The lack of a range of best periodicities to encompass the entire range of pitch perception appears to argue against chopper units playing a role in pitch perception.
However, if the bandwidths of the periodicity tuning were sufficiently broad then perhaps the f0 could be represented by far fewer periodicity filters, perhaps analogous to the situation in colour vision.

A further problem for the above hypothesis is the response of chopper units to IRN. Shofner (1999) has presented evidence that multipolar cells, specifically chopper units, were not well suited to encode the pitch of IRN. This conclusion was based on the responses to IRN stimuli with negative gain. These stimuli elicit ambiguous pitches that were only signalled in the temporal discharge patterns of PL units (and hence bushy cells). The responses of non-PL units (mainly choppers) largely reflected the stimulus envelope and not the fine timing in the waveform structure. This result appears to argue against chopper units playing a role in the encoding of the pitch of complex sounds in their first-order ISIHs. However, the examples given in Shofner (1999) come from units with relatively high BFs and a CT response pattern (see Fig. 2); it is possible that low BF units, in particular those classified as CS, may show responses in their temporal discharges related to the shift in time intervals associated with the pitch of IRN with negative gain. Until we have more data on this issue, the role of CS units in encoding the pitch of complex sounds remains unresolved.

Interestingly, Shofner (1999) also found that the majority of units with features related to the delay of the RN were characterised by low BFs, where phase locking would be strong. While we also observe the strongest temporal responses in units with low BFs (unpublished observations) we also observed that some units, with BFs well removed from the phase-locking region, and at relatively high stimulus presentation levels, could encode the delay of IRN in terms of their first-order interval statistics (Winter et al., 2001).
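The difference between positive and negative gain can be illustrated with a minimal sketch of the delay-and-add operation that underlies IRN. This is not the stimulus-generation code used in any of the studies cited: the function names, the 10-kHz sampling rate, the 4-ms delay and the use of a single iteration are assumptions chosen for illustration. For white noise passed through one delay-and-add stage with gain g, the normalised autocorrelation at the delay is g/(1 + g^2), so a gain of +1 should give a peak of about +0.5 at the delay and a gain of -1 a trough of about -0.5.

```python
import random


def delay_and_add(signal, delay_samples, gain, iterations=1):
    """Cascade `iterations` stages of y[n] = x[n] + gain * x[n - delay_samples]."""
    out = list(signal)
    for _ in range(iterations):
        out = [
            out[n] + (gain * out[n - delay_samples] if n >= delay_samples else 0.0)
            for n in range(len(out))
        ]
    return out


def normalised_autocorrelation(signal, lag):
    """Autocorrelation at `lag`, divided by the value at zero lag."""
    mean = sum(signal) / len(signal)
    centred = [s - mean for s in signal]
    numerator = sum(centred[n] * centred[n + lag] for n in range(len(centred) - lag))
    denominator = sum(c * c for c in centred)
    return numerator / denominator


random.seed(0)                       # fixed seed so the sketch is repeatable
sample_rate_hz = 10_000              # assumed sampling rate
delay_ms = 4                         # assumed IRN delay
delay_samples = delay_ms * sample_rate_hz // 1000
noise = [random.gauss(0.0, 1.0) for _ in range(20_000)]

irn_positive = delay_and_add(noise, delay_samples, gain=1.0)
irn_negative = delay_and_add(noise, delay_samples, gain=-1.0)

r_positive = normalised_autocorrelation(irn_positive, delay_samples)
r_negative = normalised_autocorrelation(irn_negative, delay_samples)
print(f"gain +1: r(delay) = {r_positive:.2f}")
print(f"gain -1: r(delay) = {r_negative:.2f}")
```

The sign reversal at the delay is consistent with negative-gain stimuli eliciting ambiguous pitches rather than a single pitch at the reciprocal of the delay.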
This result is in agreement with the vowel study presented here (see Fig. 3), where both onset and chopper units showed a strong response to the f0, despite their BFs being above the phase-locking cut-off in the guinea pig (3.5 kHz; Palmer and Russell, 1986). Stimulus presentation level may well explain the different frequency ranges over which stimulus related features were observed between the two studies. It should, however, be made clear that the significance of high BF units responding well to the f0 of complex sounds is, at best, uncertain.

If the regular discharge pattern observed in chopper units is responsible for the interval enhancement seen in this paper, and the best modulation frequency observed in other studies, then one might find a close correlation between intrinsic chopping frequency and interval enhancement. In the study of Frisina et al. (1990) there was little correlation between intrinsic chopping frequency and best periodicity (their best modulation frequency). However, in the study of Kim et al. (1990) a much closer relationship was found between a unit's intrinsic oscillation and its best modulation frequency. Kim et al. (1990) attributed this to their method of estimating a unit's intrinsic oscillation and/or a difference in unit type. The best correlation in the Kim study was found in CS units. If the response to WN is also a good measure of a unit's intrinsic oscillation then the results in the present study (see Fig. 8 and Table 2) support the findings of Kim et al. (1990).

In a small number of units, Wiegrebe and Winter (2001) have shown that low BF units identified as CS may respond in a relatively level-independent manner. Whether this is a common feature in CS units is not known. The importance of level independence at such a low level of auditory processing is, however, unclear.
For instance, the majority of units in the inferior colliculus (the ultimate termination site for outputs from the cochlear nucleus and superior olivary complex) respond in a highly non-monotonic fashion, suggesting that any level-independent response in the cochlear nucleus is recoded, in some as yet unspecified manner.

5. Summary and conclusions

Our results show that units classified as onset show a more robust response to f0 in the presence of background noise than chopper units. Both onset and chopper units are able to encode the delay of IRN in their first-order ISIHs but no single unit type is able to represent the entire range of delays over which a pitch is perceived. Although OC units often show a strong response to the f0 of complex sounds, increasing circumstantial evidence would appear to make their role in the encoding of pitch uncertain. For instance, anatomical and pharmacological studies suggest that they are probably wideband inhibitory interneurons within the cochlear nucleus. They may also project to the contralateral cochlear nucleus. Based on their responses to sinusoidal amplitude modulation (Frisina et al., 1990; Kim et al., 1990) and to IRN (Shofner, 1999; Winter et al., 2001) it would appear that CT units are ill-suited to encode the f0 of complex sounds. In contrast, the responses of CS units to SAM and IRN suggest that they may have a role to play. However, it will be important to demonstrate that CS units can also show a strong response to f0 at relatively low sound levels and can represent the pitch of IRN with negative gain. Until we have answers to such questions, the role of these multipolar cells in the encoding of pitch will remain unresolved.

Acknowledgements

Supported by the United Kingdom MRC (G9900369). We thank Brian Moore and Keith Kluender for helpful reviews of the manuscript.

References

Blackburn, C.C., Sachs, M.B., 1989.
Classification of unit types in the anteroventral cochlear nucleus: PST histograms and regularity analysis. J. Neurophysiol. 62, 1303–1329.
Bourk, T.R., 1976. Electrical responses of neural units in the anteroventral cochlear nucleus of the cat. Doctoral dissertation, MIT, Cambridge, MA.
Cariani, P.A., Delgutte, B., 1996a. Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. J. Neurophysiol. 76, 1698–1716.
Cariani, P.A., Delgutte, B., 1996b. Neural correlates of the pitch of complex tones. II. Pitch shift, pitch ambiguity, phase-invariance, pitch circularity, rate pitch, and the dominance region of pitch. J. Neurophysiol. 76, 1717–1734.
Doucet, J.R., Ryugo, D.K., 1997. Projections from the ventral cochlear nucleus to the dorsal cochlear nucleus in rats. J. Comp. Neurol. 385, 245–264.
Doucet, J.R., Ross, A.T., Gillespie, M.B., Ryugo, D.K., 1999. Glycine immunoreactivity of multipolar neurons in the ventral cochlear nucleus which project to the dorsal cochlear nucleus. J. Comp. Neurol. 408, 515–531.
Evans, E.F., Zhao, W., 1998. Periodicity coding of the fundamental frequency of harmonic complexes: physiological and pharmacological study of onset units in the ventral cochlear nucleus. In: Palmer, A.R., Rees, A., Summerfield, A.Q., Meddis, R. (Eds.), Psychophysical and Physiological Advances in Hearing. Whurr Publishers, London, pp. 186–194.
Frisina, R.D., Smith, R.L., Chamberlain, S.C., 1990. Encoding of amplitude modulation in the gerbil cochlear nucleus: I. A hierarchy of enhancement. Hear. Res. 44, 99–122.
Frisina, R.D., Karich, K.J., Tracy, T.C., Sullivan, D.M., Walton, J.P., Colombo, J., 1996. Preservation of amplitude modulation coding in the presence of background noise by Chinchilla auditory-nerve fibers. J. Acoust. Soc. Am. 99, 475–490.
Griffiths, T.D., Buchel, C., Frackowiak, R.S.J., Patterson, R.D., 1998.
Analysis of temporal structure in sound by the human brain. Nature Neurosci. 1, 422–427.
Hewitt, M.J., Meddis, R., 1994. A computer model of amplitude-modulation sensitivity of single units in the inferior colliculus. J. Acoust. Soc. Am. 95, 2145–2159.
Houtsma, A.J.M., Smurzynski, J., 1990. Pitch identification and discrimination for complex tones with many harmonics. J. Acoust. Soc. Am. 87, 304–310.
Jiang, D., Palmer, A.R., Winter, I.M., 1996. The frequency extent of two tone facilitation in onset units in the ventral cochlear nucleus. J. Neurophysiol. 75, 380–396.
Kim, D.O., Leonard, G., 1988. Pitch-period following response of cat cochlear nucleus neurones to speech sounds. In: Duifhuis, H., Horst, J.W., Wit, H.P. (Eds.), Basic Issues in Hearing. Academic, London, pp. 252–260.
Kim, D.O., Rhode, W.S., Greenberg, S.R., 1986. Responses of cochlear nucleus neurones to speech signals: neural encoding of pitch, intensity and other parameters. In: Moore, B.C.J., Patterson, R.D. (Eds.), Auditory Frequency Selectivity: A NATO Advanced Research Workshop. Plenum Press, New York, pp. 281–288.
Kim, D.O., Sirianni, J.G., Chang, S.O., 1990. Responses of DCN-PVCN neurons and auditory-nerve fibres in unanaesthetised decerebrate cats to AM and pure tones: analysis with autocorrelation/power spectrum. Hear. Res. 45, 95–113.
Klatt, D.H., 1980. Software for a cascade/parallel formant synthesizer. J. Acoust. Soc. Am. 67, 971–995.
Krumbholz, K., Patterson, R.D., Pressnitzer, D., 2000. The lower limit of pitch as determined by rate discrimination. J. Acoust. Soc. Am. 108, 1170–1180.
Krumbholz, K., Patterson, R.D., Nobbe, A., 2001. Asymmetry of masking between noise and iterated rippled noise: Evidence for time-interval processing in the auditory system. J. Acoust. Soc. Am. 110, 2096–2107.
Miller, M.I., Sachs, M.B., 1984. Representation of voice pitch in discharge patterns of auditory-nerve fibres. Hear. Res. 14, 257–279.
Palmer, A.R., 1990.
The representation of the spectra and fundamental frequencies of steady-state single- and double-vowel sounds in the temporal discharge patterns of guinea pig cochlear nerve fibres. J. Acoust. Soc. Am. 88, 1412–1426.
Palmer, A.R., Russell, I.J., 1986. Phase-locking in the cochlear nerve of the guinea pig and its relation to the receptor potential of inner hair cells. Hear. Res. 24, 1–15.
Palmer, A.R., Winter, I.M., 1992. Cochlear nerve and cochlear nucleus response to the fundamental frequency of voiced speech sounds and harmonic complex tones. In: Cazals, Y., Demany, L., Horner, K. (Eds.), Auditory Physiology and Perception. Pergamon, Oxford, pp. 231–239.
Palmer, A.R., Winter, I.M., 1993. Coding of the fundamental frequency of voiced speech sounds and harmonic complex tones in the ventral cochlear nucleus. In: Merchan, M.A., Juiz, J., Godfrey, D.A., Mugnaini, E. (Eds.), Mammalian Cochlear Nuclei: Organization and Function. Plenum, New York, pp. 373–384.
Palmer, A.R., Winter, I.M., 1996. The temporal window of two-tone facilitation in onset units of the ventral cochlear nucleus. Audiol. Neurootol. 1, 12–30.
Palmer, A.R., Jiang, D., Marshall, D., 1996. Responses of ventral cochlear nucleus onset and chopper units as a function of signal bandwidth. J. Neurophysiol. 75, 780–794.
Patterson, R.D., Handel, S., Yost, W.A., Datta, A.J., 1996. The relative strength of the tone and noise components in iterated rippled noise. J. Acoust. Soc. Am. 100, 3286–3294.
Rhode, W.S., 1994. Temporal encoding of 200% amplitude modulated signals in the ventral cochlear nucleus of the cat. Hear. Res. 77, 43–68.
Rhode, W.S., 1995. Interspike intervals as a correlate of periodicity in cat cochlear nucleus. J. Acoust. Soc. Am. 97, 2414–2429.
Rhode, W.S., Greenberg, S.R., 1994. Encoding of amplitude modulation in the cochlear nucleus of the cat. J. Neurophysiol. 71, 1797–1825.
Rhode, W.S., Smith, P.H., 1986. Encoding timing and intensity in the ventral cochlear nucleus of the cat. J.
Neurophysiol. 56, 261–286.
Rhode, W.S., Oertel, D., Smith, P.H., 1983. Physiological response properties of cells labelled intracellularly with horseradish peroxidase in cat ventral cochlear nucleus. J. Comp. Neurol. 213, 448–463.
Shofner, W.P., 1991. Temporal representation of rippled noise in the anteroventral cochlear nucleus of the chinchilla. J. Acoust. Soc. Am. 90, 2450–2466.
Shofner, W.P., 1999. Responses of cochlear nucleus units in the chinchilla to iterated rippled noises: quantitative analysis of neural autocorrelograms of primarylike and chopper units. J. Neurophysiol. 81, 2662–2674.
Smith, P.H., Rhode, W.S., 1989. Structural and functional properties distinguish two types of multipolar cells in the ventral cochlear nucleus. J. Comp. Neurol. 282, 595–616.
Wiegrebe, L., Patterson, R.D., 1999. The role of modulation in the pitch of high-pass filtered iterated rippled noise. Hear. Res. 132, 94–108.
Wiegrebe, L., Winter, I.M., 2001. Temporal representation of iterated rippled noise as a function of delay and sound level in the ventral cochlear nucleus. J. Neurophysiol. 85, 1206–1219.
Winter, I.M., Palmer, A.R., 1990. Responses of single units in the anteroventral cochlear nucleus of the guinea pig. Hear. Res. 44, 161–178.
Winter, I.M., Palmer, A.R., 1995. Level dependence of cochlear nucleus onset unit responses and facilitation by second tones or broadband noise. J. Neurophysiol. 73, 141–159.
Winter, I.M., Wiegrebe, L., Patterson, R.D., 2001. The temporal representation of the delay of iterated rippled noise in the ventral cochlear nucleus of the guinea pig. J. Physiol. 537 (2), 553–566.
Yost, W.A., 1996a. The pitch of iterated rippled noise. J. Acoust. Soc. Am. 100 (1), 511–518.
Yost, W.A., 1996b. The pitch strength of iterated rippled noise. J. Acoust. Soc. Am. 100 (5), 3329–3335.
Yost, W.A., Patterson, R.D., Sheft, S., 1996.
A time domain description for the pitch strength of iterated rippled noise. J. Acoust. Soc. Am. 99, 1066–1078.
Young, E.D., Sachs, M.B., 1979. Representation of steady-state vowels in the temporal aspects of discharge patterns of populations of auditory nerve fibres. J. Acoust. Soc. Am. 66, 1381–1403.
Young, E.D., Robert, J.-M., Shofner, W.P., 1985. Regularity and latency of units in ventral cochlear nucleus: implications for unit classification and generation of response properties. J. Neurophysiol. 60, 1–29.