Subglottal pressure and NAQ variation in voice production of classically trained baritone singers

Eva  Björkner; J. Sundberg

2005

Logopedics Phoniatrics Vocology 2006, 1–9, PrEview article

ORIGINAL ARTICLE

Subglottal pressure and normalized amplitude quotient variation in classically trained baritone singers

EVA BJÖRKNER 1,2, JOHAN SUNDBERG 2 & PAAVO ALKU 1

1 Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Finland, and 2 Department of Speech Music Hearing, Kungliga Tekniska Högskolan, Stockholm, Sweden

Abstract

The subglottal pressure (Ps) and voice source characteristics of five professional baritone singers were analyzed. The normalized amplitude quotient (NAQ), defined as the ratio between the peak-to-peak pulse amplitude and the negative peak of the differentiated flow glottogram, normalized with respect to the period time, was used as an estimate of glottal adduction. The relationship between Ps and NAQ has been investigated in female subjects in two earlier studies. One of these revealed NAQ differences between both singing styles and phonation modes, and the other, based on register differences in female musical theatre singers, showed that NAQ differed between registers for the same Ps value. These studies thus suggest that NAQ and its variation with Ps represent a useful parameter in the analysis of voice source characteristics. The present study aims to increase our knowledge of the NAQ parameter by finding out how it varies with pitch and Ps in professional classically trained baritone singers, singing at high and low pitch (278 Hz and 139 Hz, respectively). Ten equally spaced Ps values were selected from three takes of the syllable [pæ:], initiated at maximum vocal loudness and repeated with continuously decreasing vocal loudness. The vowel sounds following the selected Ps peaks were inverse filtered. Data on peak-to-peak pulse amplitude, maximum flow declination rate and NAQ are presented.
Key words: Baritone singers, flow glottogram, glottal adduction, inverse filtering, normalized amplitude quotient NAQ, singing voice, subglottal pressure, voice source

Correspondence: Eva Björkner, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, P.O. Box 3000, FI-02015 HUT, Finland. Fax: +358 9 460 224. E-mail: evab@speech.kth.se

ISSN 1401-5439 print/ISSN 1651-2022 online © 2006 Taylor & Francis. DOI: 10.1080/14015430600576055

Introduction

Compared to spontaneous speech, singing is a much more highly controlled phonation task. In singing, phonation mode cannot be allowed to change automatically with subglottal pressure (Ps) or fundamental frequency (F0), since such changes may produce inappropriate expressive effects. Therefore, the use of professional singers ought to be advantageous in an investigation of the behavior of voice production over a wide range of Ps values.

Vocal sound is produced when the vibrating vocal folds chop the air stream from the trachea into a train of pressure pulses, called the voice source. This sound is filtered by the vocal tract resonator, the frequency response of which is controlled by the vocal tract shape. The transglottal pressure, which in vowel production equals Ps, is the essential driving force and also the primary variable for control of vocal intensity (1,2). Countless varieties of sounds and voice qualities can be obtained depending on the muscular, aerodynamic, and acoustical conditions in the glottis and in the vocal tract. Hence, the radiated sound is a complex product of the relationship between Ps, the voice source and the formant frequencies.

To gain information about the voice source, the acoustic filtering effect of the vocal tract resonances must be eliminated. Inverse filtering (3) is a widely used technique for this purpose. By cancelling the contributions of the vocal tract in the voiced signal, recorded either with a flow mask or a free-field microphone, an estimate of the pulsating transglottal airflow, i.e. the flow glottogram waveform, is obtained. A flow glottogram or volume velocity waveform reflects the glottal opening and closure in terms of time and amplitude. Parameters typically used in expressing voice source characteristics are, for example, the peak-to-peak pulse amplitude (Up-t-p) and the negative peak of the differentiated flow glottogram, i.e. the maximum flow declination rate (MFDR). Up-t-p has been found to correlate strongly with the amplitude of the fundamental (4,5), and MFDR, determined by the glottal closing phase, has been shown to be closely related to several voice characteristics, such as vocal intensity (6), sound pressure level (SPL) (4), and Ps (7). These two parameters alone seem to give informative glottal information and have become the focus of several studies.

Parameterization of time-based features of the glottal flow from amplitude domain values was first proposed by Fant et al. (8). In parallel with Fant’s studies, Alku et al. (9,10) introduced the amplitude quotient (AQ), defined as the ratio between the peak-to-peak flow amplitude of the glottal waveform and the negative peak amplitude of the differentiated flow (Up-t-p/MFDR). Alku et al. found that the AQ parameter systematically reflected changes in phonation type. For their four male and four female subjects the AQ value decreased monotonically when phonation type was changed from breathy to pressed. They also found that AQ differed between sexes, possibly due to laryngeal and fundamental frequency (F0) differences. Hence, Alku et al. (11) introduced the normalized version of AQ called NAQ, which normalizes the AQ values with respect to the duration of the fundamental period (AQ/T0). The extraction of the amplitude-based AQ and NAQ does not involve the problematic time instant of the glottal opening, and NAQ was found to be more robust than the time-based closing quotient (ClQ).
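The definitions of AQ and NAQ given above can be illustrated with a minimal sketch. It assumes a sampled flow glottogram covering whole periods and approximates the differentiated flow with NumPy finite differences. The function name `amplitude_quotients` and the test signal are illustrative assumptions, not the authors' analysis software.

```python
import numpy as np

def amplitude_quotients(flow, fs, f0):
    """Return (AQ in seconds, NAQ dimensionless) for a sampled flow glottogram.

    flow: samples spanning one or more whole periods (hypothetical input)
    fs:   sampling rate in Hz
    f0:   fundamental frequency in Hz, so that T0 = 1 / f0
    """
    flow = np.asarray(flow, dtype=float)
    u_ptp = flow.max() - flow.min()      # peak-to-peak pulse amplitude (Up-t-p)
    dflow = np.gradient(flow) * fs       # differentiated flow, per second
    mfdr = -dflow.min()                  # magnitude of the negative peak (MFDR)
    aq = u_ptp / mfdr                    # AQ = Up-t-p / MFDR
    naq = aq * f0                        # NAQ = AQ / T0
    return aq, naq

# Sanity check on a pure sine wave (not a realistic glottal pulse)
fs, f0 = 44100, 139.0
t = np.arange(int(4 * fs / f0)) / fs
aq, naq = amplitude_quotients(np.sin(2 * np.pi * f0 * t), fs, f0)
print(round(naq, 3))
```

For the sine wave the result should be close to 1/π, which gives a quick check that the normalization by T0 has been applied correctly.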
High AQ or NAQ values have been shown to indicate a less adducted phonation type, and hence lower values indicate a more adducted phonation type. Since its introduction NAQ has been used in a number of studies. Vilkman et al. (12) used NAQ in a study of vocal dynamic extremes, Gobl et al. (13) and Campbell et al. (14) in studies of voice quality, and Airas and Alku (15) found that NAQ varied depending on the emotional coloring of speech. In all of these studies, the voice material analyzed was speech. Sundberg et al. (16) were the first to apply NAQ to singing voices. Their results showed good correlations between NAQ and perceived degree of phonatory pressedness in a single singer subject, phonating in different singing styles and modes of phonation.

Increasing Ps affects not only the shape of the glottal pulse but also the rate of the glottal pulses, i.e. F0. Quoting Titze (2): ‘Speakers tend to raise their voice in pitch when they raise their voice in loudness, and they do it differently in different portions of their vocal range.’ In other words, increasing Ps in natural speech increases vocal loudness, and mostly also raises F0. In singing, on the other hand, these effects cannot be allowed. Thus, singing requires a wide range of perfectly controlled Ps to accurately match the required wide ranges of F0 and vocal loudness. Moreover, it seems likely that classically trained singers strive to keep voice source characteristics unaffected by F0 and loudness variation such that they avoid pressed phonation at high pitch and/or high loudness.

Summarizing, Ps variation in phonation affects F0 and MFDR. MFDR, in turn, is part of the NAQ parameter, and NAQ varies with glottal adduction/phonation type. It is therefore interesting to find out how NAQ is affected by Ps variation, and whether that relationship is affected by F0. Our method was to analyze the voice source by inverse filtering the pressure signal.
As subjects for this study we selected professional classically trained singers. It seemed reasonable to assume that, unlike untrained voices, such singers do not automatically change phonation mode with pitch and vocal loudness. An additional advantage of choosing baritone voices as subjects was their low F0 range, adding to the reliability of inverse filtering.

Material and methods

Subjects and recording

Five Swedish professional baritone singers with international opera careers, age range 29–65 years, volunteered as subjects. They were asked to sing a diminuendo at a constant pitch while repeating the syllable [pæ:], starting from high lung volume and at maximum degree of vocal loudness (see Figure 1). The sequence was repeated three times at each of three F0 values located at approximately 25%, 50% and 75% of their professional pitch range measured in semitones.

Figure 1. An example of the task. Audio (top) and pressure (bottom) signals for a sung diminuendo at a constant pitch while repeating the syllable [pæ:], starting from maximum degree of vocal loudness.

Oral airflow was captured using a Rothenberg mask (17). Subglottal pressure (Ps) during the [p]-occlusions was captured by means of a thin plastic tube attached to a pressure transducer from Glottal Enterprises (http://www.glottal.com/). The singer held this tube in the corner of his mouth. The flow and the pressure signals were recorded on separate tracks on a multichannel TEAC PCM recorder together with an audio signal, picked up by a B&K condenser microphone, 30 cm in front of the subject’s mouth. Calibration signals of flow, pressure, and SPL were all recorded on the same tape: airflow was obtained from a pressure tank attached to the flow mask via a flow meter, pressure by means of a water manometer, and sound level by recording vowel sounds with SPL values determined from a sound level meter held next to the recording microphone.
All calibration values were announced on the tape. For the analysis, the recorded material was transferred into sound files using the Soundswell signal analysis workstation from Hitech Development AB (http://www.hitech.se/development/). While the flow recordings were analyzed for a different investigation (7), the present study was based on analyses of the audio signal.

Pressure measurements

In a detailed analysis, the influence of Ps on glottal parameters should ideally be examined across several Ps values. Therefore, from the three takes, ten equally spaced Ps values were selected by determining the extremes of the singer’s total Ps range and dividing this range into nine equal steps, giving ten ideal Ps values. The Ps values closest to these ideal values were then identified and the subsequent vowel was selected for analysis. Since the subjects continuously decreased vocal loudness while repeating the syllable [pæ:], Ps decreased somewhat during each vowel. The pressure measured during the preceding [p]-occlusion should therefore marginally overestimate the Ps associated with the subsequent vowel. An example of one singer’s Ps range for one sequence is shown in Figure 1.

Inverse filtering and flow glottogram measurements

To obtain information about voice source characteristics, vocal tract resonances must be eliminated from the recorded signal. Flow glottograms were obtained by inverse filtering the speech pressure signal, using the custom-made program DeCap (Svante Granqvist, Kungliga Tekniska Högskolan, Sweden). Since the microphone pressure signal was used, there was no information about the direct current (DC) component of the glottal flow. From the inverse filtered samples, four adjacent periods in the middle of the sample were averaged, and period time (T0), peak-to-peak pulse amplitude (Up-t-p), and MFDR were measured. Then, the ratio between Up-t-p and MFDR, i.e. the amplitude quotient (AQ) (8), as well as the normalized version NAQ (10), were calculated.

The reproducibility of the flow glottogram measurements was examined.
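The selection of the ten Ps values described under "Pressure measurements" can be sketched as follows. This is an illustration under stated assumptions: the function name `select_pressures` and the list of pressure peaks are hypothetical, and the source does not say how ties or repeated selections of the same vowel were handled.

```python
import numpy as np

def select_pressures(measured_ps, n_targets=10):
    """For each of n equally spaced ideal pressures, find the closest measured Ps.

    Returns the ideal values and the indices of the selected measurements.
    """
    ps = np.asarray(measured_ps, dtype=float)
    # Total Ps range divided into n_targets - 1 equal steps
    targets = np.linspace(ps.max(), ps.min(), n_targets)
    idx = [int(np.argmin(np.abs(ps - t))) for t in targets]
    return targets, idx

# Hypothetical Ps peaks (cm H2O) pooled from three diminuendo takes
peaks = [38.0, 33.5, 30.1, 26.0, 22.4, 19.9, 15.2, 13.1, 9.7, 7.1, 5.0, 3.2, 2.0]
targets, idx = select_pressures(peaks)
for t, i in zip(targets, idx):
    print(f"ideal {t:5.1f}  ->  measured {peaks[i]:5.1f}")
```

The vowel following each selected pressure peak would then be the one passed on to inverse filtering.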
The same person who had carried out the first analysis (co-author EB) re-analyzed 20 randomly selected samples after approximately 5 months. The resulting values were plotted against the corresponding values from the first analysis and a correlation coefficient was calculated for each parameter: the correlation coefficient equalled 0.999, 0.898, 0.990, and 0.996 for MFDR, Up-t-p, NAQ, and AQ, respectively (Figure 2).

Figure 2. Reproducibility of the flow glottogram measurements MFDR, peak-to-peak pulse amplitude, and NAQ. The graph compares data derived on separate occasions from the same flow glottograms. (MFDR = maximum flow declination rate; NAQ = normalized amplitude quotient).

The flow recordings had previously been inverse filtered, as mentioned. Therefore, the effect of using audio or flow signals for inverse filtering on the flow glottogram measurements was also analyzed. Seven samples produced by one of the singers were selected, avoiding, however, the softest phonations. The same sequence of four adjacent periods was then selected from the middle of the vowel. These sequences were inverse filtered using both the flow and the audio signals. Averages of the flow glottogram measurements across the four periods were then calculated. The results showed that NAQ differed by less than 1%, thus confirming the reliability of the NAQ measurement.

Results

The five singers’ Ps data were highly structured. Figure 3a shows the ten mean Ps values across singers for each of the three pitches. The figure shows that fundamental frequency and Ps are strongly correlated: the higher the pitch, the higher the Ps (18,19). Figure 3b shows, along a logarithmic scale, the ten Ps values used by the singers for the low and the high F0. The Ps range clearly differed between the subjects. Even though the Ps ranges differed, a similar relationship between Ps and F0 was found in all of them. Table I lists the slope and intercept values of the linear trend line equations for each singer’s Ps relationship between the two F0 values, which were separated by approximately one octave. The correlation was high, and on average all singers approximately doubled their Ps for the higher F0 as compared to the lower F0, as indicated by the slope values.

Figure 3. a: The singers’ mean subglottal pressures (Ps) for each of the ten selected pressures. Black, grey and white columns represent the fundamental frequencies of 139 Hz, 196 Hz and 278 Hz, respectively. b: Each singer’s Ps values, plotted along a logarithmic scale, for each of the ten selected pressures. Filled and open symbols refer to the fundamental frequencies of 139 Hz and 278 Hz, respectively. (Axes: pressure in cmH2O against pressure number.)

Figure 4 shows MFDR as a function of Ps, both averaged across subjects. MFDR increases with increasing Ps for both F0 values, as expected. For a given Ps value, MFDR is higher at the low F0, and it also increases more rapidly with Ps than at the high F0. Figure 5a shows the MFDR values for the two singers who had the most extreme Ps ranges, and in Figure 5b the corresponding SPL values are plotted. The greater MFDR values for singer 4 (Figure 5a) correspond to his high Ps values. Figure 5b shows that this difference was associated with rather small SPL differences, particularly for the low F0. For the high pitch, clearly higher SPL values were produced by the singer who used higher Ps.

The relationship between NAQ and Ps is illustrated in Figure 6a, where Ps is expressed in terms of the normalized excess pressure Psen (18). This value compensates for the fact that high F0 values require higher Ps values than low F0 values. Thus Psen facilitates comparisons along the F0 continuum. NAQ tends to decrease with increasing Ps.
Although the NAQ values differ between the two F0 values, the relationship is similar: NAQ decreases quickly at low Ps and reaches an asymptote-like value at high Ps. Thus, the relationship can be approximated by a power function. Table II shows the intersubject variation, listing the squared correlation, constant and exponent for each singer and the two F0 values. The mentioned difference in magnitude between the two F0 values suggests an F0 influence on NAQ. It therefore seemed interesting also to analyze how the non-normalized AQ varied with Ps. This variation is illustrated in Figure 6b. It was much smaller than for NAQ, even though for very low Psen values AQ was somewhat greater for the low F0 than for the high F0. As for NAQ, the AQ values tended to reach an asymptote at high Ps. The individual singers’ relationships between Psen and AQ are plotted in Figure 7a–e. No clear differences can be observed between the two F0 values. For the very lowest Psen values, Psen < 1, AQ decreases quickly with increasing Ps, but for higher Psen values AQ tends to remain constant.

Table I. Correlation squared (R2), slope and intercept of the best linear fit of the ten selected subglottal pressure (Ps) values used by the indicated singers at the high F0, plotted as a function of the same singer’s Ps values used at the low F0.

Singer   R2      Slope   Intercept
1        0.995   2.234   -2.03
2        0.985   2.307    0.74
3        0.982   2.112   -5.99
4        0.991   2.567   -3.18
5        0.966   1.982   -6.22

Figure 4. Mean MFDR as a function of mean subglottal pressure (Ps) for the five singers. Curves and equations show the best power function fit of the data sets, and R2 represents the squared correlation. (MFDR = maximum flow declination rate; Ps = subglottal pressure). Filled and open symbols refer to the fundamental frequencies of 139 Hz and 278 Hz, respectively.

Discussion

As mentioned above, Alku et al. (8) used the AQ measure for quantifying phonation types.
Since AQ is defined as the ratio between the flow pulse amplitude and MFDR, it decreases with F0, even if the waveform remains identical. For this reason Alku and collaborators introduced the NAQ, obtained by normalizing AQ with respect to period time. In this way they eliminated the AQ variation that automatically occurs between genders because of the F0 difference. Nevertheless, in the present data the non-normalized AQ parameter showed much less variation with F0 than the normalized NAQ parameter. If it is correct that the NAQ measure faithfully reflects phonation mode, this would indicate that the singers used a more pressed type of phonation at low F0 than at high F0. This, however, seems highly implausible. In untrained voices and in amateur singers high tones are likely to sound more pressed than lower tones. Professional baritone singers, on the other hand, would have learnt to avoid changing phonation mode with F0. This calls into question the close connection between phonation mode and NAQ.

Figure 5. Comparison of singer 1 and singer 4 with regard to MFDR, plotted on log scales, and SPL (left and right graphs) observed at the ten selected subglottal pressure (Ps) values. Open and filled symbols refer to low and high F0 (139 Hz and 278 Hz), respectively. (MFDR = maximum flow declination rate; SPL = sound pressure level; F0 = fundamental frequency).

Gobl and Chasaide (13), analyzing spontaneous speech of a single female Japanese speaker, noted that AQ seemed to reflect, more effectively than NAQ, the vocal tenseness differences that the authors perceived in this voice across a large F0 range. Also, Björkner et al. (20) found, comparing female musical theatre singers’ chest and head registers, a somewhat smaller scatter for the AQ values than for the NAQ values, although the F0 variation was rather small.
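The statement that AQ decreases with F0 even when the waveform is identical, while NAQ does not, can be checked numerically. The sketch below uses a hypothetical sine-squared pulse with an open phase of 60% of the period; it is not derived from the singers' data. The same sampled shape is played back at 139 Hz and 278 Hz.

```python
import numpy as np

def pulse_period(n=1000, open_fraction=0.6):
    """One period of a hypothetical flow pulse, sampled at n points."""
    x = np.arange(n) / n
    return np.where(x < open_fraction,
                    np.sin(np.pi * x / open_fraction) ** 2, 0.0)

def aq_naq(period, f0):
    """AQ (seconds) and NAQ for a one-period waveform lasting 1 / f0."""
    fs = len(period) * f0                  # sampling rate implied by the period length
    mfdr = -np.min(np.diff(period)) * fs   # steepest flow decrease per second
    aq = (period.max() - period.min()) / mfdr
    return aq, aq * f0                     # NAQ = AQ / T0

shape = pulse_period()
aq_low, naq_low = aq_naq(shape, 139.0)
aq_high, naq_high = aq_naq(shape, 278.0)
print(aq_low / aq_high, naq_low, naq_high)
```

With an unchanged pulse shape, AQ halves for an octave rise and NAQ stays the same. Under this idealization, any F0 dependence of NAQ therefore implies that the pulse shape relative to the period has changed.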
The relationship between phonation mode on the one hand, and AQ and NAQ on the other, needs to be analyzed in future investigations, preferably comparing voices, singing techniques, and voice qualities. One possibility would be that AQ more accurately than NAQ reflects phonation mode within a voice phonating at different F0, and that NAQ is more appropriate in comparisons of phonation mode between genders and/or between subjects. It is somewhat surprising that NAQ as opposed to AQ differed so clearly between F0 values. The difference between these measures is merely that AQ is based on absolute time values while NAQ is based on time values expressed as fractions of the period time.

Table II. Correlation squared (R2), constant and exponent for the power function trend lines of the relationship between normalized amplitude quotient (NAQ) and subglottal pressure (Ps) for high and low fundamental frequency (F0) sung by the five baritones.

          Singer   R2      Constant   Exponent
High F0   1        0.918   0.358      -0.508
          2        0.758   0.190      -0.338
          3        0.775   0.154      -0.202
          4        0.738   0.231      -0.307
          5        0.679   0.188      -0.152
          Mean     0.774   0.22       -0.30
Low F0    1        0.939   0.315      -0.656
          2        0.369   0.085      -0.079
          3        0.821   0.221      -0.521
          4        0.851   0.184      -0.492
          5        0.813   0.138      -0.300
          Mean     0.759   0.19       -0.41

Figure 6. NAQ and AQ values (upper and lower graph) for each of the five singers’ ten selected samples as a function of normalized excess pressure Psen. Filled and open symbols refer to low and high F0 (139 Hz and 278 Hz), respectively. (NAQ = normalized amplitude quotient; AQ = amplitude quotient; Psen = normalized excess pressure; F0 = fundamental frequency).

Figure 7. AQ values as a function of Psen for each singer. Filled and open symbols refer to low and high F0 (139 Hz and 278 Hz), respectively. (AQ = amplitude quotient; Psen = normalized excess pressure; F0 = fundamental frequency).
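Power function trend lines of the kind reported in Table II can be obtained by a straight-line fit on log-log axes. The sketch below is a minimal illustration and an assumption about the procedure, since the source does not state how the trend lines were computed. The pressure values are hypothetical, and the noise-free data are generated from the mean high-F0 constant and exponent only to show that the fit recovers them.

```python
import numpy as np

def fit_power(x, y):
    """Least-squares fit of y = c * x**k on log-log axes; returns (c, k, R2)."""
    lx, ly = np.log(x), np.log(y)
    k, ln_c = np.polyfit(lx, ly, 1)          # slope = exponent, intercept = ln(constant)
    r2 = np.corrcoef(lx, ly)[0, 1] ** 2      # squared correlation in the log domain
    return float(np.exp(ln_c)), float(k), float(r2)

# Hypothetical pressures; NAQ generated from NAQ = 0.22 * Ps**(-0.30)
ps = np.linspace(5.0, 40.0, 10)
naq = 0.22 * ps ** -0.30
c, k, r2 = fit_power(ps, naq)
print(f"constant {c:.3f}, exponent {k:.3f}, R2 {r2:.3f}")
```

With measured data the R2 value would fall below 1, as in Table II. A negative exponent expresses that NAQ decreases quickly at low pressures and flattens at high pressures.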
For a sine wave, the maximum value of NAQ is

$$\mathrm{NAQ}_{\mathrm{Sin}} = \frac{2A_0}{2\pi F_0 A_0 F_0^{-1}} = \frac{1}{\pi} \approx 0.318 \qquad (1)$$

where A0 denotes the amplitude of the sine wave, 2A0 is its peak-to-peak amplitude, 2πF0A0 is the peak of its derivative, and F0^{-1} is the period time. This value is close to what was observed for the lowest Ps values. NAQ for a sawtooth wave with an instantaneous closing phase would approach zero, since its MFDR would be infinitely high, but the sawtooth is clearly an unrealistic approximation of a flow glottogram, since the closing of the glottis and the termination of the flow pulse cannot be performed in infinitely short time. For higher Ps values NAQ approached an asymptote value of approximately 0.07 and 0.14 for the lower and higher F0, respectively. This asymptote, which was very similar for our five singers, might have a physiological background related to the maximum speed of tissue motion. Schade (21) observed an upper limit for the horizontal vocal fold motion speed, being 1.6 m/s for the closing speed and 1.8 m/s for the opening speed. The relationship between this maximum tissue speed and the maximum flow declination rate is complex, so the lowest possible NAQ value cannot be predicted. The better performance of AQ as compared to NAQ is likely to be related to the maximum speed of vocal fold motion during the closing phase.

The wide Ps range that produced almost constant NAQ and AQ values is likely to be relevant from the point of view of voice quality. If we assume that NAQ reflects phonation mode, the results indicate that the baritones kept phonation mode constant when they varied vocal loudness but that phonation mode differed depending on F0. If, within gender or within a voice, AQ rather than NAQ reflects phonation mode, the results suggest that the singers kept phonation mode independent of both loudness and pitch. While this seems quite plausible, the relationship between AQ and perceived degree of pressedness remains a question for future investigation.

The singers’ AQ values showed a quite small dependence on Psen, particularly at the high F0. At the low F0, AQ increased quickly for the very lowest Psen values (see Figure 7a–d).
It seems likely that opera singers rarely use these extremely low pressures, since a large operatic stage places certain demands on the lowest useful sound levels. If this is correct, the data suggest that the singers used the same phonation mode throughout their dynamic range.

The singers were asked to sing the [pae]-sequence with continuously decreasing vocal loudness. As a consequence, all loud phonations were produced at high lung volumes. Iwarsson (22) noted that in untrained voices glottal adduction tended to increase with decreasing lung volume. Thomasson (23), however, observed that this effect did not occur in trained singers. Therefore, the systematic change of lung volume with Psen in our experiment should not have affected phonation mode and AQ.

Vilkman et al. (24), studying vocal fold collision mass and register, concluded that the softest possible phonation can be produced only in the falsetto mode. These findings could explain our results for the low F0. The abrupt increase of the AQ values in the very softest phonation might be a result of such a register shift. A change of register from chest to falsetto may very well be associated with a decrease of glottal adduction. Another possibility is that the abrupt change of AQ at low pressures is associated with a sudden disappearance of vocal fold closure.

The singers differed considerably with respect to Ps. For example, singer 4 used pressures almost twice as high as those of singer 1, who sang with the lowest pressures. This difference was observed both in loud and in soft singing. Thus, singer 1 could be considered a ‘low-pressure singer’, and singer 4 a ‘high-pressure singer’. Similar observations have been made in a comparison between counter tenors, tenors, and baritones (25). Thomasson and Sundberg (23) found that professional operatic singers’ breathing behavior was highly consistent within singers, but quite different between singers.
It is possible that the differences in singers’ vocal pressure ranges observed here are associated with different breathing strategies.

Sundberg et al. (26) found in their study of male singing that Ps systematically increased with increasing F0, and that a doubling of F0 was typically associated with a doubling of Ps. They also found that the doubling of Ps raised the SPL by approximately 10 dB. In studies on speech by Fant (1) and Holmberg et al. (27) a Ps doubling was found to increase intensity by approximately 9-13 dB. For the high F0 our singers 1 and 4 both increased SPL by approximately 10 dB, while for the low F0 the SPL rise differed between singers and was somewhat lower.

In acoustical analysis of voice function, systematic variation of, or at least information about, Ps and F0 is important. Such information may be difficult to obtain from a speech database. The systematic variation of Ps, AQ and NAQ observed in our baritone singers is likely to be a result of their vocal skills; during years of training singers learn how to master and control forces relevant to voice production. Thus, in investigations of voice function the use of singer subjects may be quite advantageous.

Conclusion

Our baritone singers showed a linear increase of Ps with F0. AQ and NAQ were found to be largely unaffected by increases of Ps, except at very low Ps values. However, NAQ differences were found between the two F0 values (139 Hz and 278 Hz). The AQ values, by contrast, remained basically unaffected. If it is assumed that classically trained singers keep phonation mode independent of loudness and pitch, AQ is likely to reflect mode of phonation. Nevertheless, more studies are needed to fully understand the information about voice production that is offered by AQ and NAQ values.

Acknowledgements

The authors are indebted to the singers for their kind participation and to Svante Granqvist for his mathematical assistance. This investigation is part of Eva Björkner’s doctoral dissertation work, which is financially supported by the European Community’s Human Potential Programme under contract HPRN-CT-2002-00276 [HOARSE-network].

References

1. Fant G. Preliminaries to analysis of the human voice source. STL-QPSR. KTH. 1982;23(4):1-27. Available online: http://www.speech.kth.se/qpsr/qpsr 1960 1996.html
2. Titze I. On the relation between subglottal pressure and fundamental frequency in phonation. J Acoust Soc Am. 1989;85(2):901-6.
3. Miller RL. Nature of the Vocal Cord Wave. J Acoust Soc Am. 1959;31:667-79.
4. Gauffin J, Sundberg J. Spectral correlates of glottal voice source waveform characteristics. J Speech Hear. 1989;32:556-65.
5. Fant G. The voice source in connected speech. Speech Comm. 1997;22:125-39.
6. Fant G, Liljencrants J, Lin Q. A four-parameter model of glottal flow. STL-QPSR. KTH 1985/4. 1-13. Available online: http://www.speech.kth.se/qpsr/qpsr 1960 1996.html
7. Sundberg J, Andersson M, Hultqvist C. Effects of subglottal pressure variation on professional baritone singers’ voice sources. J Acoust Soc Am. 1999;105(3):1965-71.
8. Fant G, Kruckenberg A, Liljencrants J, Båvegård M. Voice source parameters in continuous speech. Transformation of LF-parameters. Proc. ICSLP-94, Yokohama, Vol. 3, 1451-1454.
9. Alku P, Vilkman E. Amplitude domain quotient for characterization of the glottal volume velocity waveform estimated by inverse filtering. Speech Comm. 1996;18:131-8.
10. Alku P, Vilkman E. A comparison of the glottal voice source quantification parameters in breathy, normal and pressed phonation of female and male speakers. Folia Phoniatr Logop. 1996;48:240-54.
11. Alku P, Bäckström T, Vilkman E. Normalized amplitude quotient for parameterization of the glottal flow. J Acoust Soc Am. 2002;112(2):701-10.
12. Vilkman E, Alku P, Vintturi J. Dynamic extremes of voice in the light of time domain parameters extracted from the amplitude features of glottal flow and its derivative. Folia Phoniatr Logop. 2002;54:144-57.
13. Gobl C, Ní Chasaide A. Amplitude-based source parameters for measuring voice quality. Proc. ISCA tutorial and research workshop VOQUAL’03 on voice quality. Geneva 2003, 151-156. http://www.isca-speech.org/archive/voqual03/voq3 151.html
14. Campbell N, Mokhtari P. Voice Quality: the 4th prosodic dimension. In: Proceedings of the 15th ICPhS Barcelona 2003: 2417-2920. http://feast.atr.jp/nick/pubs/vqpd.pdf
15. Airas M, Alku P. Emotions in short vowel segments: effects of the glottal flow as reflected by the normalized amplitude quotient. Phonetica 2006;63:26-46.
16. Sundberg J, Thalén M, Alku P, Vilkman E. Estimating perceived phonatory pressedness in singing from flow glottograms. J Voice. 2004;18:56-62.
17. Rothenberg M. A new inverse-filtering technique for deriving the glottal air flow waveform during voicing. J Acoust Soc Am. 1973;53:1632-45.
18. Titze IR. Phonation threshold pressure: A missing link in glottal aerodynamics. J Acoust Soc Am. 1992;91(5):2926-35.
19. Cleveland T, Sundberg J. Acoustic analyses of three male voices of different quality. In: Askenfelt A, Felicetti S, Jansson E, Sundberg J, editors. SMAC 83. Proceedings of the Stockholm Internat Music Acoustics Conf. Stockholm: July 28-Aug 1 1983. Roy Sw Acad Music. 1985;46(1);143-56.
20. Björkner E, Sundberg J, Cleveland T, Stone E. Voice Source Differences between registers in female musical theatre singers. J Voice. Available online July 25 2005 In Press.
21. Schade G. Systematische Messung der Geschwindigkeiten der horizontalen Stimmlippenkonturen bei Veränderung des Schalldruckpegels und der stimmlichen Grundfrequenz in der Schließungsphase des phonatorischen Schwingungszyklus. Habilitation. University Hospital Hamburg-Eppendorf; 2005.
22. Iwarsson J, Thomasson M, Sundberg J. Effects of lung volume on the glottal voice source. J Voice. 1998;12(4):424-33.
23. Thomasson M, Sundberg J. Consistency of phonatory breathing pattern in professional operatic singers. J Voice. 1999;13(4):529-41.
24. Vilkman E, Alku P, Laukkanen A-M. Vocal fold collision mass as a differentiator between registers. J Voice. 1995;9:66-73.
25. Sundberg J, Högset C. Voice source differences between falsetto and modal registers in counter tenor, tenor, and baritone singers. Logoped Phoniatr Vocol. 2001;26:26-36.
26. Sundberg J, Titze I, Scherer R. Phonatory control in male singing: A study of the effects of subglottal pressure, fundamental frequency, and mode of phonation on the voice source. J Voice. 1993;7(1):15-29.
27. Holmberg E, Hillman R, Perkell J. Glottal airflow and transglottal air pressure measurements for male and female speakers in soft, normal and loud voice. J Acoust Soc Am. 1988;84:511-29.
