A SPECTROGRAPHIC ANALYSIS OF VOCAL TECHNIQUES IN EXTREME METAL FOR MUSICOLOGICAL ANALYSIS

Eric Smialek (1, 2), Philippe Depalle (2, 3), David Brackett (1)
CIRMMT, McGill University, Montréal, QC, Canada
(1): Musicology Area, (2): Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT), (3): Music Technology Area

ABSTRACT

Extreme metal genres such as death metal and black metal force music analysts to seek alternative methods to Western notation-based analysis, especially when one asks what means of expression their vocalists may draw from in order to seem convincing and powerful to fans. Using spectrograms generated by AudioSculpt, a powerful sound analysis, processing, and re-synthesis program, this paper demonstrates a mixed application of spectrograms and conventional music analysis to vocals in two separate contexts: an a cappella recording in a soundproof laboratory and a commercial recording with a full band. The results support an argument for the utility of spectrograms in revealing articulations and expressive nuances within extreme metal vocals that have thus far passed unnoticed in popular music scholarship.

1. INTRODUCTION

Popular music scholars have long insisted that details of musical sound such as rhythmic and melodic inflections or timbral characteristics are central in importance to popular musicians and their audiences [1, 2, 3, 4]. Accordingly, leading popular music scholars such as Richard Middleton [1] and Philip Tagg [2] have criticized analyses of popular music that overlook these features, often due to a reliance on Western music notation. One alternative is to represent sound visually through spectrograms, and it has now been nearly thirty years since Robert Cogan made his strong case for their utility in modelling musical sound in a way that can account for melodic, rhythmic, or timbral nuances [5]. Since then, despite the importance popular music scholars have placed on these features, they have rarely used spectrograms to study them. In the few instances where spectrograms have appeared, such as [6], reviewers have argued that they are superfluous or have voiced a general distrust of analytical technology such as spectrum photography or digital signal processing of spectral imagery [7, 8, 9]. Such a distrust of musical spectrograms has not, of course, extended to fields such as electroacoustic composition and analysis, where there exists an epistemological tradition that has long supported the use of music technology and where the limitations of Western notation are obvious [10]. With this in mind, the study of extreme metal presents something of a disciplinary middle ground in its foundations in popular music studies and its incompatibility with Western notation. Because extreme metal vocals place greater importance on the timbral variations of vowels than on pitch and harmony, methods of analysis that rely on conventional notation do not reveal much information about their expressive screams. Extreme metal vocals thus provide an opportunity to analyze the role and possibilities of spectrographic technology in revealing information about musical expression in a genre for which little to no analytical methodology has been established. Using real-time spectrographic displays, this paper will demonstrate how spectrograms can be useful research tools for scholars who require or seek alternatives to notation-based analysis.

This research was greatly facilitated through funds from the Social Sciences and Humanities Research Council (SSHRC) and a CIRMMT Student Award.

2. THE EXTREME METAL VOICE
2.1. Basic Aspects of Vocal Production

To produce the vocal sounds characteristic of death metal and black metal (as well as related sub-genres), extreme metal vocalists pass air through the ventricular folds (or "false vocal cords") located a few millimetres above the vocal folds (see [12] for anatomical details). This allows extreme metal vocalists to achieve the large spectral spread of energy visible in spectrograms.2 Extreme metal screams can be performed by either inhaling or exhaling, resulting in two very distinct styles of screaming. The different directions of air flow can be thought of as akin to the linguistic distinction made between voiced and unvoiced methods of articulating consonants: when performing exhaled vocals, one's larynx vibrates, indicating that the vocal cords are vibrating rather forcefully, whereas this vibration does not occur with inhaled vocals. This basic difference has a profound effect on the overall sound quality produced, the ease with which different phonetic articulations can be made, the ability of a vocalist to sustain a long scream, and the degree of strain put on the voice.

Figure 1 demonstrates some of the acoustical differences between inhaled and exhaled vocals. Here, a volunteer extreme metal vocalist was asked to freely demonstrate these two types of voice. His inhaled vocals are noticeably more agile, with spectral energy concentrated around a single focal point in the 1500–2000 Hz range (here and throughout this paper, the horizontal grid is set to increments of 500 Hz).

2 A similar process is used in Mongolian throat singing, where singing voice formants are used to convey melodic information [13].

Figure 1. Some basic distinctions between extreme metal vocal techniques as demonstrated by a volunteer extreme metal vocalist.
In contrast to that highly mobile focal point, the exhaled vocals maintain a more balanced spectral distribution of energy in the lower register of the spectrogram, moving in comparatively small increments from one vowel to the next in an almost mechanical fashion.

2.2. Vocal Tract Alterations that Imitate Inhuman Sound Sources

Within these two basic categories of voice production, vocalists may alter the length and shape of their vocal tract to modify their voices in a wide variety of expressive ways. To lengthen the vocal tract and produce a lower-sounding voice, vocalists may simply lower their chin; conversely, they may angle the chin upwards to shorten the vocal tract for a higher scream. It is also possible to change the vocal tract length by raising or lowering the larynx (or voice box), a process which happens automatically in conjunction with the rounded or spread lip shapes we use to articulate different vowels (see Figure 2) [12].

Figure 2. George "Corpsegrinder" Fisher of Cannibal Corpse rounds his lips to lengthen his vocal tract (left); Dani Filth (Daniel Lloyd Davey) of Cradle of Filth spreads his lips to make his vocal tract shorter (right).

This last point raises what is perhaps the most important yet subtle technique that extreme metal vocalists have at their disposal: the manipulation of vowel formants. These vocalists will frequently sacrifice the intelligibility of lyrical content in order to exaggerate the sense of "heaviness" that can be perceived with especially low vowel formants in death metal vocals. In black metal, a sub-genre where a generally higher, raspier voice predominates, vocalists will tend to raise the frequency of their vowel formants past the levels that would ordinarily correspond to their lyrics when spoken. One way to think of each of these techniques, especially those that lengthen the vocal tract, is to recognize that the vocalists are physically mimicking the physiological properties of a large beast: the ventricular folds produce a spectral spread of energy that suggests an inhuman sound source, and the lengthening of the vocal tract exaggerates the size of that source.

3. SIGNAL PROCESSING AND RESYNTHESIS USING AUDIOSCULPT

In order to investigate the aforementioned features of the metal voice and their roles in music pieces, we decided not only to perform spectrographic analyses, but also to extract parameters that allow for a partial resynthesis of key sound components in order to validate our assumptions. The following is a brief description of the basic principles underlying AudioSculpt's functionalities.1 AudioSculpt generates a short-term Fourier transform (STFT) representation of the sound. In order to optimally display the sonogram image, STFT parameters must be chosen, including the analysis window size and shape, the step size (which determines how successive analysis windows are placed along the signal), and the fast Fourier transform (FFT) sampling. For each sound example in this study, we used a Blackman window of size M = 2400 with a step size of M/8 and a number of channels equal to 4096. Once a satisfactory image is in view, it is ready to be analyzed. Qualitative judgments can be made from the spectrographic image, but more precise measurements of vowel formant quality are also taken by first synthesizing a new sound file based on the original sound's vowel formants using one of AudioSculpt's several spectral gain filters (in each case during this study, the pencil filter tool). These filters change the gain of frequency regions defined by the user within the resynthesized signal. For this study, the regions to be filtered were defined according to a computer-based analysis of the original sound file's partials.

1 The technical information offered here is based on [11].
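The analysis settings above can be reproduced outside AudioSculpt. The following is a minimal numpy sketch (our illustration, not AudioSculpt's implementation) of a magnitude STFT using the same parameters: a Blackman window of M = 2400 samples, a step size of M/8 = 300, and a 4096-point FFT.

```python
import numpy as np

def stft_sonogram(x, fs, M=2400, hop=300, nfft=4096):
    """Magnitude STFT with parameters comparable to the paper's
    AudioSculpt settings: Blackman window M=2400, step M/8, 4096 bins."""
    w = np.blackman(M)
    n_frames = 1 + (len(x) - M) // hop
    frames = np.stack([x[i*hop:i*hop+M] * w for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, n=nfft, axis=1))   # (frames, bins)
    freqs = np.fft.rfftfreq(nfft, d=1.0/fs)              # bin centre frequencies
    return freqs, spec

# demo: a 1 kHz tone at 44.1 kHz should peak near the 1 kHz bin
fs = 44100
t = np.arange(fs) / fs
freqs, spec = stft_sonogram(np.sin(2 * np.pi * 1000 * t), fs)
peak = freqs[np.argmax(spec[spec.shape[0] // 2])]
```

At a 44.1 kHz sampling rate these settings give a frequency grid of roughly 10.8 Hz per bin and a new analysis frame every 6.8 ms, comfortably fine enough to resolve the 500 Hz gridlines used in the figures.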
Specifically, AudioSculpt's "Partial Tracking Analysis" feature was used to track partials using multiple breakpoint functions for each sinusoidal component (a procedure that works with inharmonic signals as well as harmonic ones). A new sound was then created by amplifying the formant regions of the original signal. AudioSculpt's diapason tool was then used to take frequency readings from the synthesized partials. When the tool is pointed at a particular place on the sonogram, its frequency and amplitude are displayed, a corresponding sine tone sounds, and a two-dimensional spectral slice of the synthesized signal appears. This procedure allows for an analysis of the original sound by hypothesizing that certain components of it appear to be important, resynthesizing those components, and closely analyzing them.

4. A RECORDED IMPROVISATION

In order to investigate the features of extreme metal voices presented in section 2 in a more musical context, we asked a volunteer vocalist to perform a vocal improvisation using whatever extreme metal vocal techniques he wished. Given the somewhat artificial performance environment (the room was virtually devoid of reverberation and there were no instruments to accompany the vocalist), the volunteer vocalist's improvisation shown in Figure A1 (see the URL in footnote 1), if not an exact indicator of the performance decisions a vocalist might make in a concert setting or full-band recording, can be considered an accurate reflection of the kinds of performance choices that are possible using inhaled and exhaled voices and the variations in formant frequency available with different vowel combinations.

1 An online appendix that includes larger images such as Figure A1 can be accessed at http://www.music.mcgill.ca/~depalle/ICMC2012/ICMC2012Smialek.htm.

4.1. A Spectrographically-Informed Music Analysis

In addition to the spectrogram, Figure A1 also contains phonetic and rhythmic transcriptions as well as analytical annotations provided at the bottom of the figure. As indicated by the rhythms given below the spectrogram, the vocals fit neatly into a regular 4/4 meter, indicating that the vocalist kept a regular pulse, in marked contrast to his earlier performances discussed above. Partly because of this rhythmic regularity, the improvisation comes across as surprisingly organized, as though it had been deliberately crafted.

4.1.1. An Impression of Tight Control

This impression likely results from several musical features exhibited by the improvisation that are frequently invoked in discussions of basic musical composition: the improvisation shows a clear binary division into two roughly equal sections of music, separated by a sudden change in dynamics (see the dynamic markings at measure 7); each half contains a recurring, slightly varying rhythm (labelled x and y) that generally does not occur in the other half; and, following the contours visible in the spectrogram, both divisions exhibit an arch-like quality of intensification whereby the vocalist's screams steadily increase in volume and high spectral frequencies (measures 2–4, 9–11) before rapidly calming with quieter, lower vowels (measures 6–8, 12–15). The total result is a sample of improvised extreme metal vocals which demonstrates controlled musical techniques in a regular manner, one that could be heard as loosely narrative or rhetorical in the sense that it creates regularly spaced climaxes, moments of repose, and gradual variations.
If there is a clear sense of musicality to be found here, what can be inferred about the vocal techniques used to achieve it?

4.1.2. Inhaled vs. Exhaled Vocals

Inhaled vocals in Figure A1 are shown by boxes around the notation below the spectrogram, indicating that exhaled and inhaled vocals are employed about equally during the improvisation. Though aurally distinguishable from exhales by those familiar with extreme metal vocals, the distinguishing acoustical characteristics of the inhaled vocals are not always immediately apparent from the spectrogram (at least at the resolution given in the example). What can be readily seen is a clear difference in the highest regions of spectral energy reached in measures 4–5 and 10–11. None of the exhaled portions of the improvisation come close to this region, which reaches upwards of 4000 Hz, a circumstance which indicates that only inhaled vocals allow the vocalist to achieve the very wide spectral spread of energy characteristic of these climactic moments in his improvisation. The vocalist also appears to have reserved at least one of the vowels with especially high formant frequencies for his inhaled voice: /æ/ (as in "had"), the vowel most consistently present during these moments of intense high spectral energy and a vowel with one of the highest first formant frequencies, occurs only once as an exhale (measure 2). Lastly, it seems that nearly all the consonants, with the exception of /ɹ/ (and /p/ in measure 3), were reserved for exhaled vocals. Here, it would seem that the closures of the vocal tract necessary to produce some consonants prove too awkward to execute with the steady "sucking" air flow used in inhaled vocals.
4.2. A Paradigmatic Analysis of the Improvisation

In order to arrive at an especially clear demonstration of the primary role that certain phoneme combinations played in the improvisation, Figure A2 provides a paradigmatic analysis1 of the recording sample based primarily on rhythmic identities and secondarily on phoneme content. Although somewhat similar to a musical score, the chart parses the music into segments, usually spanning one measure each (measures are indicated by circled numbers; segments spanning two or more measures are enclosed within square brackets), and distributes them horizontally to show difference and vertically to indicate similarity. Musical time flows downwards on the chart, so that as new segments of music occur in the recording, they appear in the chart beneath one another whenever a portion of their rhythms matches other segments (e.g. measures 2–6 and 11–12 have common rhythms for beats 3 and 4). Thus, a continuously repeating piece of music would appear as a single column, while one that continuously introduced new material would have its segments listed diagonally. To add more columns of rhythmic identity and to avoid cluttering them, the same segmented unit sometimes appears more than once. Such is the case with measures 1–2, which are shown as a single segment, as well as measures 5, 9, 11, and 12, where each of these segments reappears in a new column to the right. Inhaled and exhaled vocals are indicated by different shades of grey horizontal boxes, but not in the case of reappearing segments. Consequently, the chart can be followed in real time along with the recording by reading only the shaded segments, proceeding downwards from the top-left to the bottom-right corners.
To draw attention to areas of greatest interest, solid boxes outline regions that show both identical rhythms and noticeably repeated phonemes, while dashed boxes indicate rhythmic identities with little phonemic similarity.

4.2.1. Some Rhythmic and Phonetic Motives

The far-left column brings into relief how the vocalist created a series of variations during the first half of the improvisation (measures 1–6). Here, he has clearly followed a pattern of varying his rhythms during beats 1 and 2 while treating beats 3 and 4 as a recurring rhythmic motive (shown by the large vertical rectangle around the common rhythms in the far-left column). Indeed, this recurring rhythm coincides with each of the increases in spectral bandwidth that generate the arch-like pattern visible in the first half of Figure A1. On occasion, the vocalist has punctuated the last beat of the measure with /ʊ/ (as in "hook", marked to the right of the column with arrows), a gesture that in each case requires an elaborate motion of closing the jaw and quickly rounding the lips when moving to /u/ (as in "who") from /a/ (as in "father") or the similarly articulated /æ/ (as in "had"). It makes sense, then, that such an elaborate physical motion would be reserved for a longer sustained rhythmic value and would punctuate the final accent in a rhythmic motive. By contrast, the quicker sixteenth-note rhythms that occur as variations in measures 3 and 6 are possible because the alveolar closure made by the tongue when forming the consonant /d/ can be executed quickly; similarly, the quick eighth-note alternations between /ɚ/ (the r-coloured vowel in "sure") and /i/ (as in "heed") in measure 12 are easily performed because they do not require elaborate jaw motions, only quick shifts between lip and tongue positions.2

1 One of the clearest introductions to paradigmatic analysis available can be found in [14].
4.2.2. The Importance of /ɹ/ Sounds

This last point raises one of the most conspicuous consistencies observable throughout the recording sample. The vocalist very frequently alternates between /ɚ/ and /i/ sounds with both inhaled and exhaled vocals. These phonemes occur for nearly all the articulations contained within the solid box given in the left-centre column and, in the case of the one exception, the /iɚ/ combination is merely displaced by a beat, as marked by the arrow in the left-centre column. The versatile rhotic ("r") of Standard North American English is especially worthy of emphasis here for both the acoustical and physiological advantages it offers the vocalist. This sound does not require a stoppage in air flow, so it can be used as a vowel (e.g. /ɚ/ as in "fir") or as a consonant (e.g. /ɹ/ as in "rapid"). As a result, it is especially suited to the physiological difficulties of articulating consonants with inhaled vocals (note again that nearly all of the other consonants are exhaled). Because it can be articulated with a continuous air flow, the /ɹ/ can be used to slightly alter vowels. Specifically, when a vowel is rhotacized, i.e. coloured by an /ɹ/, its third formant becomes lowered [15]. Even if this third-formant lowering is not as directly tied to an impression of heaviness as the lowering of the first two formants, it nevertheless provides a way for the vocalist to create variety and, on a social-perceptual note with regard to paralanguage, it seems more than a coincidence that the /ɹ/ sound is often used to imitate the snarls of wild beasts.3 Having drawn a number of inferences as to why certain patterns appeared in the improvisation, the strongest and most basic point here is that there exists a consistency to the vocalist's use of particular phonemes, in such a way that they seem fundamental to the most salient musical features of the improvisation.

2 The alveolar ridge is the sloping region located by the upper jaw's front teeth.
One's tongue quickly touches this ridge to stop the flow of air through the vocal tract when producing the consonant /d/ [15].

3 Paralanguage can roughly be understood to include all forms of non-verbal communication. These can include facial expressions, vocal utterances such as grunts and sighs, as well as prosodic and timbral modifications to ordinary speech that shade meaning [16].

5. AN EXCERPT FROM "THE VOWEL SONG"

Framed as a public service promoting literacy, "The Vowel Song" by death metal band Zimmer's Hole begins with vocalist Chris Valagao reciting the vowels of the alphabet (henceforth "letters" to distinguish them from phonetic vowels) in long sustained screams, harmonized by three guitars in homorhythm (see Figure A3). The slow punctuations of each homorhythmic attack, combining voice, low power chords (shown only as roots in the example), and two harmonized lead guitars, not only lend a certain satirical grandiosity to the song, they also help to create the sensation of Valagao's unpitched screams possessing a kind of melody, drawn by a precise control of formant frequency locations.

5.1.1. Formants in Flux

Although it may not be immediately evident in the spectrograms given here, there is a great deal of variation in the formant frequencies used in the example. In order to illustrate this variation, Figure 3 plots the position of each letter that Valagao screams within vowel space, i.e. a graph which plots vowels according to the frequency of the first formant on the y-axis and the second formant on the x-axis. Of course, some of the letters Valagao screams are actually diphthongs or triphthongs. Accordingly, the most steady-state vowel within each letter is identified with a dot on the graph, with arrows leading up to or away from it depending on how the phonetic transitions occur.
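Vowel-space positions like those in Figure 3 can also be estimated programmatically. The sketch below is a hypothetical helper, not part of the paper's method: the reference coordinates are approximate average adult male formant values drawn from classic phonetics measurements, and a measured (F1, F2) pair is simply matched to its nearest reference vowel.

```python
import math

# Approximate average adult male formant frequencies in Hz as (F1, F2),
# after classic phonetics measurements; values are indicative only.
VOWEL_SPACE = {
    'i': (270, 2290),   # "heed"
    'ɪ': (390, 1990),   # "hit"
    'æ': (660, 1720),   # "had"
    'ɑ': (730, 1090),   # "father"
    'ɔ': (570, 840),    # "caught"
    'ʊ': (440, 1020),   # "hook"
    'u': (300, 870),    # "who"
}

def nearest_vowel(f1, f2):
    """Return the vowel whose (F1, F2) reference point lies closest
    (Euclidean distance in Hz) to a measured formant pair."""
    return min(VOWEL_SPACE, key=lambda v: math.dist(VOWEL_SPACE[v], (f1, f2)))
```

For instance, a steady-state measurement near F1 = 700 Hz and F2 = 1100 Hz would be classified as /ɑ/ (as in "father") under these reference values.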
To illustrate an example, Valagao's screamed letter "u," represented by plot #5, is performed in such a way that it traverses vowel space beginning near /ɪ/ (as in "hit"), reaching its most steady-state point near /ɑ/ (as in "harm"), and finally moving towards the lower formant frequencies between /ʊ/ and /ɔ/ (as in "hook" and "caught" respectively). It becomes clear how much formant variation is involved in this song introduction from examining only the steady-state plot points, let alone taking into account all the quick diphthong transitions indicated by the arrows.

5.1.2. Interactions between the Voice and Guitar

If these changes in formant frequency are compared with the pitch contour of the guitar parts (which move in parallel motion), a surprising correspondence appears between the guitars' changes of pitch direction and the movements of the first formant. As the first letter changes to the next, the lower formant decreases in frequency, paralleling the descent of the guitars (see the "changes of direction" arrows in Figure A3). With the next letter, the lower formant reverses direction just as the guitar does. This pattern of alternating upward and downward directions, shared between the guitars and the voice's first formant, continues until the guitars and voice break the homorhythm. What is more, there is an even stronger correspondence between the highest lead guitar part and the voice's first formant movements. This is shown in Table 1, which compares the high lead guitar melody with the first formant of each vowel at its point of greatest stability and sustain (the dotted points in Figure 3). Both are shown as frequency values in Hz and in terms of the pitches that correspond to those frequencies. A comparison of these pitch values (including their octave positions) reveals a striking relationship.
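The correspondences between frequencies and pitch names in Table 1 assume twelve-tone equal temperament with A4 = 440 Hz. As a quick check on such readings, a short helper (our illustration, not part of the paper's toolchain) can map any frequency to its nearest equal-tempered pitch:

```python
import math

A4 = 440.0
NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def nearest_pitch(freq_hz):
    """Return the nearest equal-tempered pitch name (A4 = 440 Hz)
    and the deviation from it in semitones."""
    n = round(12 * math.log2(freq_hz / A4))   # semitones from A4
    midi = 69 + n                             # MIDI note number
    dev = 12 * math.log2(freq_hz / (A4 * 2 ** (n / 12)))
    return f"{NAMES[midi % 12]}{midi // 12 - 1}", dev

name, dev = nearest_pitch(1050)   # the guitar note given as C6/1050 Hz
```

Here 1050 Hz maps to C6 (about 0.06 semitones sharp) and 1240 Hz to D#6, the enharmonic equivalent of Eb6. A few of the formant readings are looser fits: 380 Hz, for instance, lies almost midway between F#4 and G4, so pitch labels for formants are necessarily approximate.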
With a margin of about one semitone above and below the upper guitar part, the voice's first formant parallels the exact contour of the guitar part. Taking into consideration that there is usually a frequency range of around 100 Hz over which each formant's energy is significant (the values in Table 1 sample the average frequency for the first formant) and that the voice is in rhythmic unison with the guitars, it does not seem far-fetched for a listener to perceptually connect the guitar melody and the formant movements, thereby imagining a kind of melodic motion assigned to a series of unpitched screams. Even after the homorhythm breaks off, one can discern further frequency-oriented connections between the voice and its surrounding musical contexts. As the vocalist finishes preparing the last letter by screaming the words "and sometimes," a diving upper harmonic merges with his third formant beginning with the onset of the final letter Y. In this brief intro to "The Vowel Song," the widely taken-for-granted division between timbre and pitch appears especially blurred.

Figure 3. Formant positions from the opening of "The Vowel Song" plotted in vowel space [17]. Each letter that Valagao screams is assigned a number. Arrows indicate phonetic transitions before and after a vowel is stabilized and sustained.

            Upper Melody (Gtr 1)   First Formant
No. 1 – A   C6 / 1050 Hz           C#5 / 540 Hz
No. 2 – E   G5 / 780 Hz            G4 / 380 Hz
No. 3 – I   Eb6 / 1240 Hz          F5 / 700 Hz
No. 4 – O   C6 / 1050 Hz           B4 / 510 Hz
No. 5 – U   Eb6 / 1240 Hz          E5 / 650 Hz

Table 1. A comparison of frequency values between the highest lead guitar's melody and the first vocal formant at its point of greatest stability.

6. CONCLUSION

Having observed extreme metal vocal techniques in both a relatively controlled recording session and at work in a commercial studio recording, it should now be clear that the extreme metal voice is far from the simplistic percussive device that it is often assumed to be. If such assumptions are in no small part the result of deeply entrenched habits of describing vocal music primarily in terms of pitched melodies, the extreme metal voice can serve as an invitation to approach the study of musical expression in new ways. Having taken an interdisciplinary approach to the extreme metal voice, the results of this paper support ongoing arguments for the musicological utility of spectrograms in drawing attention to subtle means of musical expression that can easily be overlooked.

7. REFERENCES

[1] Middleton, R., Studying popular music. Open University Press, Philadelphia, 1990.

[2] Tagg, P., Kojak: 50 seconds of television music. Towards the analysis of affect in popular music. Musikvetenskapliga Institutionen, Göteborg, 1979.

[3] Tagg, P. and B. Clarida, Ten title tunes: Towards a musicology of the mass media. Mass Media Music Scholars' Press, New York and Montreal, 2003.

[4] Chester, A., "Second thoughts on a rock aesthetic: The Band", New Left Review, 62 (1970): 75–82.

[5] Cogan, R., New images of musical sound. Harvard University Press, Cambridge, 1984.

[6] Brackett, D., Interpreting popular music. 2nd edition. University of California Press, Berkeley and Los Angeles, [1995] 2000.

[7] Dibben, N., Review of [6]. Popular Music, 21, no. 1 (January 2002): 143–45.

[8] Moore, A., Review of [6]. Music & Letters, 77, no. 4 (November 1996): 658–59.

[9] Huron, D., Review of Empirical musicology. Notes: Quarterly Journal of the Music Library Association, 63, no. 1 (September 2006): 94.

[10] Geslin, Y. and A. Lefevre, "Sound and musical representation: the Acousmographe software", Proceedings of the International Computer Music Conference, Miami, USA, 2004.

[11] Bogaards, N. et al., "Sound analysis and processing with AudioSculpt 2", Proceedings of the International Computer Music Conference, Miami, USA, 2004.

[12] Sundberg, J., The science of the singing voice. Northern Illinois University Press, DeKalb, 1987.
[13] Lindestad, P.-Å., "Voice source characteristics in Mongolian 'throat singing' studied with high-speed imaging technique, acoustic spectra, and inverse filtering", Journal of Voice, 15, no. 1 (March 2001): 78–85.

[14] Agawu, K., "The challenge of musical semiotics", in Rethinking music, eds. Nicholas Cook and Mark Everist, 138–160. Oxford University Press, New York, 1999.

[15] Ladefoged, P. and K. Johnson, A course in phonetics. 6th edition. Wadsworth/Cengage, Boston, [1975] 2011.

[16] Poyatos, F., Paralanguage: a linguistic and interdisciplinary approach to interactive speech and sound. J. Benjamins, Philadelphia, 1993.

[17] Beskow, J., Formant synthesis demo, http://www.speech.kth.se/wavesurfer/formant/. Accessed 17 February 2012.