3-D Sound Spatialization Using Ambisonic Techniques

Author(s): David G. Malham and Anthony Myatt

Source: Computer Music Journal, Vol. 19, No. 4 (Winter, 1995), pp. 58-70
Published by: The MIT Press
Stable URL: http://www.jstor.org/stable/3680991
Accessed: 13-05-2018 05:27 UTC

David G. Malham and Anthony Myatt
Music Technology Group
Department of Music
University of York
York YO1 5DD, UK
(dgm2, am12)@unix.york.ac.uk

The exploration of sound spatialization is a preoccupation of many composers and performers of electroacoustic music. Two-channel stereo techniques are widely used in the genre, but more sophisticated forms of sound spatialization are often restricted to those with access to significant technical resources. Recent technological developments have increased access to high-quality, multi-channel audio systems and computer music workstations, and new applications for sound spatialization have been identified: virtual reality, multimedia computing, films, videos, and computer games may all employ surround-effects of some type. In addition, increased computing power has led to the production of workstations that can support the data bandwidth required for multiple audio channels. All of these factors have led to a renewed interest in sound spatialization techniques, providing enormous potential for accelerated development of the field.

Ambisonics is a powerful technique for sound spatialization. It can allow recording, manipulation, and composition with naturally and artificially constructed three-dimensional sound fields. In this article we will present the results of experimentation and research into the application of ambisonics to the composition and performance of electroacoustic music. We also present Csound implementations of ambisonic encoding and decoding techniques that can be used on any computing platform supporting four or more independent audio output channels. Our hope is that these techniques will be a useful resource for those wishing to experiment with sound spatialization, and that their presentation here will encourage further development [...]

This article examines a variety of spatialization techniques within a historical and technical context, identifying some of the advantages and disadvantages of specific methods. This provides the basis for a review of ambisonic theory, which additionally describes several advanced spatial processing methods that are possible with this technique. We go on to discuss experimental results and compositional considerations related to the use of ambisonic techniques for electroacoustic performance.

Historical Context

The first recorded use of multiple audio channels to give a spatial effect occurred more than a century ago. Clément Ader's use of multiple sets of telephone transmitters and receivers to relay sound from remote events created enormous enthusiasm at the 1881 Paris Exhibition of Electricity (Askew 1981). However, no further work of any significance was done on multi-channel audio until the 1920s, when a binaural/headphone-based system was developed by Harvey Fletcher and his team at Bell Telephone Laboratories (Sanal 1976). Binaural signals are not suitable for presentation over loudspeakers, and further work was needed to produce methods appropriate for more than one [...] Bell Telephone Laboratories (Fox 1982) developed a system that used a curtain of microphones hung in front of the sound source and fed a similar curtain of loudspeakers. The intention was to recreate the original sound wavefront. As we now know, accomplishing this requires a very large number of channels, running into many tens or even hundreds [...]
spaced microphones were set left, center, and right, and they fed similarly placed speakers. They were much helped by the support of Leopold Stokowski in this work, and, indeed, excellent stereo recordings of his orchestra are available from as early as 1932.

Around the same time, at EMI in Britain, Alan Blumlein was working on a different approach, based on creating a convincing illusion of the original sound field, rather than the actual sound field itself. His simple, crossed pair of directional microphones that captured directional information on just two channels of audio is now widely regarded as one of the best ways to capture an audio event to be presented over two loudspeakers. The ideas inherent in this approach are also fundamental to the ambisonic 3-D surround sound system.

These stereo techniques were little used over the next few years, mostly as a result of the technological limitations of the available recording systems (despite the advent of multi-track film-based recorders). The next major development was featured in Walt Disney's animated film Fantasia, produced in 1939, again with Leopold Stokowski's involvement (Hope 1979). A team of engineers from RCA and Disney developed a special nine-track recorder, based on nine separate synchronized optical recorders. The orchestra was recorded with 33 microphones that were mixed by (orchestral) section onto six of the tracks. The seventh track contained a mono mix of the first six, the eighth track recorded the sound from a distant microphone, and the ninth track held a metronome or click track.

For the cinema presentation, a three-channel version of the eight-channel original was played back from a four-channel optical recorder that was synchronized with the film. The intention was that this would be played back over 90 loudspeakers spread behind the screen and around the auditorium, although financial pressures meant that this was rarely done in practice. Notches on the edge of the soundtrack film operated a switching mechanism that sent selected sounds to the side and rear speakers. Adrian Hope indicates that church bells were presented at the rear, and that the Ave Maria chorus progressed from the rear to link with the solo voice at the front. We believe that this may have been the first musical use of electronic sound spatialization.

In Europe during the early 1950s, composers involved in the new electronically based forms of music (and in this we are including both the musique concrète and elektronische Musik schools) became interested in the possibility of using sound positioning and movement compositionally (Schaeffer 1951; Stockhausen 1956). For Gesang der Jünglinge, Karlheinz Stockhausen employed five loudspeakers (or groups of loudspeakers). He intended four to be positioned around the audience and one above. The loudspeakers around the audience were fed from a four-track tape, and the fifth (vertical) signal was provided by a mono tape that was manually synchronized to the four-track tape. Even at the world premiere in the large broadcast studio of the West Deutscher Rundfunk (WDR), Cologne, it was impossible to use this configuration; the fifth loudspeaker had to be placed at center front. Since the premiere, only a four-track mix has been used. Stockhausen says of this piece, "... in this composition, for the first time, the direction and movement of the sounds in space [emphasis in the original] is shaped by a musician, opening up a new dimension in musical experience" (Stockhausen 1956).

Five years earlier, Jacques Poullin was working in Paris on the potentiomètre d'espace system for distributing sound among four loudspeakers, typically two in front of the audience, another above them, and one at the rear. This allowed a performer to position a sound by simply moving a small, hand-held transmitter coil toward or away from four large receiver coils arranged around the performer in a manner that reflected the loudspeaker positions. The sound controlled by the performer came from one track of a special five-track tape recorder; the other four tracks each supplied one loudspeaker to provide a mixture of live and preset sound positioning. It is interesting to note that this "position tracking system" is similar to those now used in virtual reality head-tracking systems. Pierre Schaeffer speaks of the experiments on the spatial projection of sound: "... le nouveau procédé est une dialectique du sons dans l'espace et je pense que le terme de musique spatiale [emphasis in original]

lui conviendrait mieux que celui de stéréophonie," or, "the new process is a dialectic of sound in space and I think that the term spatial music could fit it better than stereophony" (Schaeffer 1951).

One of the most significant examples of spatial music was composed during the 1950s: Edgard Varèse's Poème électronique. Featured in the Philips pavilion at the Brussels World Fair of 1958, it was experienced by up to two million visitors. The technology used was complex and cumbersome, involving 15 tape recorders and 400 loudspeakers (Stimson 1991), and with the might of the Philips Corporation and their engineering expertise, it was highly successful. However, as with all work of this period (and indeed up until the 1970s), it lacked both a simple control system and the support of a comprehensive theory of sound localization.

Several attempts were made to design control systems for the movement of sound during this early period, including systems such as that developed at the West Deutscher Rundfunk in Cologne and used for Karlheinz Stockhausen's Kontakte (1960). This used a rotating, highly directional loudspeaker to distribute sounds between microphones. The outputs of the microphones were then recorded and played back over fixed loudspeakers. As computer-based systems became available, investigations started into various aspects of sound spatialization that were not previously accessible. In particular, work was done on artificial reverberation (Schroeder 1962; Moorer 1979) and Doppler shifts (Chowning 1971), which made a significant contribution to the understanding of distance and movement cues in artificially spatialized sounds. In 1983, F. Richard Moore published a description of a generalized model for spatial sound processing (Moore 1983), incorporated within the cmusic unit generator module, space.

In recent years, a number of systems offering spatial manipulations of sounds have appeared, and the application of computers has been of great benefit. There are both commercial systems, such as the various forms of Dolby Surround, ambisonics, Roland RSS, and Q-Sound, and also more specialized diffusion arrays like the UK-based Birmingham Electro-Acoustic Sound Theatre (BEAST) and Groupe de Musique Expérimentale de Bourges' GMEBaphone.

Approaches to Spatial Reproduction

Electronic systems that allow sounds to be positioned artificially in space can adopt one of two approaches. They can either attempt to mimic the complex changes in timbre, delay, and amplitude that occur directly at our ears as a sound moves from one position to another, or they can generate phantom images (Thiele and Plenge 1977) by amplitude profiling of feeds to multiple speakers. The first approach, based on the Head Related Transfer Function or HRTF (and used in commercial systems such as QSound, Roland's RSS, and the Convolvotron (Begault 1994)), normally requires the use of headphones. Some variants, such as RSS, use interaural cross-talk cancellation (Cooper and Bauck 1989) to enable the images to be presented over a pair of loudspeakers. Systems using interaural cross-talk cancellation only work correctly in an extremely small "sweet spot" within the listening area. Presentation over headphones can be very good, approaching the listener's own capabilities for localization. This assumes, first, that personalized HRTFs are used, and second, that head movements are tracked to allow the system to modify the sound presented over the headphones, so that sound positions remain constant with respect to the external space, not the listener's head. Without the head tracking, front-back reversals are likely to occur: sounds intended to be due forward appear to come from behind, and vice-versa. This is a common problem with binaural recordings, i.e., stereo recordings made with microphones placed in the "ears" of a dummy head. These front-back reversals are known to occur even within natural listening conditions, but there they are much less common. We use head rotation as one of the main ways to distinguish front from rear sound sources. Rotation should result in the sounds moving in one direction if they are in front of the head, and in the opposite direction if they are at the back. However, there appears to be no way at present to apply such

rotations to naturally produced binaural recordings. Of course, some sounds are very difficult to localize under any circumstances (such as electronic telephone "bells"), so strategies such as head rotation do not necessarily work even within a natural sound field.

Whatever the technical merits of HRTF techniques, they are unlikely to be used in the concert situation until such time as someone can afford to equip every seat in an auditorium with its own headphones, head tracker, and digital signal processor unit providing HRTF processing. It will also be necessary to persuade concert goers to wear headphones. Such an approach may, however, be more acceptable within the context of concerts broadcast over the World-Wide Web. Artificial creation of sound fields using HRTFs also requires significant processing power, typically requiring many hundreds of multiply-accumulate instructions per sample, for each individual sound source.

The two-channel amplitude panned stereo, which we are accustomed to, is the most common example of a system that generates phantom images by amplitude profiling of feeds to multiple speakers. An 18-dB difference in the levels of a signal presented via two loudspeakers, spaced to subtend an angle of 60 degrees at the listener's position, will place a phantom image in the louder loudspeaker. Progressively smaller differences will cause the image to move toward the halfway line between the speakers, then out toward the other speaker as that in turn becomes louder. The listening area where this works well is small, but larger than that for interaural cross-talk canceled HRTF. The image, especially for central sounds, becomes increasingly unstable as the angle between the loudspeakers (as seen by the listener) goes beyond the optimum 60 degrees. Once it reaches 90 degrees (as in four-loudspeaker quadraphonic systems!), stable central images cannot be formed, especially at the sides (Thiele and Plenge 1977). Image widths also tend to vary with the spectral content of the sounds used. Such amplitude-panned systems, whether they are two channel as above, four-channel quadraphonic systems, or some versions of the Dolby Surround family (including systems with higher numbers of channels, such as those proposed for HDTV systems), suffer from two major problems. In all systems of this type, the content of an audio channel is intended to provide the signal for a particular loudspeaker, implying that the loudspeaker layouts in the studio and playback venue must match exactly for the positioning of sounds to remain consistent. This was clearly recognized by those working at the Institute of Sonology at Utrecht State University: "There are four loudspeakers in this studio so that the user can form an impression of the sound field as it would be in the concert hall. The ideal dimensions of a multi-track studio are those of a medium-sized concert hall" (Weiland 1975).

It is, of course, rarely possible to match the size or shape of the final performance space when working in a studio. Moreover, a system of this type that has fewer than six channels directly feeding the same number of speakers cannot meet the criterion of 60-degree angles between adjacent speakers, so must inevitably suffer from unstable images in intermediate positions. While this can be tolerated for film or video work, where the listener's attention is locked to the relatively small area where the picture is presented, it is completely unsatisfactory for the composer of electroacoustic music. The only systems that overcome these difficulties to any real extent are the QMX system based on the work of Cooper and Shiga (Cooper and Shiga 1972) and the ambisonic system, based on the work of, among others, Michael Gerzon (Gerzon 1972). These two systems are fundamentally equivalent.

Control over sound positions and movements can be accomplished either in the studio or in the performance venue or, ideally, in both. For the composer in the studio to have a reasonably good idea of how sound objects will behave in the performance space, a system that allows for essentially seamlessly scalable loudspeaker configurations is necessary. This removes the changes produced as a result of differing numbers and positions of loudspeakers in the studio and performance spaces, leaving only the difference in actual acoustics to disrupt the composer's intentions. In purely technical terms, this rules out quadraphonics and related approaches, as noted above. However, in no way

should this be taken to imply that it is invalid to treat loudspeakers and the acoustic space within which they reside as instruments to be played by skilled performers. In striving to present sounds as heard in the studio, we must be careful not to ignore the enormous potential of the speaker acoustics as an instrument, a potential that can be confirmed by anyone who has attended a BEAST or a GMEBaphone concert. In recent years, the York research group has been working toward hybrid systems, where both approaches can be combined.

Ambisonics

In our research program, we have concentrated on ambisonics, the only currently available system that comes close to achieving true scalability. This system has been fairly widely reported in the past (see, for instance, Malham and Orton 1991), but we still find a considerable lack of awareness of both its potential and principles.

The ambisonic surround-sound system is essentially a two-part technological solution to the problems of encoding sound directions (and amplitudes) and reproducing them over practical loudspeaker systems so that listeners can perceive sounds located in three-dimensional space. This can occur over a 360-degree, horizontal-only soundstage (pantophonic systems), or over the full sphere (periphonic systems). The system encodes signals in B-format, which contains three channels for pantophonic systems and a further channel for periphonic, i.e., with-height reproduction. These signals convey directionally encoded information with a resolution equal to that of first-order microphones (cardioid, figure-eight, etc.). Reproduction requires four or more loudspeakers depending upon the required reproduction (pantophonic or periphonic) and on the size of the performance area. Practical minima are four if the sounds are limited to the horizontal plane, and eight if height is required. It is important to note that it is not necessary to consider the actual details of the reproduction system during the original recording or synthesis of a sound field. The only exception to this is that the vertical dimension is essential if a with-height replay system is required. If the B-format specifications are followed, assuming suitable loudspeaker/decoder systems are used, then operation in different venues will be as similar as local acoustics allow. In all other respects the two parts of the system, encoding and decoding, are completely separate.

Encoding Equations

Sounds positioned in ambisonic B-format are conceptually placed either on the surface of or within a "unit" sphere. If the maximum radius of the sound field is 1, sounds moved outside this sphere, i.e., with a radius greater than 1, will not be decoded correctly and they will tend to pull to the nearest speaker. This means that the sound source coordinates must obey the following rule:

(x² + y² + z²) ≤ 1

where x is the distance along the X or front-back axis, y is the distance along the Y or left-right axis, and z is the distance along the Z or up-down axis. When a monophonic signal is positioned on the surface of the sphere, its coordinates referenced to the center front position by the horizontal (A) and vertical (B) (counter-clockwise) angles, subtended at the listening position, are,

x = cos A × cos B,
y = sin A × cos B, and
z = sin B.

These coordinates are used as multipliers to produce the B-format output signals (X, Y, Z, and W) thus,

X = input signal × cos A × cos B,
Y = input signal × sin A × cos B,
Z = input signal × sin B, and
W = input signal × 0.707.

The 0.707 multiplier on W gives a more even distribution of signal levels within the four channels, and is the result of engineering considerations. This is particularly relevant to sound fields recorded with a sound field microphone or with synthesized sound fields containing many sources. These multiplying coefficients can be used to position monophonic sounds anywhere on the surface of the sound field.

It is possible to manipulate whole sound fields that contain many different sound sources in different positions, including naturally recorded ones. We have developed the following standard definitions about the way sounds move to new positions to keep the equations coherent and to minimize confusion between movement types.

positive angle of rotation - a counter-clockwise, or, by convention, leftward rotation.
rotation - a circular movement about the Z-axis: the same as a counter-clockwise movement in the horizontal plane.
tilt - a rotation about the X-axis: the same as a counter-clockwise movement in the vertical left-right plane.
tumble - a rotation about the Y-axis. This is the same as a counter-clockwise movement in the vertical front-back plane. Note that a tumble is the same as a tilt rotated 90 degrees about the Z-axis.

Using these definitions it is possible, for example, to rotate the whole sound field around the Z-axis. For simplicity, consider the case of a sound field consisting of a single sound source with amplitude r positioned on the horizontal plane at an angle C from the center-front position. Given the B-format signals for the untransformed position,

X = r × cos C, and
Y = r × sin C.

If D is the angle through which the sound field is rotated from its untransformed position, we have,

X' = r × cos (C + D) and
Y' = r × sin (C + D),

where X' and Y' are the transformed B-format signals. Simplifying and substituting X and Y,

X' = X × cos D - Y × sin D, and
Y' = X × sin D + Y × cos D.

Since the rotation is about the Z-axis, W and Z remain unchanged. If the same procedure is applied to the tilt and tumble equations we have the following.

Tilt by Angle E

X' = X,
Y' = Y × cos E - Z × sin E, and
Z' = Y × sin E + Z × cos E.

Tumble by Angle F

W' = W,
X' = X × cos F - Z × sin F,
Y' = Y, and
Z' = X × sin F + Z × cos F.

These equations can be combined to produce complex transformations such as rotate-tilt.

Rotate-Tilt

X' = X × cos D - Y × sin D,
Y' = X × sin D × cos E + Y × cos D × cos E - Z × sin E, and
Z' = X × sin D × sin E + Y × cos D × sin E + Z × cos E.

Many other combinations of movements are possible.

Ambisonics and Stereo Compatibility

The ambisonic B-format signals are not directly compatible with stereo, although it is possible to generate a response exactly equivalent to that of a crossed pair of microphones (Farrah 1979). However, there is a British two-channel system (known as UHJ) that allows most of the horizontal information from the B-format W, X, and Y signals to be matrix encoded to form a standard two-channel stereo signal. The mono and stereo compatibility of UHJ recordings is very good; well-recorded UHJ pro[...]
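The encoding equations and the rotation, tilt, and tumble equations given earlier can be collected into a short Python sketch. This is an illustration only and is not the Csound implementation that the article refers to. The function names (encode_b_format, inside_unit_sphere, rotate, tilt, tumble), the (W, X, Y, Z) tuple ordering, and the use of degrees for the angles are all assumptions made for this example.

```python
import math


def inside_unit_sphere(x, y, z):
    """Check the rule x^2 + y^2 + z^2 <= 1 for source coordinates."""
    return x * x + y * y + z * z <= 1.0


def encode_b_format(sample, a_deg, b_deg):
    """Encode one mono sample on the surface of the unit sphere.

    a_deg is the horizontal angle A (counter-clockwise from center front),
    b_deg is the vertical angle B. Returns (W, X, Y, Z).
    """
    a = math.radians(a_deg)
    b = math.radians(b_deg)
    x = math.cos(a) * math.cos(b)
    y = math.sin(a) * math.cos(b)
    z = math.sin(b)
    return (0.707 * sample, sample * x, sample * y, sample * z)


def rotate(bfmt, d_deg):
    """Rotate the sound field about the Z-axis by angle D (W, Z unchanged)."""
    w, x, y, z = bfmt
    d = math.radians(d_deg)
    return (w,
            x * math.cos(d) - y * math.sin(d),
            x * math.sin(d) + y * math.cos(d),
            z)


def tilt(bfmt, e_deg):
    """Tilt the sound field about the X-axis by angle E (W, X unchanged)."""
    w, x, y, z = bfmt
    e = math.radians(e_deg)
    return (w,
            x,
            y * math.cos(e) - z * math.sin(e),
            y * math.sin(e) + z * math.cos(e))


def tumble(bfmt, f_deg):
    """Tumble the sound field about the Y-axis by angle F (W, Y unchanged)."""
    w, x, y, z = bfmt
    f = math.radians(f_deg)
    return (w,
            x * math.cos(f) - z * math.sin(f),
            y,
            x * math.sin(f) + z * math.cos(f))


# Usage: place a source at center front, then apply rotate followed by tilt,
# which corresponds to the combined rotate-tilt equations.
front = encode_b_format(1.0, 0.0, 0.0)
moved = tilt(rotate(front, 90.0), 30.0)
```

Because each transformation is linear in X, Y, and Z, the same functions could be applied sample by sample to a complete recorded or synthesized sound field, not only to a single source.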

four or more horizontally placed loudspeakers can reproduce virtually all of the horizontal positional information contained in a full B-format signal. This involves designing wide-band, 90-degree phase shifters for both encoding and decoding (Gerzon 1977a, 1977b).

Decoding Ambisonics

Decoding ambisonically encoded signals can appear complicated. The complexity appears in the optimization of decoders for British systems (with a limited number of loudspeakers and a small listening area) that use psychoacoustic techniques, but these are not productive in systems used to cover large areas (Malham 1992). Designing decoders for large areas, such as concert halls, requires a design strategy aimed at achieving even power distribution. The distribution of loudspeakers should be as even as possible; recognized loudspeaker configurations include squares and regular hexagons for horizontal-only work, and a cube as the practical minimum for with-height work. Each individual speaker is then fed a combination of the B-format signals corresponding to its position with respect to the center of the array. For instance, for a square (horizontal only) array of four speakers, arranged left front (LF), right front (RF), left back (LB), and right back (RB), the signals are:

LF = W + 0.707(X + Y),
RF = W + 0.707(X - Y),
LB = W + 0.707(-X + Y), and
RB = W + 0.707(-X - Y).

For a cubic array, the signals for the four planar corners in the "up" (U) and "down" (D) planes are:

LFU = W + 0.707(X + Y + Z),
RFU = W + 0.707(X - Y + Z),
LBU = W + 0.707(-X + Y + Z),
RBU = W + 0.707(-X - Y + Z),
LFD = W + 0.707(X + Y - Z),
RFD = W + 0.707(X - Y - Z),
LBD = W + 0.707(-X + Y - Z), and
RBD = W + 0.707(-X - Y - Z).

[...] above results in a cardioid source directional response for each loudspeaker. This is optimum for listening positions close to the loudspeakers or outside the loudspeaker array (Malham 1992). Where the intended listening area is significantly smaller than the speaker array, a more hypercardioid shape can be employed by increasing the directivity factor, which results in improved imaging for centrally located listeners.

Decoders based on these equations can easily be built with simple operational-amplifier circuitry, and it is possible to implement the cubic design by setting up an eight-in, eight-out mixing desk to produce suitable decoding. Loudspeakers and amplifiers used in the array should all be similar in size, response, and output.

Compositional Applications

In the past, the availability and expense of the hardware required to realize spatial compositions has often forced composers to position sound in one dimension only, and hope that further possibilities might present themselves when the piece is performed. Many composers use the careful control of reverberation, phase, and amplitude to introduce additional spatial cues within the stereo field. Some compositional expertise exists in creating stereo tapes that can recreate these effects in concert performances (Wishart 1986), but inaccuracies often occur where the acoustic conditions of the playback venue introduce additional spatial cues (Smalley 1992). This type of problem, coupled with the slow introduction of spatialization technology in domestic sound-reproduction systems, has restricted the use of space as a compositional parameter, and often discouraged composers from considering multi-dimensional sound localization. Electroacoustic composers recognize that sounds can contain or imply movement (Smalley 1986), but few have been able to fully explore the compositional implications and developments of this, specifically, where movement or the position of sounds is inherent in a piece and all its performances.
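The square and cubic decoding equations given in the Decoding Ambisonics section can be sketched in Python as two small functions. This is a minimal illustration under stated assumptions, not the article's Csound decoder: the function names (decode_square, decode_cube) and the choice of a dictionary keyed by speaker label are hypothetical, and the 0.707 weighting is simply copied from the equations.

```python
def decode_square(w, x, y):
    """Speaker feeds for a square, horizontal-only array of four speakers."""
    k = 0.707  # weighting on the directional channels, as in the equations
    return {
        "LF": w + k * (x + y),
        "RF": w + k * (x - y),
        "LB": w + k * (-x + y),
        "RB": w + k * (-x - y),
    }


def decode_cube(w, x, y, z):
    """Speaker feeds for a cubic array: four upper and four lower corners."""
    k = 0.707
    return {
        "LFU": w + k * (x + y + z),
        "RFU": w + k * (x - y + z),
        "LBU": w + k * (-x + y + z),
        "RBU": w + k * (-x - y + z),
        "LFD": w + k * (x + y - z),
        "RFD": w + k * (x - y - z),
        "LBD": w + k * (-x + y - z),
        "RBD": w + k * (-x - y - z),
    }


# Usage: one B-format sample for a source 45 degrees to the left of
# center front on the horizontal plane (X = Y = cos 45 degrees).
s = 0.5 ** 0.5
square_feeds = decode_square(0.707, s, s)
cube_feeds = decode_cube(0.707, s, s, 0.0)
```

For this source the left-front speaker receives the largest feed, the right-front and left-back speakers receive equal feeds, and the right-back speaker receives the smallest, which is the behavior the equations are meant to produce.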

Our hope is that ambisonic technology will be a first step in widening composers' access to spatialization. We present below a summary of techniques appropriate to recordings encoded using this technique, along with a Csound implementation of B-format encoding and horizontal plane decoding. These can be used on computer platforms that support the output of four-channel sound files. We believe that ambisonics represents a potential standard for positional encoding techniques that will enable compositions with spatial information to be performed on a range of simple loudspeaker configurations without specialized hardware.

The compositional possibilities described here are based on the definitions of ambisonic theory (Gerzon 1972) and experiments by various composers at the University of York Department of Music. Work in ambisonics is also being conducted at the Australian Center for the Arts and Technology (Vennonen 1994).

The ambisonic formats that are currently implemented allow full three-, two- and one-dimensional reproduction, using B-format, horizontal decoding of B-format, and UHJ, respectively. The UHJ encoding can also reproduce two-dimensional soundscapes. The full three-dimensional encoding system, B-format, can be used for all implementations, as it can be converted to the lower formats.

Sound field microphones (Farrah 1979) can be used to provide three-dimensionally encoded sound-source material for composition. Mimetic source material (Emmerson 1986) and its spatial content can be captured in this way for compositional use, and it can also be combined with sound material with artificially encoded spatial information. Additionally, the X, Y, Z, and W signals can be manipulated to post-process environmental recordings (effectively changing the original microphone position); zoom into sounds within the captured landscape; or subject the entire sound image to rotation, tumbling, or other time-dependent motions.

Any source material without spatial encoding, such as synthesized sound or mono recordings, can be positioned or moved within an ambisonic sound field when subjected to the type of encoding process illustrated in the Csound examples accompanying this article. The number of sound sources reproduced within the sound field can exceed the number of loudspeakers used to reproduce it, and may be greater than the number of recorded tracks used to represent it. Each sound within a synthesized sound field can be encoded with movement. One of the great benefits of ambisonic encoding is that B-format signals can be mixed together to produce a resultant sound field that retains the positional information of all its components. Composers can manipulate the spatial path of individual sound sources, and mix them with further encoded sources to combine a series of different motions.

When sounds have been ambisonically encoded, a decoding process will reproduce the position and movement of sounds within the sound field if suitable decoder and loudspeaker configurations are used. A sound that has been encoded to move around the listener will do so in all recognized loudspeaker configurations, from a small, single-listener environment to a large concert-hall performance system. The relative position of sounds and the scale of the movement will depend on the distance between loudspeakers, i.e., the absolute sound position is not necessarily encoded, and a sound encoded to move across the front of the image will always do so despite the distance between the front speakers. Playing back a B-format signal, with an appropriate loudspeaker configuration, will automatically reproduce all encoded position and movement information.

Composers who have access to playback systems using four loudspeakers can only monitor a two-dimensional sound field plane. This type of system is capable of B-format playback, but not full with-height reproduction. It is possible, however, to monitor the horizontal and vertical planes individually for a B-format signal designed to include height encoding.

Considerations

Ambisonic systems use decoding methods that are based on physical and psychoacoustic positional information to reproduce sound fields, but there are further matters of perception, sound localization, [...]

erating synthesized sound fields.

Our experiments with ambisonic playback systems show that the spatial perception of a sound is highly frequency dependent. Some localization of low-frequency sound is possible, but the strong positional cues are provided by higher spectral components. Also, sounds that have a widely distributed spectral energy can be localized more easily than can narrow band signals (consequently, sounds with sharp attack characteristics are usually easy to locate). There are certain conventions and expectations for the localization of high- and low-frequency sounds based on our experience of sound in the environment (Begault 1994): high-frequency sound normally occurs above us, and low-frequency below us (or at ground level). Inverting these relationships often results in poor localization, unless other cognitive sound cues exist. The acoustic response of the playback venue can produce spurious positional information that can also make localization difficult. Additionally, locating moving sounds is easier than locating stationary sounds, as with all playback systems that use phantom images.

Doppler shifts can be very important to the perception of movement (Dodge 1985). If a sound source which does not contain inherent clues to its movement is moved within an artificially constructed sound field, its perception can be impeded. Doppler shifts are not necessarily required in all circumstances, but they can assist in highlighting the movement of some sounds.

The most difficult problem in synthesizing or manipulating sound fields is the dominance of visual perception (Begault 1994). This acousmatic problem can be acute in ambisonic loudspeaker configurations, because the phantom image technique allows large angular distances to be subtended between loudspeakers at the listening position. The absence of a visual sound source is difficult for some listeners, but reducing visual dominance by lowering light levels has been shown to increase the perception of aural localization. This problem rarely occurs if the angular distance between loudspeakers is small.

Visualizing the movement of sound may also be a problem. Compositional processes that use
that visual and aural acuities are equal. Complex sound trajectories that can be visualized (and notated) are often impossible to perceive. Further assumptions about aural perception, influenced by a familiarity with multiple microphone (and multitrack) recording techniques and stereo reproduction, also produce certain expectations related to sound localization. These methods present sounds as point sources, usually within a stereo field, which is rarely a true representation of the spatial characteristics of the source. Three-dimensional recording techniques represent sound sources and their spatial positions, including diffuse sound sources.

Several experiments have shown tha
sound fields played back over recogniz
speaker configurations can be perceive
side of the array (Malham 1992). Here
can "look in" on sound positions and
rather than being surrounded by the s
This has several possible implications f
gration of ambisonic and traditional so
diffusion systems. Further work is be
to determine the possibilities of such s
the potential sound field distortion ef
would be introduced by the use of ind
loudspeakers. Several ambisonic speak
have been tested for use as traditional
tems using two-track tapes and hardw
signed to position the two channels of
source at different points within the
(Malham 1992). This has some advant
placing the stereo image in specific lou
because smooth spatial transitions be
loudspeakers can be achieved even if th
tioned at angles greater than 60 degree
tener. This method is not a parallel to
mance practice of sound diffusion arti
not reproduce the accuracy of sound p
achieved by using very large arrays of

Do It Yourself

The Csound orchestra and score files p
Figures 1, 2, 3, and 5 enable simple am

coding and decoding with computer systems that support the playback of four-channel audio files. The files produce signals that can be fed directly to four amplifiers and loudspeakers to reproduce ambisonically encoded, two-dimensional horizontal sound fields. The encoding example is based on the unit sphere, and enables sound to be positioned and moved on its surface. It does not include the algorithms required to move sounds through the center of the sound field.

The Csound Examples

In the Csound ambisonic encoding orchestra file of Figure 1, the position of a mono sound source within the sound field is described by the horizontal and vertical angles (in radians) it subtends at the listening position. These angles are represented by parameters specified by the user in the score file of Figure 3. Any score-file event can be assigned either a static position or a linear movement within the sound field. Both the horizontal and vertical angles can be assigned "start" and "end" positions (parameters p6 and p8, and parameters p7 and p9, respectively). These parameters are passed from the score file to the orchestra file (Figure 1) during a Csound performance. The encoding orchestra uses a linear function (line) to interpolate between both start and end angles over the duration of each score event, producing two time-varying angles: kone and ktwo. If the start and end angles in the score file are equal, the sound will remain stationary.

The orchestra calculates the sine and cosine of the time-varying angles, which are then applied to the ambisonic B-format encoding equations, producing the audio-rate variables ax, ay, az, and aw. The sound source for these examples is created by a Csound generator routine which produces a composite waveform consisting of weighted sums of sinusoids (additive synthesis). This routine is called by the first line of the score file, and generates a wavetable that is read by the interpolating oscillator in the orchestra file. The output of the oscillator, variable a5, is passed to the Ambisonic encoding equations, where it is multiplied by the (varying) coordinates for the intended sound position or movement. The four audio signals produced by the encoding equations are written out to a sound file by the outq routine. The output from Figure 1 is an ambisonic B-format signal. If the B-format file is to be used in conjunction with the

Figure 1. A Csound orchestra definition for encoding B-format ambisonic data.

    ;***************************************
    ;*                                     *
    ;*    Ambisonic Encoding Orchestra     *
    ;*                                     *
    ;* p4 = pitch; p5 = amplitude;         *
    ;* p6 = start angle from center front  *
    ;* p7 = end angle from center front    *
    ;* p8 = start angle from horizontal    *
    ;* p9 = end angle from horizontal      *
    ;*                                     *
    ;*(all angles are expressed in radians)*
    ;***************************************

    ; Csound orchestra header
    sr = 44100
    kr = 441
    ksmps = 100
    nchnls = 4

    instr 1
    ; linear transformation
    ; between start and end angles
    kone line p6, p3, p7
    ktwo line p8, p3, p9

    ; envelope for sound source
    kenv linen p5, 0.008, p3, 0.02
    a5 oscili kenv, cpspch(p4), 1

    ; calculate cos and sin of
    ; time-varying angles
    kca = cos(kone)
    ksa = sin(kone)
    kcb = cos(ktwo)
    ksb = sin(ktwo)

    ; B-format encoding equations
    ax = a5 * kca * kcb
    ay = a5 * ksa * kcb
    az = a5 * ksb
    aw = a5 * 0.707

    ; B-format output
    outq ax, ay, az, aw
    endin
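The encoding equations of Figure 1 can also be checked outside Csound. The following Python sketch is an illustration only: the function names `encode_b_format` and `encode_moving_source` are hypothetical, and it interpolates the angles once per sample, whereas the orchestra of Figure 1 updates them at the control rate. It assumes the same conventions as Figure 1 (angles in radians, horizontal angle measured from center front, W scaled by 0.707).

```python
import math


def encode_b_format(sample, azimuth, elevation):
    """Encode one mono sample at the given angles (radians) into (X, Y, Z, W)."""
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    w = sample * 0.707
    return x, y, z, w


def encode_moving_source(samples, az_start, az_end, el_start, el_end):
    """Interpolate both angles linearly across the signal and encode each sample.

    Simplification (an assumption of this sketch): the end angles are reached
    exactly on the last sample.
    """
    n = len(samples)
    frames = []
    for i, s in enumerate(samples):
        t = i / (n - 1) if n > 1 else 0.0
        azimuth = az_start + t * (az_end - az_start)
        elevation = el_start + t * (el_end - el_start)
        frames.append(encode_b_format(s, azimuth, elevation))
    return frames


# A constant unit signal swept a quarter turn counter-clockwise from center front,
# staying in the horizontal plane.
frames = encode_moving_source([1.0] * 5, 0.0, math.pi / 2, 0.0, 0.0)
for x, y, z, w in frames:
    print(f"X={x:.3f} Y={y:.3f} Z={z:.3f} W={w:.3f}")
```

Because the source is encoded on the unit sphere, X squared plus Y squared plus Z squared equals the squared input sample at every step, which is a convenient sanity check on any encoder of this kind.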

decoding orchestra of Figure 2, the output sound file name should be soundin.1. Any mono audio signal can replace the synthesized sound source of Figure 1 (a5).

The score file of Figure 3 gives three examples of a circular movement (counter-clockwise) around the sound field: a slow rotation from center front, over 30 sec; a quicker rotation lasting for 5 sec; and the rotation of a repeating short sound. These examples are intended to illustrate movement in the horizontal plane, so no angles above or below the horizontal plane are given. The vertical angles can be used to produce height-encoded B-format, but this information will not be decoded by the orchestra file of Figure 2. For this simple illustration of movement, the pitch of the sound source remains constant. The score file shows how parameters can be passed to interpolating functions in the orchestra file, and the use of the score based linear "ramping" function to produce intermediate sound positions.

Figure 2 is a Csound orchestra that decodes a B-format signal to produce speaker-feed signals for a horizontal loudspeaker configuration. A B-format sound file is taken into the orchestra as a soundin file. Each of the four channels is applied to the ambisonic decoding equations and

Figure 2. A Csound orchestra definition for decoding sound files encoded with B-format ambisonic data.

    ;***************************************
    ;*                                     *
    ;*    Ambisonic Decoding Orchestra     *
    ;*                                     *
    ;***************************************

    sr = 44100
    kr = 441
    ksmps = 100
    nchnls = 4

    instr 2

    ; Read sound in from file
    ax, ay, az, aw soundin 1

    ; Decode equations producing 4
    ; speaker feed signals for
    ; horizontal-only playback
    a1 = aw + (ax*0.707) + (ay*0.707)
    a2 = aw + (ax*0.707) - (ay*0.707)
    a3 = aw - (ax*0.707) - (ay*0.707)
    a4 = aw - (ax*0.707) + (ay*0.707)

    ; output speaker signals
    outq a1, a2, a3, a4
    endin

Figure 3. A Csound score file to demonstrate the encoding instrument of Figure 1.

    ;***************************************
    ;*                                     *
    ;*      Ambisonic Encoding Score       *
    ;*                                     *
    ;* p4 = pitch; p5 = amplitude;         *
    ;* p6 = start angle from center front  *
    ;* p7 = end angle from center front    *
    ;* p8 = start angle from horizontal    *
    ;* p9 = end angle from horizontal      *
    ;*                                     *
    ;*(all angles are expressed in radians)*
    ;***************************************

    ; Function table
    f1 0 1024 10 1 .7 .7 .7 .7 .7 .7 .7 .7 .7 .4 .4 .4 .4 .3 .3 .3 .3 .2 .2 .2 .2

    ; Note commands
    ;p1 p2  p3   p4   p5    p6     p7     p8 p9
    i1 0.0 30.0 5.07 15000 0      6.2832 0  0
    i1 +   5.0  .    .     .      .      .  .
    i1 +   0.16 .    .     0      0
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     >      >
    i1 +   .    .    .     6.2832 6.2832
    ; End

scaled by the directivity factor of 0.707 to correctly position the sounds. The four speaker signals produced, a1, a2, a3, and a4, are written to a sound file that can then be played over the speaker configuration of Figure 4. The score file (Figure 5) that accompanies this orchestra acts as a simple switch to trigger the decoding instrument. Care should be taken to ensure that the duration in this score, parameter 3, is equivalent to (or greater than) the input sound file length.

The examples presented here demonstrate simple sound positioning. More complex interpolating functions could be applied to describe sound paths within the sound field. The encoding functions can be added to any Csound instrument if ambisonic B-format spatialization is required. The decoding orchestra file can be applied to any B-format signal (including recordings made with the sound field microphone) to produce horizontal loudspeaker feeds. No additional hardware is required.

Two stereo reproduction systems, with approximately equal gain, can be used for the loudspeaker configuration of Figure 4 (it is useful to produce a four-channel sound file to test the amplitude and position of each loudspeaker). The loudspeakers should be placed accurately at the four corners of a

Full three-dimensional sound field decoding from B-format signals is possible using computer systems that support eight audio channels, or by employing a hardware decoder.

Figure 4. Loudspeaker arrangement for playing back four-channel ambisonic sounds. (Diagram: loudspeakers a1 and a2 at the front, a4 and a3 at the rear, as labeled from left to right, with speaker separation d along each side.)

Figure 5. A Csound score file to demonstrate the decoding instrument of Figure 2.

    ; Ambisonic Decode Score.

    ; Play a test note on instrument 2
    i2 0 38

    ; End
    e

Conclusions
Our experimental work has been carried out using both software and hardware ambisonic implementations, including a purpose-built programmable periphonic decoder that produces full three-dimensional surround sound over 16 loudspeakers. The Csound examples are presented here to enable other users to experiment with ambisonic encoding and decoding, without using specialized hardware devices. The simplicity with which ambisonic encoding can represent sound sources in three-dimensional space is a great advantage, and some of the possibilities for sound field manipulation and generation are almost unique to this technique. There is also the potential to convert ambisonic signals to other formats, e.g., binaural, for other applications (Malham 1993).

The techniques described here only refer to first-order ambisonic encoding. The original work by Gerzon (Gerzon 1972) includes descriptions of higher-order systems that can increase the spatial accuracy of both recording and playback.

There are many musical and psychoacoustic issues that require further investigation. We hope that the information presented here will encourage progress in these areas.

Further information can be obtained on the World-Wide Web ambisonic home page, via the URL "http://www.york.ac.uk/insts/mustech/3d_sound/ambison.htm."

References

Askew, A. 1981. "The Amazing Clement Ader." Studio Sound 23:9-11.
Begault, Durand R. 1994. 3-D Sound for Virtual Reality and Multimedia. Boston: Academic Press. pp. 65-66, 84, and 191-245.
Chowning, J. 1971. "The Simulation of Moving Sound Sources." Journal of the Audio Engineering Society 19(1):2-6. Reprinted in Computer Music Journal 1(3):48-52.
Cooper, D. H., and J. L. Bauck. 1989. "Prospects for Transaural Recording." Journal of the Audio Engineering Society 37(1/2):3-19.
Cooper, D. H., and T. Shiga. 1972. "Discrete Matrix Multi-Channel Stereo." Journal of the Audio Engineering Society 20(5):346-360.
Dodge, C., and T. A. Jerse. 1985. Computer Music. New York: Schirmer Books. pp. 245-247.
Emmerson, S. 1986. "The Relation of Language to Materials." In The Language of Electroacoustic Music. London: MacMillan, pp. 17-39.
Farrah, K. 1979. "The SoundField Microphone." Wireless World. November: 99-103.
Fox, B. 1982. "Early Stereo Recording." Studio Sound 24(5):36-42.
Gerzon, M. A. 1972. "Periphony: With-Height Sound Reproduction." Journal of the Audio Engineering Society 21(1):2-10.
Gerzon, M. A. 1974. "Surround-Sound Psychoacoustics." Wireless World. December: 484.
Gerzon, M. A. 1977a. "Design of Ambisonic Decoders for Multi Speaker Surround Sound." Paper presented at the 58th Audio Engineering Society Convention, 4 November, New York.
Gerzon, M. A. 1977b. "Surround Sound Decoders" (7 parts). Wireless World. January to August issues, 1977.
Hope, Adrian. 1979. "Fantasia-Multitracked." Studio Sound 21(8):29-30.
Malham, D. G. 1992. "Experience with Large Area 3-D Ambisonic Sound Systems." In Proceedings of the Institute of Acoustics 14(5):209-216. St. Albans: Institute of Acoustics.
Malham, D. G. 1993. "3-D Sound for Virtual Reality using Ambisonic Techniques." In Proceedings of the 3rd Annual Conference on Virtual Reality (addendum). Westport: Meckler.
Malham, D. G., and R. O. Orton. 1991. "Progress in the Application of 3-Dimensional Ambisonic Sound Systems to Computer Music." In Proceedings of the International Computer Music Conference (ICMC). Montreal: ICMC, pp. 467-470.
Moore, F. R. 1983. "A General Model for Spatial Processing of Sounds." Computer Music Journal 7(3):6-15.
Moorer, J. A. 1979. "About this Reverberation Business." Computer Music Journal 3(2):13-28.
Sanal, A. J. 1976. "Looking Backward." Journal of the Audio Engineering Society 24(10):832.
Schaeffer, P. 1951. "Journal d'Orphee." In F. Bayle, ed., Pierre Schaeffer l'œuvre musicale. France 1990. Paris: INA/GRM and Librairie SEGUIER.
Schroeder, M. R. 1962. "Natural Sounding Artificial Reverberation." Journal of the Audio Engineering Society 10(3):219-223.
Smalley, D. 1986. "Spectro-Morphology." In S. Emmerson, ed., The Language of Electroacoustic Music. London: MacMillan, pp. 73-80.
Smalley, D. 1992. "The Listening Imagination." In H. O. Paynter, et al., Companion to Contemporary Musical Thought, vol 1. London: Routledge.
Stimson, Ann. 1991. "The Script for Poeme Electronique; Traces from a Pioneer." Proceedings of the International Computer Music Conference. Montreal: ICMC, pp. 308-310.
Stockhausen, K. 1956. "Programme Notes for the 1956 World Premiere of Gesang Der Jünglinge." In liner notes for the 1992 CD Stockhausen 3 Elektronische Musik 1952-1960 Stockhausen: Verlag.
Thiele, G., and G. Plenge. 1977. "Localization of Lateral Phantom Sources." Journal of the Audio Engineering Society 25(4):196-200.
Vennonen, K. 1994. "A Practical System for Three-Dimensional Sound Projection." In Proceedings of the Symposium on Computer Animation and Computer Music. Canberra, Australia: Australian Centre for the Arts and Technology.
Weiland, F. C. 1975. "Electronic Music-Musical Aspects of the Electronic Medium." Internal publication, Institute of Sonology, Utrecht State University.
Wishart, T. 1986. "Sound Symbols and Landscapes." In S. Emmerson, ed., The Language of Electroacoustic Music. London: MacMillan, p. 45.

