Proceedings of the Institute of Acoustics
WHAM: TO ASYMMETRY AND BEYOND!
M Dring
Dr B Wiggins
College of Science and Engineering, University of Derby, Derby, UK

1 INTRODUCTION
Spatial hearing is our ability to make sense of the complex and multiple acoustic cues provided to our
ear/brain system by the world around us. Sounding objects have a direction, a location and the sonic character of the place in which they occur; combined with our constantly changing listening position, these cues provide a rich source of data about the world around us. The reproduction of sound can benefit
greatly, in many situations, from an accurate dynamic simulation of this complex acoustical scenario,
or auralisation. Auralisation of acoustic spaces is a tool used in many industries. To provide a truly
representative result, the systems used must capture and deliver critical, dynamic, psychoacoustic
cues that react to the listener’s head position. The WHAM (Webcam Head-tracked Ambisonics)
website (www.brucewiggins.co.uk/WHAM) utilises webcams to provide auralisation that reacts to
head rotation via the browser using standard HRTF data; visitors to the site can experience very high order, horizontal-only Ambisonic-to-binaural presentation of room responses. Initially, the final binaural presentation was limited to asymmetric 7th order, which previous research has shown to fall below the threshold of perceptual transparency when compared with a system limited to 31st order1. This paper documents the developments made to deliver beyond 7th order, along with improvements in functionality to the WHAM website and the open-source JSAmbisonics software library, which continue to make it a useful remote resource for acoustic auralisation purposes.
2 BINAURAL TECHNOLOGIES
In this study we are concerned with the binaural presentation of the sound field, seeking to maintain the inter-aural cues that convince the listener that sound sources are positioned at known points within
a specific environment. The technology and processes used to capture and recreate the sound field
will influence the subjective response from the listener, especially if they become more aware of the
‘system’ than the audio experience.
2.1 AMBISONIC MICROPHONES
A 360 degree (periphonic) acoustic signature of a room or environment can be achieved using
specialist Ambisonic microphones, capable of capturing the spherical harmonic composition of the
sound field2. The first SoundField microphone, invented by Gerzon and Craven3, was launched in 1978; it was constructed from closely spaced cardioid capsules producing 4 signals (known as A-Format). Dedicated hardware transformed these input signals into the 4 channels of B-Format, also known as First Order Ambisonics (FOA), which represent the complete sound field; reproducing it with height requires a minimum of 8 speakers4. Upon reproduction, unlike some other audio
conventions, Ambisonics can be decoded to different speaker layouts and be manipulated to provide
rotation, tumble and tilt of the 3D sound field. Furthermore, Ambisonics can be efficiently converted
into a binaural signal, suitable for headphone listening, through convolution with Head Related
Transfer Functions (HRTFs)5,6.
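The decoding flexibility described above can be illustrated with a small sketch. The following is illustrative only (not taken from any named library, and normalisation conventions are omitted): it encodes a mono sample into horizontal-only first-order B-format, then rotates the resulting sound field about the vertical axis, the simplest of the manipulations mentioned above.

```javascript
// Sketch: encode a mono sample into horizontal-only first-order Ambisonics
// (B-format W/X/Y; normalisation gains omitted for clarity).
function encodeFOA2D(sample, azimuthRad) {
  return {
    w: sample,                        // omnidirectional component
    x: sample * Math.cos(azimuthRad), // front/back figure-of-eight
    y: sample * Math.sin(azimuthRad), // left/right figure-of-eight
  };
}

// Rotating the whole sound field by yawRad is a 2x2 rotation of (X, Y);
// W is rotation-invariant.
function rotateFOA2D({ w, x, y }, yawRad) {
  const c = Math.cos(yawRad), s = Math.sin(yawRad);
  return { w, x: c * x - s * y, y: s * x + c * y };
}
```

Rotating a source encoded at 0 degrees by +90 degrees yields the same channel values as encoding it directly at 90 degrees, which is what makes head-tracked rendering a cheap channel-domain operation.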
Ambisonics being a hierarchical format means that both the number of inputs for signal capture and
the number of outputs for reproduction can be expanded. Higher Order Ambisonic (HOA)
microphones are now commercially available, examples include the Zylia ZM-17 (3rd order) and the
MH-Acoustics Eigenmike8 (4th order).
An increase in order, and therefore in the order of spherical harmonics that can be captured, is associated with improved spatial resolution and a reduction in the effects of spatial aliasing. The perceptual impact of reduced spatial resolution is a smearing of the sharp peak of the plane wave across a wide range of directions9.

Vol. 43. Pt. 2. 2021

Spatial aliasing limits the frequency bandwidth over which the interaural cues used to determine direction of arrival, and the character of the environment in which the sounds occur, can be correctly reproduced5,10. Whilst current HOA microphones make for a quick and convenient
method to capture the sound field, the described limitations make this method undesirable for the
intentions of this project.
2.2 HEAD AND TORSO SIMULATORS
To extend beyond the currently limited Ambisonic microphone orders the sound field or binaural room
impulse response (BRIR) can be captured using a Dummy Head or Head and Torso Simulator
(HATS). BRIRs can be captured by a HATS in two ways: the first keeps the head static and measures multiple source positions around it; the second maintains a static source position and rotates the HATS. In anechoic conditions, rotating the head and rotating the source around the head yield the same filters; in echoic conditions they do not. Using either
of these methods means the head related transfer functions (HRTFs) associated with the dummy
head cannot be separated from the room response and the reported benefits of HRTF personalisation
cannot be applied at a later stage in the binaural reproduction process. However, with head-tracking
devices, rotation, tumble and tilt can be implemented, which research shows mitigates cone-of-confusion issues and supports externalisation11,12.
To achieve head-tracking in the reproduction stages either the dummy head is rotated for a static
sound source, or the sound source is moved around the dummy head. If a greater number of BRIRs
are generated, indicative of more discrete measurement points, a higher order of spherical harmonics
can be leveraged to represent the sound field. As an example, spherical reproduction at orders up to and including 35th requires 1296 Ambisonic/spherical harmonic channels according to equation (1), with binaural presentation doubling this value for the left and right pair.
HOA Signals = (Order + 1)²    (1)
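Equation (1) and its horizontal-only counterpart can be checked with two one-liners (an illustrative sketch; the 2D count, 2 × order + 1, is the one that governs the channel budget discussed in section 4):

```javascript
// Channel counts implied by equation (1): a full-sphere (periphonic) HOA
// stream of a given order needs (order + 1)^2 spherical-harmonic channels,
// while a horizontal-only (2D) stream needs 2 * order + 1 circular-harmonic
// channels.
const channels3D = (order) => (order + 1) ** 2;
const channels2D = (order) => 2 * order + 1;

console.log(channels3D(35)); // 1296, as stated in the text
console.log(channels2D(15)); // 31, fitting within a 32-channel budget
```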
HOA presented binaurally can be computationally intensive due to the number of convolutions needed, which can lead to slow render times and audio presentation errors, such as glitches or distortion, if sufficient computing power isn’t available. Various researchers seek techniques that
reduce the order yet maintain the desired spatial attributes without generating perceptual errors. A
commonly considered technique is to observe the resolution required for the direct sound differently
from that required for the reverberant field, adopting a “hybrid” approach13. Whilst these techniques
capitalise on the studied perceptual thresholds of the reverberant/diffuse field through separation at
a determined ‘mixing time’, it is acknowledged that in real rooms ideal diffusion (and therefore a perfect mixing time) will never occur, due to modal behaviour, proximity to room boundaries and the non-uniform distribution of absorption14.
The Reverberant Virtual Loudspeaker (RVL) technique documented by Engel and Henry13 offers a
solution to minimise the number of required convolutions for head-tracked BRIR presentation.
However, the room and room reflections are “head-locked” and don’t track with head rotations which
leads to misaligned auditory cues between the direct and reverberant sound fields. Room positions
which exhibit significant diffusion may mask this misalignment and Engel and Henry conclude that
RVL, which renders the reverberation at low orders (max 4th order), is subjectively comparable to ‘less
flexible Ambisonic approaches’. However, that study positioned the head in the centre of the room, where sufficient diffusion and room symmetry are most likely owing to the maximal path differences from all boundaries.
However, if the aim is to generate auralisations that aren’t compromised in any detail for any listening
position in the room and provide dynamic rendering that ensures early and late reflections are truly
relative to the source position and position of the head on playback, then Very High Ambisonic Orders
(VHOA) should be implemented. As an audio format analogy, it is the equivalent of rendering a WAV file rather than an MP3: the perceptual thresholds for each individual recording cannot be determined with absolute certainty, so we should avoid making any compromise.
For this project the KEMAR15 head and torso simulator was employed to capture one hemi-anechoic
(Hemi-anechoic chamber, University of Derby) environment and two reverberant environments
(MS015, University of Derby and St. Paul’s Church, Derby) in the horizontal plane only. The HATS
was rotated on a turntable every 2.5 degrees for a single loudspeaker position (144 BRIRs), which
leads to a possible 71st order of circular harmonics for binaural rendering. The automated capturing
process known as Dynamic Binaural Reverberation Acquisition Technique (D-BRAT), previously
referred to as “Binaural Scanning”, has been documented by these authors in a previous paper16.
Whilst limited to a single fixed sound source position and generic HRTFs, this technique ensures listeners will experience a sound field that accurately reflects the dynamic cues experienced in real life.
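As a sketch of how such a measurement set is used on playback (hypothetical helper, not the actual D-BRAT code), the head yaw reported by the tracker can be mapped to the nearest of the 144 measured rotations, and the measurement count fixes the maximum circular harmonic order:

```javascript
// Sketch (assumed helper): with BRIRs measured every 2.5 degrees
// (144 rotations), head-tracked playback maps the current head yaw to the
// nearest measured rotation.
const STEP_DEG = 2.5;
const NUM_BRIRS = 360 / STEP_DEG; // 144

function brirIndexForYaw(yawDeg) {
  const wrapped = ((yawDeg % 360) + 360) % 360;      // wrap into [0, 360)
  return Math.round(wrapped / STEP_DEG) % NUM_BRIRS; // nearest measurement
}

// 144 equally spaced measurements on the circle support circular harmonics
// up to order floor((144 - 1) / 2) = 71, matching the 71st order quoted above.
const maxCircularOrder = Math.floor((NUM_BRIRS - 1) / 2);
```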
3 THE IMPORTANCE OF ASYMMETRY
A typical criticism of binaural rendering is that sound sources are located “in the head” or poorly localised (mainly front/back confusions), which is attributed to non-individualised HRTFs18; a known limitation of D-BRAT. It is therefore important to capitalise on other processes, namely natural reverberation19, dynamic cues and asymmetrical pinnae20, to maximise the auralisation experience; put simply, we need to ensure that the signals arriving at the left and right ears are decorrelated.
Another technique adopted to increase computational efficiency of binaurally reproduced signals is
to assume HRTF (and by extension BRIR) left and right symmetry5. When this assumption is made
the number of convolutions per spherical harmonic is reduced, as front/back components are the
same, and left/right components are simply phase inverted17. However, any assumption of symmetry
will erode the desired decorrelation and is unrealistic for most real rooms considering all the impacts
on ideal diffusion mentioned previously. It is therefore of importance as will be discussed later in the
paper that signals fed to the left and right ear are presented identically to how they have been
captured. BRIRs will rarely, if ever, be left/right symmetric.
In addition to the asymmetrical convolution processes at the end of the signal chain, research shows
the utilisation of asymmetrical pinnae will have a positive effect on ‘externalisation’20. D-BRAT makes use of the KEMAR HATS, whose pinnae are known to be asymmetrical as they are based on real measurements of human dimensions. Figure 1 and Figure 2 show the pinnae used for this project
and provide clear evidence of the differences that will lead to decorrelation of not only the reverberant
field but also the direct sound source.
Figure 1: Front profile of the KEMAR pinnae
Figure 2: Side profile of the KEMAR pinnae
4 JS AMBISONICS: DEVELOPING 15TH ORDER ASYMMETRY
JSAmbisonics, created by Politis and Poirer-Quinot, is a “JavaScript library that implements a set of
objects for real-time spatial audio processing, using the Ambisonics framework. The objects
correspond to typical ambisonic processing blocks, and internally implement Web Audio graphs for
the associated operations.” Its maturity, features and the fact that it is open source made it an ideal
candidate for use in the WHAM project (the other contender being Omnitone from Google
https://googlechrome.github.io/omnitone/). JSAmbisonics is designed to work using the WebAudio
API, a specification developed by W3C describing a high-level JavaScript API for processing and
synthesising audio on the web, which is supported by major browsers on computers and mobile
devices allowing for a low barrier to entry for users.
The JSAmbisonics library already included much of the functionality and classes needed for the
project with the main classes of interest being:
• monoEncoder2D: used to encode a sound source into an ambisonic stream of a set order,
with real-time control of the panning direction.
• binDecoder2D: implements an ambisonic to binaural decode, using user-defined HRTF-based filters up to 15th order (horizontal only and assuming left/right HRTF/head symmetry).
• orderLimiter2D: takes a Higher Order Ambisonic stream of order N and outputs the channel-limited HOA stream of order N′ <= N (up to 15th order, 2D only).
Currently, the Web Audio API dictates that browsers support ‘at least 32 channels’. In practice this has meant that browsers support 32 channels and no more, which is where the 15th order limitation for horizontal decoding comes from: 15th order needs 31 Ambisonic channels, the highest channel count that fits below the 32-channel limit. As mentioned in section 3, binaurally decoded implementations of Ambisonics typically assume that the head and ears are left/right symmetric. This reduces
the number of convolutions needed to 1 per spherical harmonic. However, for this project, where
BRIRs are captured and utilised, left/right symmetry cannot be assumed as the room response will
not exhibit this feature and, as discussed above, asymmetry has been shown to be of importance
even for anechoic HRTF data. If the HRIR/BRIR data isn’t left/right symmetric, then the resultant
spherical harmonic filters will be different for the left and right ear streams. For this reason, a number
of JS Ambisonics’ functions needed to be updated to allow for the processing of two 31-channel
convolution streams simultaneously.
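Conceptually, the order limiting and channel budgeting work as follows. This is a sketch assuming a simple lowest-orders-first channel ordering; the actual JSAmbisonics 2D channel layout may differ:

```javascript
// Sketch of what an orderLimiter2D-style block does conceptually: truncate a
// 31-channel 15th order stream to the 2 * N' + 1 channels of order N' <= 15.
// Assumed ordering: order-0 channel first, then each higher order's pair.
function limitOrder2D(channels, targetOrder) {
  const kept = 2 * targetOrder + 1;
  if (kept > channels.length) throw new Error("target order exceeds stream order");
  return channels.slice(0, kept);
}

// A 15th order stream (31 channels) limited to 7th order keeps 15 channels.
const stream15 = Array.from({ length: 31 }, (_, i) => i);
const stream7 = limitOrder2D(stream15, 7);
```

With asymmetric BRIRs, two such 31-channel streams (left and right) must be convolved in parallel, which is exactly the doubling that motivated the library changes described here.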
When the website is accessed, a number of audio files and filters are pre-loaded for use:
• An audio file used in the auralisation (a mono .wav file)
• Diffuse-field equalisation filters relating to the order of Ambisonics to be used (a 2 channel
.wav file).
• Binaural Room Impulse Response JavaScript Object Notation (JSON) files, each containing 144 two-channel BRIRs for a fixed source position but multiple head rotations of the KEMAR HATS (every 2.5 degrees).
Once loaded, the JSAmbisonics library uses the BRIRs, coupled with the order of Ambisonics to be
used, to create the filters needed for the binaural decoding of the audio (which will be ambisonically
encoded by the library). This results in two 31-channel filters (if 15th order is the maximum order to
be used); one 31-channel set for the left ear and one 31-channel set for the right ear. The functions
that needed expanding to allow for asymmetrical filter processing, necessary for correct reproduction
of the room response with head tracking, are those associated with the calculation of the filters and
the binaural decoding of the Ambisonic audio, both of which occur in the binDecoder class. In the forked
version of the JSAmbisonics used for this project, this has resulted in two extra classes:
• HRIRloader2Dasym2 – a class that loads the BRIR data and calculates the filters needed for
binaural decoding of the Ambisonic audio without symmetry assumed.
• binDecoder2Dasym2 – a class that implements the Ambisonic-to-binaural decoding using the filters calculated by HRIRloader2Dasym2 above, implementing the two parallel 31-channel convolution streams needed.
A block diagram showing the Web Audio API and JSAmbisonics elements as implemented on the WHAM website is shown in Figure 3 below:
[Figure 3 signal chain: Audio File (1 ch) → Diffuse Field EQ Filter (1 ch) → Mono Encoder 2D, 15th order (31 ch) → Order Limiter 2D, order N′ (2N′+1 ch) → Bin Decoder 2D asym, L and R (2 ch) → Gain Block. Web Audio API buffers are denoted by arrows; the number below each arrow is the number of channels in that buffer (max 32). N′ is the current order being auditioned by the user, up to 15 (31 channels).]
Figure 3 - Block Diagram of the D-BRAT processing on the WHAM website
5 UPDATES TO THE WEBSITE FEATURES
The unique aspects of the website remain the same, with the use of a webcam local to a user’s
computer tracking head movement to generate playback of VHOA BRIRs with the correct source to
receiver orientation. This section will provide updates on changes made to the structure and features
of the website that have been implemented since the previous paper by the authors16, mainly completing the list of intentions outlined there as future work, so that the platform is more engaging and effective as an auralisation tool.
5.1 WEBCAM TRACKING
When first constructed the website had two main sections labelled “Face Tracking” and “Audio
Playback and Manipulation”. Site users were required to scroll down the page to activate playback of
audio and then scroll back up to observe the face tracking and respective positional data. As shown
in Figure 4, the audio playback controls and sample selection have been moved alongside the webcam tracking window. Users are now able to engage instantly with the key feature of the site, which will encourage exploration of the subsequent sections that make it a powerful tool for immersive audio experiences.
Figure 4: Webcam tracking section of the WHAM website
5.2 ORDERS AND FILTERS
The default set of BRIRs when the site loads is for MS015 (University of Derby), with a sound source positioned at 45 degrees azimuth, binaurally rendered using 7th order circular harmonics. These settings are equivalent to the ‘best’ the site presented when initially developed. As evidenced in Figure 5, thanks to the processes outlined in section 4 of this paper, users are now able to select up to and including 15th order for 3 acoustically different environments. For each environment, 4 different
loudspeaker positions have been captured at distances of 2m from the HATS, offset by ±45 degrees
to the front and rear. This provides users with some interesting audible interactions. It is also possible
to load locally stored filters for auditioning with the webcam tracking if available in a JSON.SOFA file
format.
A graphical expansion to the site was to provide a 360-degree visual of the associated spaces. Using
the Pannellum JavaScript plugin21, an equirectangular image of the two reverberant spaces can be
seen. The images switch according to the selected filter and the image follows the orientation of the
head. Images can be made full screen to get a clearer visual display of the space as it was at the
time. Providing the user with the correct visual scene during binaural reproduction of the same space is beneficial in minimising the phenomenon of room divergence, which has a negative effect on externalisation12,22, thereby ensuring the auralisation is ‘accurate’ to the space being visualised by the listener.
Figure 5: Orders and Filters section of the WHAM website
5.3 LISTENING SURVEYS
Before the potential of this project was realised as an acoustic auralisation tool, it was developed to
continue research using remote methods whilst social interactions were limited due to Covid.
Implementing a listening survey section on the website was a key goal of this work; however, until orders significantly greater than 7th were realised, any findings would not advance the work previously completed by the authors1.
Two well-known listening survey methodologies for perceptual audio studies were implemented (ABX
and MUSHRA), both with an operational limit of 15th order. Listeners can lower the maximum order prior to starting the test; this was implemented as a precaution against the audio playback errors experienced on less powerful computers. It is accepted that not all results will be immediately comparable; however, it is hoped that enough participants will conduct the surveys to meet statistically significant thresholds.
5.3.1 ABX
This survey (Figure 6) aligns most closely to the work conducted by the authors looking into
perceptual thresholds of circular harmonic orders1. A single environment (MS015), single loudspeaker
position (45-degree azimuth) and single audio type (drums) is presented for randomised orders
against the maximum order determined by the user’s selection for a minimum of 10 questions.
Results obtained from this study will determine if listeners were able to either correctly or incorrectly
perceive a difference in the spatial attributes between the 15th order reproduction and the other order
presented for each question.
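The trial logic described above can be sketched as follows (hypothetical helper names; the website's actual implementation is not shown in this paper):

```javascript
// Sketch of an ABX trial: A is always the maximum-order render, B a
// randomised lower order, X is drawn at random from {A, B}, and the response
// is scored against X.
function makeAbxTrial(maxOrder, lowerOrders, rand = Math.random) {
  const b = lowerOrders[Math.floor(rand() * lowerOrders.length)];
  const xIsA = rand() < 0.5;
  return { a: maxOrder, b, x: xIsA ? maxOrder : b };
}

function scoreAbxResponse(trial, answeredA) {
  return answeredA === (trial.x === trial.a); // true = correct identification
}
```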
Figure 6: WHAM ABX Listening Survey
5.3.2 MUSHRA
The ABX survey will observe perceptual thresholds for the specific conditions presented. To gather
more insight into the spatial performance of the WHAM platform a MUSHRA survey (Figure 7) was
created to obtain more evaluative data. The specific spatial metric participants are asked to evaluate
is ‘localisability’, based on the definition given by Zacharov, Pederson and Pike23.
All 3 environments mentioned in section 2.2 are used, each with a single loudspeaker position (45 degrees azimuth), randomly assigned per question. The reference uses filters at the maximum order determined
by the participant, with 4 conditions of randomised lower orders and 1 anchor condition at 0th order
(i.e., no spatialisation).
For a minimum of 10 questions each participant is asked to assign a grade between 0 and 100 to
each audio sample. A value of 100 indicates that in their judgement the sample exhibits a localisability
equal to the given reference, whereas a value of 0 would indicate the sample has no similarity to the
reference with respect to the localisability.
The outcomes from the MUSHRA survey will support analysis into the extent spatial localisation cues
are affected between orders for varying acoustic environments.
Figure 7: WHAM MUSHRA Listening Survey
6 PRELIMINARY FINDINGS

6.1 Social Media Feedback
In May 2021, a post (Figure 8) was made to the Facebook group “Spatial Audio in VR/AR/MR” and the Reddit sub-group “Spatial Audio” to gather website interest and seek listening survey participants. The link was also shared in July 2021 to the Facebook groups “Reproduced Sound” and
“Headtracked Binaural”.
Figure 8: Post Made to Various Social Media Groups
The initial post in May received several comments, including some from established researchers, which opened discussion of their experiences using the features of the WHAM website.
Interestingly, two individuals made very different comments regarding the use of the generic HRTFs
and how this affects the binaural rendering experienced, see Figure 9 and Figure 10.
Figure 9: Facebook Comment Not Favouring the use of Generic HRTFs on WHAM website.
The feedback given in Figure 9 is an indicator that, for this listener, presenting asymmetric VHOA is
not enough in itself to elicit the benefits of rendering the sound field at higher orders (i.e., improved
localisation, externalisation). This comment mentions a 3rd order limit; however, assuming this listener attempted the survey, the results shown in section 6.2 indicate that incorrect responses only occur from 5th order. Whilst the feedback mentions personalised HRTFs to overcome this problem, other system limitations such as latency, computer processing and low light affecting face recognition may be in effect. For all the benefits WHAM realises, system-to-system variations cannot be controlled.
Figure 10: Facebook Comment Favouring the use of Generic HRTFs on WHAM website.
Figure 10 by contrast encourages the direction taken in this study regarding non-personalised HRTFs
as this listener experienced accurate front/back differences, better than previous experiences using
similar HATS. Interestingly, this comment refers to the use of a high-powered mobile phone to engage
with the platform, highlighting how this work enables interactive auralisation wherever someone might
be.
6.2 Listening Surveys
Figure 11 and Figure 12 show the results from an early ABX listening survey where the site was
operating with a 10th order limit. Although a total of 19 individuals accessed the survey, 7 of these attempted only a single question; their results have therefore been discounted for not meeting the minimum participation threshold of 10 questions. No data is available from the
MUSHRA survey at the time of publication.
It is evidenced in Figure 11 that 8 (~73%) participants had a minimum of 1 incorrect response in the
trials presented, with a maximum of 5 incorrect responses for any individual recorded within this data
set. The results indicate that a perceptual difference exists between 10th order and lower orders. This data alone does not support any meaningful conclusions, although it does indicate variation in sensitivity between listeners, which supports the need to pursue a large number of participants.
Figure 11: Individual ABX Responses per Participant for all Orders Attempted
As the orders are randomised for each question it is possible that some participants will be presented
a greater number of lower orders. Table 1 breaks down the number of times each order was presented
across all participants. For all attempts 6th order shows the greatest number of occurrences (24), with
9th order the lowest (11). Percentages are also given in Table 1 and presented visually in Figure 12 for a clearer contrast of ‘success’; here it is shown that 8th order was perceived as the same as 10th order on more occasions (i.e., a higher percentage incorrect) than other orders, including 9th.
Figure 12: Combined ABX Participant Responses (in %) for each Order vs. Max. Order of 10
Table 1: ABX Responses Separated per Order
Observing that 4th, 7th and 8th orders have an identical number of attempts (17), a direct comparison of the percentage-correct values shows that only 4th order was judged 100% perceptually different from the 10th order reference. Perceptual similarities, evidenced by incorrect responses, begin to occur from 5th order upwards against the 10th order reference, although this trend is not consistent.
For statistical validation, p-values are presented in the last row of Table 1. Whilst none of the orders cross the 0.05 significance level determined for a 50% forced-choice response test method, the 8th order attempts come very close. From these preliminary results we are unable to statistically confirm that no difference was perceived between the 10th order reference and the lower orders (i.e., we accept that a difference can be determined).
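The significance test referred to here is the one-sided binomial tail for a 50% forced-choice task. A sketch, with illustrative numbers rather than the survey data from Table 1:

```javascript
// Under the null hypothesis of guessing, each ABX answer is correct with
// p = 0.5, so the one-sided p-value for c correct answers out of n trials is
// the binomial tail P(X >= c).
function binomialCoeff(n, k) {
  let r = 1;
  for (let i = 1; i <= k; i++) r = (r * (n - i + 1)) / i;
  return r;
}

function abxPValue(correct, trials) {
  let p = 0;
  for (let c = correct; c <= trials; c++) {
    p += binomialCoeff(trials, c) * Math.pow(0.5, trials);
  }
  return p;
}

// For example, 13 correct of 17 trials gives p ~= 0.025, below the 0.05
// level, whereas 12 of 17 gives p ~= 0.072, which would not reach significance.
```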
Ideally an infinite order would be used as the reference, however pushing the maximum order within
the limits of the system will hopefully reveal more about our spatial perception thresholds.
7 CONCLUSIONS
This paper has reviewed the current techniques used to binaurally render reverberant sound fields from Ambisonic signals. These techniques are either of limited order, render the spatial resolution of the direct and reverberant sound fields separately, or apply a left/right symmetry simplification. Whilst a reduction in computational load is achieved, they are not robust to the physical complexities of real rooms, which disrupt ideal diffusion and create left/right variations arising from the relative proximities of sound source and reflective surfaces.
A clear argument has been put forward in this paper to binaurally present sound fields with asymmetry
using VHOA within the capabilities of the system. Application of this logic led to developments of the JSAmbisonics framework, which were then employed in the WHAM website to deliver horizontal-only BRIRs at a current maximum of asymmetric 15th order.
Although the WHAM website was initially created to enable head-tracking without the need for
dedicated head tracking peripherals, allowing continued research into perceptual order thresholds,
its development over the last year has opened up the possibility of it becoming a powerful auralisation tool. Social media feedback has been encouraging in supporting this idea and, for some listeners, the experience surpassed their previous ones.
Preliminary ABX listening survey data (with a 10th order limit) taken from 11 responses, whilst not statistically valid, does indicate that the platform reveals perceptual differences, especially at low orders, and therefore reinforces the need for continued research comparing against much higher orders.
7.1 Further Work
The site will continue to be updated with acoustic environments captured by the authors. It is hoped
that contributions from interested parties will come forward to make the library of environments both
diverse and interesting.
Further exploration of the latency of webcam tracking compared to physical devices is needed, as this is cited as having a potential impact on source stability24. Latency variations between different systems should also be investigated to help determine the “minimum system requirements” needed to render complex reverberant fields at a sufficiently high order.
8 RESOURCES
• WHAM website: https://brucewiggins.co.uk/WHAM/
• Automated Sweep and Turntable Rotation Python ReaScript: https://bitbucket.org/DrWig/wigware-reaper-scripts/src/master/WigET250-3D_Turntable.py
• Forked JSAmbisonics supporting 15th Order 2D Ambisonics with Asymmetrical Filters: https://github.com/DrWig/JSAmbisonics
9 REFERENCES

1. Dring, M., Wiggins, B. (2019), “The Transparency of Binaural Auralisation Using Very High Order Circular Harmonics”, Reproduced Sound 2019 – Institute of Acoustics, Bristol, UK, Vol. 41 Pt. 3, p. 165-173.
2. Abhayapala, T.D., Ward, D.B. (2002), “Theory and design of high order sound field microphones using spherical microphone array”, IEEE International Conference on Acoustics, Speech and Signal Processing.
3. Rode Microphones (2021), The Beginner’s Guide to Ambisonics. [online] Available at: <https://www.rode.com/blog/all/what-is-ambisonics> [Accessed 5th October 2021].
4. Malham, D. (1998), “Spatial Hearing Mechanisms and Sound Reproduction”. [online] Available at: <https://www.digitalbrainstorming.ch/db_data/eve/ambisonics/text02.pdf>.
5. Wiggins, B. (2017), “Analysis of Binaural Cue Matching using Ambisonics to Binaural Decoding Techniques”, 4th International Conference on Spatial Audio, 7-10 Sept., Graz, Austria.
6. Politis, A., Poirer-Quinot, D. (2016), “JSAmbisonics: A Web Audio library for interactive spatial sound processing on the web”, Interactive Audio Systems Symposium, York, UK.
7. Zylia Sp. z o.o. (2021), ZYLIA ZM-1 microphone. [online] Available at: <https://www.zylia.co/zylia-zm1-microphone.html> [Accessed 5th October 2021].
8. mh acoustics LLC (2021), Eigenmike Microphone. [online] Available at: <https://mhacoustics.com/products#eigenmike1> [Accessed 5th October 2021].
9. Avni, A. et al. (2013), “Spatial perception of sound fields recorded by spherical microphone arrays with varying spatial resolution”, The Journal of the Acoustical Society of America 133, 2711.
10. Wiggins, B. (2004), An Investigation into the Real-time Manipulation and Control of Three-dimensional Sound Fields. PhD thesis, University of Derby, Derby, UK.
11. Hendrickx, E., Stitt, P. (2017), “Influence of head tracking on the externalization of speech stimuli for non-individualized binaural synthesis”, J. Acoust. Soc. Am., Vol. 141[3], p. 2011-2023, March.
12. Werner, S., Gotz, G., Klein, F. (2017), “Influence of Head Tracking on the Externalization of Auditory Events at Divergence between Synthesized and Listening Room Using a Binaural Headphone System”, AES 143rd Convention, New York, USA, October 18-21.
13. Engel, I., Henry, C., et al. (2021), “Perceptual implications of different Ambisonics-based methods for binaural reverberation”, J. Acoust. Soc. Am., Vol. 149[2], p. 895-910, February.
14. Lindau, A., Kosanke, L., Weinzierl, S. (2012), “Perceptual Evaluation of Model- and Signal-Based Predictors of the Mixing Time in Binaural Room Impulse Responses”, J. Audio Eng. Soc. 60(11), p. 887-898, November.
15. GRAS Sound & Vibration (2021), Head & Torso Simulators. [online] Available at: <https://www.grasacoustics.com/products/head-torso-simulators-kemar> [Accessed 18th October 2021].
16. Dring, M., Wiggins, B. (2020), “WHAM: Webcam Head-Tracked Ambisonics”, Reproduced Sound 2020 – Institute of Acoustics, Online, Vol. 42 Pt. 3.
17. Wiggins, B., Paterson-Stephens, I., Schillebeeckx, P. (2001), “The analysis of multi-channel sound reproduction algorithms using HRTF data”, 19th International AES Surround Sound Convention, Germany, p. 111-123.
18. Wightman, F.L., Kistler, D.J. (1989), “Headphone simulation of free-field listening II: Psychophysical validation”, J. Acoust. Soc. Am. 85, p. 868-878, February.
19. Zahorik, P. (2000), “Distance localization using non-individualized head-related transfer functions”, The Journal of the Acoustical Society of America 108, 2597.
20. Brookes, T., Treble, C. (2005), “The effect of non-symmetrical left/right recording pinnae on the perceived externalisation of binaural recordings”, 118th AES Convention, Barcelona, Spain.
21. Petroff, M. (2021), Pannellum. [online] Available at: <https://pannellum.org/> [Accessed 20th October 2021].
22. Werner, S., Klein, F., et al. (2016), “A Summary on Acoustic Room Divergence and its Effect on Externalization of Auditory Events”, 8th International Conference on Quality of Multimedia Experience, QoMEX 2016, 23 June 2016.
23. Zacharov, N., Pederson, T.H., Pike, C. (2016), “A common lexicon for spatial sound quality assessment – latest developments”, 8th International Conference on Quality of Multimedia Experience, QoMEX 2016, June 2016.
24. Stitt, P., Hendrickx, E. (2016), “The Influence of Head Tracking Latency on Binaural Rendering in Simple and Complex Sound Scenes”, AES 140th Convention, Paris, France, June 4-7.