Proceedings of the Institute of Acoustics
WHAM: TO ASYMMETRY AND BEYOND!
M Dring
Dr B Wiggins
College of Science and Engineering, University of Derby, Derby, UK

1 INTRODUCTION
Spatial hearing is our ability to make sense of the complex and multiple acoustic cues provided to our
ear/brain system by the world around us. Sounding objects have a direction, a location and the sonic character of the place in which they occur; combined with our constantly changing listening position, these cues provide a rich source of data about the world around us. The reproduction of sound can benefit
greatly, in many situations, from an accurate dynamic simulation of this complex acoustical scenario,
or auralisation. Auralisation of acoustic spaces is a tool used in many industries. To provide a truly
representative result, the systems used must capture and deliver critical, dynamic, psychoacoustic
cues that react to the listener’s head position. The WHAM (Webcam Head-tracked Ambisonics)
website (www.brucewiggins.co.uk/WHAM) utilises webcams to provide auralisation that reacts to
head rotation via the browser using standard HRTF data; visitors to the site can experience very high order, horizontal-only Ambisonic-to-binaural presentation of room responses. Initially, the final binaural presentation was limited to asymmetric 7th order, which previous research has shown to fall below the threshold of perceptual transparency when compared with a system limited to 31st order1. This paper documents the developments made to deliver beyond 7th order, along with improvements in functionality to the WHAM website and the open-source JSAmbisonics software library, which continue to make it a useful remote resource for acoustic auralisation purposes.
2 BINAURAL TECHNOLOGIES
In this study we are concerned with the binaural presentation of the sound field, seeking to maintain the inter-aural cues that convince the listener that sound sources are positioned at known points within
a specific environment. The technology and processes used to capture and recreate the sound field
will influence the subjective response from the listener, especially if they become more aware of the
‘system’ than the audio experience.
2.1 AMBISONIC MICROPHONES
A 360 degree (periphonic) acoustic signature of a room or environment can be achieved using
specialist Ambisonic microphones, capable of capturing the spherical harmonic composition of the
sound field2. The first SoundField microphone, invented by Gerzon and Craven3, was launched in 1978; it was constructed from closely spaced cardioid capsules producing 4 signals (known as A-Format). Dedicated hardware transformed these input signals into the 4 channels of B-Format, also known as First Order Ambisonics (FOA), which represent the complete sound field; reproducing it with height requires a minimum of 8 speakers4. Upon reproduction, unlike some other audio
conventions, Ambisonics can be decoded to different speaker layouts and be manipulated to provide
rotation, tumble and tilt of the 3D sound field. Furthermore, Ambisonics can be efficiently converted
into a binaural signal, suitable for headphone listening, through convolution with Head Related
Transfer Functions (HRTFs)5,6.
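The decoding flexibility described above can be illustrated with a small sketch. The following is illustrative only (not taken from any named library, and normalisation conventions are omitted): it encodes a mono sample into horizontal-only first-order B-format, then rotates the resulting sound field about the vertical axis, the simplest of the manipulations mentioned above.

```javascript
// Sketch: encode a mono sample into horizontal-only first-order Ambisonics
// (B-format W/X/Y; normalisation gains omitted for clarity).
function encodeFOA2D(sample, azimuthRad) {
  return {
    w: sample,                        // omnidirectional component
    x: sample * Math.cos(azimuthRad), // front/back figure-of-eight
    y: sample * Math.sin(azimuthRad), // left/right figure-of-eight
  };
}

// Rotating the whole sound field by yawRad is a 2x2 rotation of (X, Y);
// W is rotation-invariant.
function rotateFOA2D({ w, x, y }, yawRad) {
  const c = Math.cos(yawRad), s = Math.sin(yawRad);
  return { w, x: c * x - s * y, y: s * x + c * y };
}
```

Rotating a source encoded at 0 degrees by +90 degrees yields the same channel values as encoding it directly at 90 degrees, which is what makes head-tracked rendering a cheap channel-domain operation.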
Ambisonics being a hierarchical format means that both the number of inputs for signal capture and
the number of outputs for reproduction can be expanded. Higher Order Ambisonic (HOA)
microphones are now commercially available, examples include the Zylia ZM-17 (3rd order) and the
MH-Acoustics Eigenmike8 (4th order).
An increase in order, and therefore in the order of spherical harmonics that can be captured, is associated with improved spatial resolution and a reduction in the effects of spatial aliasing. The perceptual impact of reduced spatial resolution is a smearing of the sharp peak of the plane wave across a wide range of directions9.

Vol. 43. Pt. 2. 2021

Spatial aliasing limits the frequency bandwidth over which the interaural cues used to determine direction of arrival, and the character of the environment in which the sounds occur, can be correctly reproduced5,10. Whilst current HOA microphones make for a quick and convenient
method to capture the sound field, the described limitations make this method undesirable for the
intentions of this project.
2.2 HEAD AND TORSO SIMULATORS
To extend beyond the currently limited Ambisonic microphone orders the sound field or binaural room
impulse response (BRIR) can be captured using a Dummy Head or Head and Torso Simulator
(HATS). BRIRs can be captured by a HATS in two ways: the first keeps the head static and measures multiple source positions around it; the second maintains a static source position and rotates the HATS. In anechoic conditions, rotating the head and rotating the source around the head yield the same filters; in echoic conditions they do not. Using either
of these methods means the head related transfer functions (HRTFs) associated with the dummy
head cannot be separated from the room response and the reported benefits of HRTF personalisation
cannot be applied at a later stage in the binaural reproduction process. However, with head-tracking
devices, rotation, tumble and tilt can be implemented, which research shows mitigates cone-of-confusion issues and supports externalisation11,12.
To achieve head-tracking in the reproduction stages either the dummy head is rotated for a static
sound source, or the sound source is moved around the dummy head. If a greater number of BRIRs
are generated, indicative of more discrete measurement points, a higher order of spherical harmonics
can be leveraged to represent the sound field. As an example, spherical reproduction at orders up to and including 35th requires 1296 Ambisonic/spherical harmonic channels according to equation (1), with binaural presentation doubling this value for the left and right pair.
HOA Signals = (Order + 1)²    (1)
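Equation (1) and its horizontal-only counterpart can be checked with two one-liners (an illustrative sketch; the 2D count, 2 × order + 1, is the one that governs the channel budget discussed in section 4):

```javascript
// Channel counts implied by equation (1): a full-sphere (periphonic) HOA
// stream of a given order needs (order + 1)^2 spherical-harmonic channels,
// while a horizontal-only (2D) stream needs 2 * order + 1 circular-harmonic
// channels.
const channels3D = (order) => (order + 1) ** 2;
const channels2D = (order) => 2 * order + 1;

console.log(channels3D(35)); // 1296, as stated in the text
console.log(channels2D(15)); // 31, fitting within a 32-channel budget
```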
HOA presented binaurally can be computationally intensive due to the number of convolutions needed, which can lead to slow render times and audio presentation errors, such as glitches or distortion, if sufficient computing power isn’t available. Various researchers seek techniques that
reduce the order yet maintain the desired spatial attributes without generating perceptual errors. A
commonly considered technique is to observe the resolution required for the direct sound differently
from that required for the reverberant field, adopting a “hybrid” approach13. Whilst these techniques
capitalise on the studied perceptual thresholds of the reverberant/diffuse field through separation at
a determined ‘mixing time’, it is acknowledged that in real rooms ideal diffusion (and therefore a perfect mixing time) will never occur, due to modal behaviour, proximity to room boundaries and the non-uniform distribution of absorption14.
The Reverberant Virtual Loudspeaker (RVL) technique documented by Engel and Henry13 offers a
solution to minimise the number of required convolutions for head-tracked BRIR presentation.
However, the room and room reflections are “head-locked” and don’t track with head rotations which
leads to misaligned auditory cues between the direct and reverberant sound fields. Room positions
which exhibit significant diffusion may mask this misalignment and Engel and Henry conclude that
RVL, which renders the reverberation at low orders (max 4th order), is subjectively comparable to ‘less
flexible Ambisonic approaches’. However, that study positioned the head in the centre of the room, where sufficient diffusion and room symmetry are most likely owing to the maximal path differences from all boundaries.
However, if the aim is to generate auralisations that aren’t compromised in any detail for any listening
position in the room and provide dynamic rendering that ensures early and late reflections are truly
relative to the source position and position of the head on playback, then Very High Ambisonic Orders
(VHOA) should be implemented. As an audio format analogy, it is the equivalent of rendering a WAV file rather than an MP3: the perceptual thresholds for each individual recording cannot be determined with absolute certainty, so we should avoid making any compromise.
For this project the KEMAR15 head and torso simulator was employed to capture one hemi-anechoic
(Hemi-anechoic chamber, University of Derby) environment and two reverberant environments
(MS015, University of Derby and St. Paul’s Church, Derby) in the horizontal plane only. The HATS
was rotated on a turntable every 2.5 degrees for a single loudspeaker position (144 BRIRs), which
leads to a possible 71st order of circular harmonics for binaural rendering. The automated capturing
process known as Dynamic Binaural Reverberation Acquisition Technique (D-BRAT), previously
referred to as “Binaural Scanning”, has been documented by these authors in a previous paper16.
Whilst limited to a single fixed sound source position and generic HRTFs, this technique ensures listeners will experience a sound field that accurately reflects the dynamic cues experienced in real life.
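As a sketch of how such a measurement set is used on playback (hypothetical helper, not the actual D-BRAT code), the head yaw reported by the tracker can be mapped to the nearest of the 144 measured rotations, and the measurement count fixes the maximum circular harmonic order:

```javascript
// Sketch (assumed helper): with BRIRs measured every 2.5 degrees
// (144 rotations), head-tracked playback maps the current head yaw to the
// nearest measured rotation.
const STEP_DEG = 2.5;
const NUM_BRIRS = 360 / STEP_DEG; // 144

function brirIndexForYaw(yawDeg) {
  const wrapped = ((yawDeg % 360) + 360) % 360;      // wrap into [0, 360)
  return Math.round(wrapped / STEP_DEG) % NUM_BRIRS; // nearest measurement
}

// 144 equally spaced measurements on the circle support circular harmonics
// up to order floor((144 - 1) / 2) = 71, matching the 71st order quoted above.
const maxCircularOrder = Math.floor((NUM_BRIRS - 1) / 2);
```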
3 THE IMPORTANCE OF ASYMMETRY
A typical criticism of binaural rendering is that sound sources are located “in the head” or poorly localised (mainly front/back confusions), which is attributed to non-individualised HRTFs18; a known limitation of D-BRAT. It is therefore important to capitalise on other processes, namely natural reverberation19, dynamic cues and asymmetrical pinnae20, to maximise the auralisation experience; put simply, we need to ensure that the signals arriving at the left and right ears are decorrelated.
Another technique adopted to increase computational efficiency of binaurally reproduced signals is
to assume HRTF (and by extension BRIR) left and right symmetry5. When this assumption is made
the number of convolutions per spherical harmonic is reduced, as front/back components are the
same, and left/right components are simply phase inverted17. However, any assumption of symmetry
will erode the desired decorrelation and is unrealistic for most real rooms considering all the impacts
on ideal diffusion mentioned previously. It is therefore of importance as will be discussed later in the
paper that signals fed to the left and right ear are presented identically to how they have been
captured. BRIRs will rarely, if ever, be left/right symmetric.
In addition to the asymmetrical convolution processes at the end of the signal chain, research shows
the utilisation of asymmetrical pinnae will have a positive effect on ‘externalisation’20. D-BRAT makes use of the KEMAR HATS, whose pinnae are known to be asymmetrical as they are based on real measurements of human dimensions. Figure 1 and Figure 2 show the pinnae used for this project
and provide clear evidence of the differences that will lead to decorrelation of not only the reverberant
field but also the direct sound source.
Figure 1: Front profile of the KEMAR pinnae
Figure 2: Side profile of the KEMAR pinnae
4 JS AMBISONICS: DEVELOPING 15TH ORDER ASYMMETRY
JSAmbisonics, created by Politis and Poirer-Quinot, is a “JavaScript library that implements a set of
objects for real-time spatial audio processing, using the Ambisonics framework. The objects
correspond to typical ambisonic processing blocks, and internally implement Web Audio graphs for
the associated operations.” Its maturity, features and the fact that it is open source made it an ideal
candidate for use in the WHAM project (the other contender being Omnitone from Google
https://googlechrome.github.io/omnitone/). JSAmbisonics is designed to work using the WebAudio
API, a specification developed by W3C describing a high-level JavaScript API for processing and
synthesising audio on the web, which is supported by major browsers on computers and mobile
devices allowing for a low barrier to entry for users.
The JSAmbisonics library already included much of the functionality and classes needed for the
project with the main classes of interest being:
• monoEncoder2D: used to encode a sound source into an ambisonic stream of a set order,
with real-time control of the panning direction.
• binDecoder2D: implements an ambisonic to binaural decode, using user-defined HRTF-based filters up to 15th order (horizontal only and assuming left/right HRTF/head symmetry).
• orderLimiter2D: takes a Higher Order Ambisonic stream of order N and outputs the channel-limited HOA stream of order N′ <= N (up to 15th order, 2D only).
Currently, the Web Audio API dictates that browsers support ‘at least 32 channels’. In practice this has meant that browsers support 32 channels and no more, which is where the 15th order limitation for horizontal decoding comes from: 15th order needs 31 Ambisonic channels, the highest channel count that fits below the 32-channel limit. As mentioned in section 3, binaurally decoded implementations of Ambisonics typically assume that the head and ears are left/right symmetric. This reduces
the number of convolutions needed to 1 per spherical harmonic. However, for this project, where
BRIRs are captured and utilised, left/right symmetry cannot be assumed as the room response will
not exhibit this feature and, as discussed above, asymmetry has been shown to be of importance
even for anechoic HRTF data. If the HRIR/BRIR data isn’t left/right symmetric, then the resultant
spherical harmonic filters will be different for the left and right ear streams. For this reason, a number
of JS Ambisonics’ functions needed to be updated to allow for the processing of two 31-channel
convolution streams simultaneously.
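Conceptually, the order limiting and channel budgeting work as follows. This is a sketch assuming a simple lowest-orders-first channel ordering; the actual JSAmbisonics 2D channel layout may differ:

```javascript
// Sketch of what an orderLimiter2D-style block does conceptually: truncate a
// 31-channel 15th order stream to the 2 * N' + 1 channels of order N' <= 15.
// Assumed ordering: order-0 channel first, then each higher order's pair.
function limitOrder2D(channels, targetOrder) {
  const kept = 2 * targetOrder + 1;
  if (kept > channels.length) throw new Error("target order exceeds stream order");
  return channels.slice(0, kept);
}

// A 15th order stream (31 channels) limited to 7th order keeps 15 channels.
const stream15 = Array.from({ length: 31 }, (_, i) => i);
const stream7 = limitOrder2D(stream15, 7);
```

With asymmetric BRIRs, two such 31-channel streams (left and right) must be convolved in parallel, which is exactly the doubling that motivated the library changes described here.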
When the website is accessed, a number of audio files and filters are pre-loaded for use:
• An audio file used in the auralisation (a mono .wav file)
• Diffuse-field equalisation filters relating to the order of Ambisonics to be used (a 2 channel
.wav file).
• Binaural Room Impulse Response JavaScript Object Notation (JSON) files, each containing 144 two-channel BRIRs for a fixed source position but multiple head rotations of the KEMAR HATS (every 2.5 degrees).
Once loaded, the JSAmbisonics library uses the BRIRs, coupled with the order of Ambisonics to be
used, to create the filters needed for the binaural decoding of the audio (which will be ambisonically
encoded by the library). This results in two 31-channel filters (if 15th order is the maximum order to
be used); one 31-channel set for the left ear and one 31-channel set for the right ear. The functions
that needed expanding to allow for asymmetrical filter processing, necessary for correct reproduction
of the room response with head tracking, are those associated with the calculation of the filters and
the binaural decoding of the Ambisonic audio, both of which occur in the binDecoder class. In the forked
version of the JSAmbisonics used for this project, this has resulted in two extra classes:
• HRIRloader2Dasym2 – a class that loads the BRIR data and calculates the filters needed for
binaural decoding of the Ambisonic audio without symmetry assumed.
• binDecoder2Dasym2 – a class that implements the Ambisonic-to-binaural decoding using the filters calculated by HRIRloader2Dasym2 above, implementing the two parallel 31-channel convolution streams needed.
A block diagram showing the Web Audio API and JSAmbisonics elements as implemented on the WHAM website is shown in Figure 3 below:
[Figure 3 signal chain: Audio File (1 ch) → Diffuse Field EQ Filter (1 ch) → Mono Encoder 2D, 15th order (31 ch) → Order Limiter 2D, order N′ (2N′+1 ch) → Bin Decoder 2D asym, L and R (2 ch) → Gain Block. Web Audio API buffers are denoted by arrows; the number below each arrow is the number of channels in that buffer (max 32). N′ is the current order being auditioned by the user, up to 15 (31 channels).]
Figure 3 - Block Diagram of the D-BRAT processing on the WHAM website
5 UPDATES TO THE WEBSITE FEATURES
The unique aspects of the website remain the same, with the use of a webcam local to a user’s
computer tracking head movement to generate playback of VHOA BRIRs with the correct source to
receiver orientation. This section will provide updates on changes made to the structure and features
of the website that have been implemented since the previous paper by the authors16, mainly completing the list of intentions outlined there as future work, so that the platform is more engaging and effective as an auralisation tool.
5.1 WEBCAM TRACKING
When first constructed the website had two main sections labelled “Face Tracking” and “Audio
Playback and Manipulation”. Site users were required to scroll down the page to activate playback of
audio and then scroll back up to observe the face tracking and respective positional data. As shown
in Figure 4, the audio playback controls and sample selection have been moved alongside the webcam tracking window. Users are now able to engage instantly with the key feature of the site, which will encourage exploration of the subsequent sections that make it a powerful tool for immersive audio experiences.
Figure 4: Webcam tracking section of the WHAM website
5.2 ORDERS AND FILTERS
The default set of BRIRs when the site loads is for MS015 (University of Derby), with a sound source positioned at 45 degrees azimuth, binaurally rendered using 7th order circular harmonics. These settings are equivalent to the ‘best’ the site presented when initially developed. As evidenced in Figure 5, thanks to the processes outlined in section 4 of this paper, users are now able to select up to and including 15th order for 3 acoustically different environments. For each environment, 4 different
loudspeaker positions have been captured at distances of 2m from the HATS, offset by ±45 degrees
to the front and rear. This provides users with some interesting audible interactions. It is also possible
to load locally stored filters for auditioning with the webcam tracking if available in a JSON.SOFA file
format.
A graphical expansion to the site was to provide a 360-degree visual of the associated spaces. Using
the Pannellum JavaScript plugin21, an equirectangular image of the two reverberant spaces can be
seen. The images switch according to the selected filter and the image follows the orientation of the
head. Images can be made full screen to get a clearer visual display of the space as it was at the
time. Providing the user with the correct visual scene during binaural reproduction of the same space is beneficial in minimising the phenomenon of room divergence, which has a negative effect on externalisation12,22, thereby ensuring the auralisation is ‘accurate’ to the space being visualised by the listener.
Figure 5: Orders and Filters section of the WHAM website
5.3 LISTENING SURVEYS
Before the potential of this project was realised as an acoustic auralisation tool, it was developed to
continue research using remote methods whilst social interactions were limited due to Covid.
Implementing a listening survey section on the website was a key goal of this work; however, until orders significantly greater than 7th were realised, any findings would not advance the work previously completed by the authors1.
Two well-known listening survey methodologies for perceptual audio studies were implemented (ABX
and MUSHRA), both with an operational limit of 15th order. Listeners can lower the maximum order prior to starting the test; this was implemented as a precaution against the audio playback errors experienced on less powerful computers. It is accepted that not all results will be immediately comparable; however, it is hoped that enough participants will conduct the surveys to meet statistically significant thresholds.
5.3.1 ABX
This survey (Figure 6) aligns most closely to the work conducted by the authors looking into
perceptual thresholds of circular harmonic orders1. A single environment (MS015), single loudspeaker
position (45-degree azimuth) and single audio type (drums) is presented for randomised orders
against the maximum order determined by the user’s selection for a minimum of 10 questions.
Results obtained from this study will determine if listeners were able to either correctly or incorrectly
perceive a difference in the spatial attributes between the 15th order reproduction and the other order
presented for each question.
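The trial logic described above can be sketched as follows (hypothetical helper names; the website's actual implementation is not shown in this paper):

```javascript
// Sketch of an ABX trial: A is always the maximum-order render, B a
// randomised lower order, X is drawn at random from {A, B}, and the response
// is scored against X.
function makeAbxTrial(maxOrder, lowerOrders, rand = Math.random) {
  const b = lowerOrders[Math.floor(rand() * lowerOrders.length)];
  const xIsA = rand() < 0.5;
  return { a: maxOrder, b, x: xIsA ? maxOrder : b };
}

function scoreAbxResponse(trial, answeredA) {
  return answeredA === (trial.x === trial.a); // true = correct identification
}
```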
Figure 6: WHAM ABX Listening Survey
5.3.2 MUSHRA
The ABX survey will observe perceptual thresholds for the specific conditions presented. To gather
more insight into the spatial performance of the WHAM platform a MUSHRA survey (Figure 7) was
created to obtain more evaluative data. The specific spatial metric participants are asked to evaluate
is ‘localisability’, based on the definition given by Zacharov, Pederson and Pike23.
All 3 environments mentioned in section 2.2 are used, each with a single loudspeaker position (45 degrees azimuth), randomly assigned per question. The reference uses filters at the maximum order determined
by the participant, with 4 conditions of randomised lower orders and 1 anchor condition at 0th order
(i.e., no spatialisation).
For a minimum of 10 questions each participant is asked to assign a grade between 0 and 100 to
each audio sample. A value of 100 indicates that in their judgement the sample exhibits a localisability
equal to the given reference, whereas a value of 0 would indicate the sample has no similarity to the
reference with respect to the localisability.
The outcomes from the MUSHRA survey will support analysis into the extent spatial localisation cues
are affected between orders for varying acoustic environments.
Figure 7: WHAM MUSHRA Listening Survey
6 PRELIMINARY FINDINGS

6.1 Social Media Feedback
In May 2021, a post (Figure 8) was made to the Facebook group “Spatial Audio in VR/AR/MR” and the Reddit sub-group “Spatial Audio” to gather website interest and seek listening survey participants. The link was also shared in July 2021 to the Facebook groups “Reproduced Sound” and
“Headtracked Binaural”.
Figure 8: Post Made to Various Social Media Groups
The initial post in May received several comments, including some from established researchers, which opened discussion of their experiences using the features of the WHAM website.
Interestingly, two individuals made very different comments regarding the use of the generic HRTFs
and how this affects the binaural rendering experienced, see Figure 9 and Figure 10.
Figure 9: Facebook Comment Not Favouring the use of Generic HRTFs on WHAM website.
The feedback given in Figure 9 is an indicator that, for this listener, presenting asymmetric VHOA is
not enough in itself to elicit the benefits of rendering the sound field at higher orders (i.e., improved
localisation, externalisation). This comment mentions a 3rd order limit; however, assuming this listener attempted the survey, the results shown in section 6.2 indicate that incorrect responses only occur from 5th order. Whilst the feedback mentions personalised HRTFs to overcome this problem, other system limitations such as latency, computer processing and low light affecting face recognition may be in effect. For all the benefits WHAM realises, system-to-system variations cannot be controlled.
Figure 10: Facebook Comment Favouring the use of Generic HRTFs on WHAM website.
Figure 10 by contrast encourages the direction taken in this study regarding non-personalised HRTFs
as this listener experienced accurate front/back differences, better than previous experiences using
similar HATS. Interestingly, this comment refers to the use of a high-powered mobile phone to engage
with the platform, highlighting how this work enables interactive auralisation wherever someone might
be.
6.2 Listening Surveys
Figure 11 and Figure 12 show the results from an early ABX listening survey where the site was
operating with a 10th order limit. Although a total of 19 individuals accessed the survey, 7 of these attempted only a single question; their results have therefore been discounted for not meeting the minimum participation threshold of 10 questions. No data is available from the
MUSHRA survey at the time of publication.
It is evidenced in Figure 11 that 8 (~73%) participants had a minimum of 1 incorrect response in the
trials presented, with a maximum of 5 incorrect responses for any individual recorded within this data
set. The results indicate that a perceptual difference exists between 10th order and lower orders. This data alone does not support any meaningful conclusions, although it does indicate variation in sensitivity between listeners, which supports the need to pursue a large number of participants.
Figure 11: Individual ABX Responses per Participant for all Orders Attempted
As the orders are randomised for each question it is possible that some participants will be presented
a greater number of lower orders. Table 1 breaks down the number of times each order was presented
across all participants. For all attempts 6th order shows the greatest number of occurrences (24), with
9th order the lowest (11). Percentages are also given in Table 1 and presented visually in Figure 12 for a clearer contrast of ‘success’; here it is shown that 8th order was perceived as the same as 10th order on more occasions (i.e., a higher percentage incorrect) than other orders, including 9th.
Figure 12: Combined ABX Participant Responses (in %) for each Order vs. Max. Order of 10
Table 1: ABX Responses Separated per Order
Observing that 4th, 7th and 8th orders have an identical number of attempts (17), a direct comparison of the percentage-correct values shows that only 4th order was judged 100% perceptually different from the 10th order reference. Perceptual similarities, evidenced by incorrect responses, begin to occur from 5th order upwards against the 10th order reference, although this trend is not consistent.
For statistical validation, p-values are presented in the last row of Table 1. Whilst none of the orders cross the 0.05 significance level determined for a 50% forced-choice response test method, the 8th order attempts come very close. From these preliminary results we are unable to statistically confirm that no difference was perceived between the 10th order reference and the lower orders (i.e., we accept that a difference can be determined).
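The significance test referred to here is the one-sided binomial tail for a 50% forced-choice task. A sketch, with illustrative numbers rather than the survey data from Table 1:

```javascript
// Under the null hypothesis of guessing, each ABX answer is correct with
// p = 0.5, so the one-sided p-value for c correct answers out of n trials is
// the binomial tail P(X >= c).
function binomialCoeff(n, k) {
  let r = 1;
  for (let i = 1; i <= k; i++) r = (r * (n - i + 1)) / i;
  return r;
}

function abxPValue(correct, trials) {
  let p = 0;
  for (let c = correct; c <= trials; c++) {
    p += binomialCoeff(trials, c) * Math.pow(0.5, trials);
  }
  return p;
}

// For example, 13 correct of 17 trials gives p ~= 0.025, below the 0.05
// level, whereas 12 of 17 gives p ~= 0.072, which would not reach significance.
```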
Ideally an infinite order would be used as the reference, however pushing the maximum order within
the limits of the system will hopefully reveal more about our spatial perception thresholds.
7 CONCLUSIONS
This paper has reviewed the current techniques used to binaurally render reverberant sound fields from Ambisonic signals. These techniques are either of limited order, render the spatial resolution of the direct and reverberant sound fields separately, or apply a left/right symmetry simplification. Whilst a reduction in computational load is achieved, they are not robust to the physical complexities of real rooms, which disrupt ideal diffusion and create left/right variations arising from the relative proximities of sound source and reflective surfaces.
A clear argument has been put forward in this paper to binaurally present sound fields with asymmetry
using VHOA within the capabilities of the system. Application of this logic led to developments of the JSAmbisonics framework, which were then employed in the WHAM website to deliver horizontal-only BRIRs at a current maximum of asymmetric 15th order.
Although the WHAM website was initially created to enable head-tracking without the need for
dedicated head tracking peripherals, allowing continued research into perceptual order thresholds,
its development over the last year has opened up the possibility of it becoming a powerful auralisation tool. Social media feedback has been encouraging in supporting this idea and, for some listeners, the experience surpassed their previous ones.
Preliminary ABX listening survey data (with a 10th order limit) taken from 11 responses, whilst not statistically valid, does indicate that the platform reveals perceptual differences, especially at low orders, and therefore reinforces the need for continued research comparing against much higher orders.
7.1 Further Work
The site will continue to be updated with acoustic environments captured by the authors. It is hoped
that contributions from interested parties will come forward to make the library of environments both
diverse and interesting.
Further exploration of the latency of webcam tracking compared to physical devices is needed, as this is cited as having a potential impact on source stability24. Latency variations between different systems should also be investigated to help determine the “minimum system requirements” needed to render complex reverberant fields at a sufficiently high order.
8 RESOURCES
• WHAM website: https://brucewiggins.co.uk/WHAM/
• Automated Sweep and Turntable Rotation Python ReaScript: https://bitbucket.org/DrWig/wigware-reaper-scripts/src/master/WigET250-3D_Turntable.py
• Forked JSAmbisonics supporting 15th Order 2D Ambisonics with Asymmetrical Filters: https://github.com/DrWig/JSAmbisonics
9 REFERENCES

1. Dring, M., Wiggins, B. (2019), “The Transparency of Binaural Auralisation Using Very High Order Circular Harmonics”, Reproduced Sound 2019 – Institute of Acoustics, Bristol, UK, Vol. 41 Pt. 3, p. 165-173.
2. Abhayapala, T.D., Ward, D.B. (2002), “Theory and design of high order sound field microphones using spherical microphone array”, IEEE International Conference on Acoustics, Speech and Signal Processing.
3. Rode Microphones (2021), The Beginner’s Guide to Ambisonics. [online] Available at: <https://www.rode.com/blog/all/what-is-ambisonics> [Accessed 5th October 2021].
4. Malham, D. (1998), “Spatial Hearing Mechanisms and Sound Reproduction”. [online] Available at: <https://www.digitalbrainstorming.ch/db_data/eve/ambisonics/text02.pdf>.
5. Wiggins, B. (2017), “Analysis of Binaural Cue Matching using Ambisonics to Binaural Decoding Techniques”, 4th International Conference on Spatial Audio, 7-10 Sept., Graz, Austria.
6. Politis, A., Poirer-Quinot, D. (2016), “JSAmbisonics: A Web Audio library for interactive spatial sound processing on the web”, Interactive Audio Systems Symposium, York, UK.
7. Zylia Sp. z o.o. (2021), ZYLIA ZM-1 microphone. [online] Available at: <https://www.zylia.co/zylia-zm1-microphone.html> [Accessed 5th October 2021].
8. mh acoustics LLC (2021), Eigenmike Microphone. [online] Available at: <https://mhacoustics.com/products#eigenmike1> [Accessed 5th October 2021].
9. Avni, A. et al. (2013), “Spatial perception of sound fields recorded by spherical microphone arrays with varying spatial resolution”, The Journal of the Acoustical Society of America 133, 2711.
10. Wiggins, B. (2004), An Investigation into the Real-time Manipulation and Control of Three-dimensional Sound Fields. PhD thesis, University of Derby, Derby, UK.
11. Hendrickx, E., Stitt, P. (2017), “Influence of head tracking on the externalization of speech stimuli for non-individualized binaural synthesis”, J. Acoust. Soc. Am., Vol. 141[3], p. 2011-2023, March.
12. Werner, S., Gotz, G., Klein, F. (2017), “Influence of Head Tracking on the Externalization of Auditory Events at Divergence between Synthesized and Listening Room Using a Binaural Headphone System”, AES 143rd Convention, New York, USA, October 18-21.
13. Engel, I., Henry, C., et al. (2021), “Perceptual implications of different Ambisonics-based methods for binaural reverberation”, J. Acoust. Soc. Am., Vol. 149[2], p. 895-910, February.
14. Lindau, A., Kosanke, L., Weinzierl, S. (2012), “Perceptual Evaluation of Model- and Signal-Based Predictors of the Mixing Time in Binaural Room Impulse Responses”, J. Audio Eng. Soc. 60(11), p. 887-898, November.
15. GRAS Sound & Vibration (2021), Head & Torso Simulators. [online] Available at: <https://www.grasacoustics.com/products/head-torso-simulators-kemar> [Accessed 18th October 2021].
16. Dring, M., Wiggins, B. (2020), “WHAM: Webcam Head-Tracked Ambisonics”, Reproduced Sound 2020 – Institute of Acoustics, Online, Vol. 42 Pt. 3.
17. Wiggins, B., Paterson-Stephens, I., Schillebeeckx, P. (2001), “The analysis of multi-channel sound reproduction algorithms using HRTF data”, 19th International AES Surround Sound Convention, Germany, p. 111-123.
18. Wightman, F.L., Kistler, D.J. (1989), “Headphone simulation of free-field listening II: Psychophysical validation”, J. Acoust. Soc. Am. 85, p. 868-878, February.
19. Zahorik, P. (2000), “Distance localization using non-individualized head-related transfer functions”, The Journal of the Acoustical Society of America 108, 2597.
20. Brookes, T., Treble, C. (2005), “The effect of non-symmetrical left/right recording pinnae on the perceived externalisation of binaural recordings”, 118th AES Convention, Barcelona, Spain.
21. Petroff, M. (2021), Pannellum. [online] Available at: <https://pannellum.org/> [Accessed 20th October 2021].
22. Werner, S., Klein, F., et al. (2016), “A Summary on Acoustic Room Divergence and its Effect on Externalization of Auditory Events”, 8th International Conference on Quality of Multimedia Experience, QoMEX 2016, 23 June 2016.
23. Zacharov, N., Pederson, T.H., Pike, C. (2016), “A common lexicon for spatial sound quality assessment – latest developments”, 8th International Conference on Quality of Multimedia Experience, QoMEX 2016, June 2016.
24. Stitt, P., Hendrickx, E. (2016), “The Influence of Head Tracking Latency on Binaural Rendering in Simple and Complex Sound Scenes”, AES 140th Convention, Paris, France, June 4-7.