Claudia Jenny, Piotr Majdak, Christoph Reuter: Directional Hearing with Static and Moving Sound Sources ("Richtungshören bei statischen und bewegten Schallquellen"). "Musik und Bewegung" – 33rd Annual Meeting 2016 of the German Society for Music Psychology (DGM), Universität Hamburg, 15–17 September 2017.
Described in this paper is a method for the analysis and comparison of multi-speaker surround sound algorithms using HRTF data. Using Matlab and Simulink [1], a number of surround sound systems were modelled, both over multiple loudspeakers (for listening tests) and using the MIT Media Lab HRTF set (for analysis) [2]. The systems under test were 1st-order Ambisonics over eight and five speakers, 2nd-order Ambisonics over eight speakers, and amplitude-panned 5.0 over five speakers. The listening test results were then compared to the HRTF analysis, with favourable results.
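The binaural analysis path described above, rendering each loudspeaker feed through the HRIR pair for that speaker's direction, can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and array shapes are assumptions.

```python
import numpy as np

def render_virtual_speakers(speaker_feeds, hrirs):
    """Binaural simulation of a multi-speaker layout: convolve each
    loudspeaker feed with the left/right HRIR measured for that
    speaker's direction and sum over speakers.

    speaker_feeds: (n_speakers, n_samples)
    hrirs:         (n_speakers, 2, hrir_len)
    returns:       (2, n_samples + hrir_len - 1) binaural signal
    """
    n_spk, n_samp = speaker_feeds.shape
    out = np.zeros((2, n_samp + hrirs.shape[2] - 1))
    for s in range(n_spk):
        for ear in (0, 1):
            out[ear] += np.convolve(speaker_feeds[s], hrirs[s, ear])
    return out
```

Any panning law (Ambisonics decoding or amplitude panning) then reduces to choosing the speaker feeds before this binaural stage.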
The auditory system is a sophisticated spatial processor that allows the individual to detect and locate sound sources, and additionally supplies information for distinguishing them. Every sound environment possesses enveloping sonic characteristics, which are in turn captured by human beings thanks to their omnidirectional, 360°, listening ability. John W. Strutt (Lord Rayleigh) proposed in his Duplex Theory that two factors intervene in the localization of a sound source: interaural differences in sound pressure level and differences in the arrival times of the sound waves. These differences provide information about sound sources in the horizontal plane (azimuth), but offer no information about their elevation, in particular when sources lie midway between the two ears. Later research showed that human anatomy also influences the perceived location of a sound source, giving rise to the so-called head-related transfer function (HRTF). The auditory capability that allows a sound to be associated with the space in which it propagates is called spatiality. It depends on the distance between the source and the listener, on the early reflections generated by walls or obstacles, on the reverberation produced by the late reflections of the sound, and, finally, on the movement of the source. This work presents a concise analysis of the perceptual cues related to the directionality and spatiality of sound in human beings. Knowledge of these cues makes it possible to build and improve audio systems capable of distributing sources spatially and creating a sense of realism.
Virtual Reality (VR) is on the edge of becoming the next major delivery platform for digital content. Technological progress has made accessible a variety of new tools, allowing development of the necessary hardware and software for VR interaction. The importance of audio in this process is increasing: audio can intensify the three-dimensional experience and immersion of game players and users of other applications. This research focuses on determining and implementing the necessary elements for a system able to deliver a Virtual Auditory Display (VAD). The system structure is adapted to fit a newly emerging gaming platform: smartphones. Developing for mobile platforms requires consideration of many constraints, such as memory, processor load, and development components. The result is a real-time dynamic VAD, manageable by mobile devices and able to trigger spatial auditory perception across the azimuthal plane. This study also shows how the VAD, developed following a custom implementation of selected Head-Related Transfer Functions (HRTFs), is able to generate azimuthal localization of sound events in VR scenarios.
This paper explores the limits of human localization of sound sources when listening with non-individual Head-Related Transfer Functions (HRTFs), by simulating performance in a localization task in the mid-sagittal plane. Computational simulations are performed with the CIPIC HRTF database using two different auditory models that mimic human hearing from a functional point of view. Our methodology investigates the feasibility of using virtual experiments instead of time- and resource-demanding psychoacoustic tests, which can also yield potentially unreliable results. Four different perceptual metrics were implemented in order to identify relevant differences between the auditory models in the problem of selecting the best available non-individual HRTFs. Results show a high correlation between the two models, denoting an overall similar trend; however, we discuss discrepancies in the predictions that should be carefully considered when applying our methodology to the HRTF selection problem.
Measuring binaural room impulse responses (BRIRs) for different rooms and different persons is a costly and time-consuming task. In this paper, we propose a method for computing BRIRs from a B-format room impulse response (B-format RIR) and a set of head-related transfer functions (HRTFs). This makes it possible to measure the room-related and head-related properties of BRIRs separately, reducing the measurements needed to obtain BRIRs for different rooms and persons to one B-format RIR measurement per room and one HRTF set per person. The BRIRs are modeled by applying an HRTF to the direct-sound part of the B-format RIR and using a linear combination of the reflections part of the B-format RIR. The linear combination is determined such that the spectral and frequency-dependent interaural coherence cues match those of corresponding directly measured BRIRs. A subjective test indicates that the computed BRIRs are perceptually very similar to corresponding directly measured BRIRs.
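The modeling step can be sketched as below: an HRIR pair filters the direct-sound part, and each ear's reverberant tail is a linear combination of the four B-format channels. This is a simplified illustration under stated assumptions (names, shapes, and the W-channel-only direct sound are mine); the paper additionally fits the combination so that spectral and interaural coherence cues match measured BRIRs, which this sketch omits.

```python
import numpy as np

def synth_brir(bformat_rir, hrir_direct, mix_weights, split):
    """Sketch of BRIR synthesis from a B-format RIR.

    bformat_rir: (4, n)  W, X, Y, Z channels
    hrir_direct: (2, m)  HRIR pair for the direct-sound direction
    mix_weights: (2, 4)  per-ear weights for the reflections part
    split:       sample index separating direct sound from reflections
    """
    direct = bformat_rir[0, :split]        # omni (W) carries the direct sound
    late = bformat_rir[:, split:]          # reflections part, all channels
    brir = []
    for ear in range(2):
        early = np.convolve(direct, hrir_direct[ear])
        tail = mix_weights[ear] @ late     # linear combination of channels
        n = max(len(early), split + tail.shape[0])
        y = np.zeros(n)
        y[:len(early)] += early
        y[split:split + tail.shape[0]] += tail
        brir.append(y)
    return np.array(brir)
```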
This paper presents a multimodal interactive system for non-visual (auditory-haptic) exploration of virtual maps. The system is able to display the height profile of a map haptically, through a tactile mouse. Moreover, spatial auditory information is provided in the form of virtual anchor sounds located at specific points of the map and delivered through headphones using customized Head-Related Transfer Functions (HRTFs). The validity of the proposed approach is investigated through two experiments on non-visual exploration of virtual maps. The first experiment is preliminary in nature and aims at assessing the effectiveness and complementarity of auditory and haptic information in a goal-reaching task. The second experiment investigates the potential of the system for providing subjects with spatial knowledge, specifically for helping with the construction of a cognitive map depicting simple geometrical objects. Results from both experiments show that the proposed concept, design, and implementation make it possible to effectively exploit the complementary natures of the "proximal" haptic modality and the "distal" auditory modality. Implications for orientation & mobility (O&M) protocols for visually impaired subjects are discussed.
In the framework of humanoid robotics, it is of great importance to study and develop computational techniques that enrich robot perception and its interaction with the surrounding environment. The most important cues for the estimation of sound source azimuth are the interaural phase differences (IPD), interaural time differences (ITD), and interaural level differences (ILD) between the binaural signals. In this paper we present a method for recognizing the direction of a sound located in the azimuthal plane (i.e., the plane containing the interaural axis). The proposed method is based on a spectrum-weighted comparison between ILDs and IPDs extracted from microphones located at the ears and a set of stored cues; these cues were previously measured and stored in a database in the form of a data lookup table. While direct lookup of the stored cues in the table suffers from the presence of both ambient noise and reverberation, as is usual in real environments, the proposed method, by exploiting the overall shape of the actual frequency spectrum of the signal, both its phase and its modulus, dramatically reduces localization errors. In the paper we also give experimental evidence that this method greatly improves on the usual HRTF-based identification methods.
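The per-frequency ILD and IPD cues named above can be extracted from one frame of the binaural signals roughly as follows (a minimal sketch, assuming single-frame FFT analysis; real systems, like the one described, average over frames and weight by spectral energy):

```python
import numpy as np

def interaural_cues(left, right, fs, n_fft=512):
    """Per-bin ILD (dB) and IPD (rad) from one frame of binaural signals."""
    L = np.fft.rfft(left, n_fft)
    R = np.fft.rfft(right, n_fft)
    eps = 1e-12                              # avoid log/divide by zero
    ild = 20 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    ipd = np.angle(L * np.conj(R))           # phase of the cross-spectrum
    freqs = np.fft.rfftfreq(n_fft, 1 / fs)
    return freqs, ild, ipd
```

A lookup-table localizer would then compare these vectors against stored cues per candidate direction.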
This letter documents the design and implementation of a tool which facilitates a new method of visualizing and sonifying Head-Related Transfer Function (HRTF) data sets. HRTFs are empirically measured digital filters used in binaural spatialized audio systems to simulate the direction-dependent acoustic filtering properties of the human external ear. Because HRTF data sets are often reported as functions of three variables (azimuth angle, elevation angle, frequency), we suggest that three-dimensional, volumetric visualization tools can be effectively used to gain quick intuition about the structure of these large, complex data sets. Specifically, our visualization tool allows simultaneous visual and aural exploration of HRTF data by processing sounds with HRTFs selected directly from volumetric displays of HRTF data.
Recent developments in immersive audio technology have motivated a proliferation of binaural renderers used for creating spatial audio content. Binaural renderers leverage psychoacoustic features of human hearing to reproduce a 3D sound image over headphones. In this paper, a methodology for the comparative evaluation of different binaural renderers is presented. The methodological approach is threefold: a subjective evaluation of 1) quantitative characteristics (such as front/back and up/down discrimination and localization); 2) qualitative characteristics (such as naturalness and spaciousness); and 3) overall preference. The main objective of the methodology is to help elucidate the most meaningful factors in the performance of binaural renderers and to provide insight into possible improvements in the rendering process.
Jenny, Claudia; Majdak, Piotr; Reuter, Christoph: SOFA Native Spatializer Plugin for Unity - Exchangeable HRTFs in Virtual Reality. In: Proceedings of the 144th Convention of the Audio Engineering Society, Milan, Italy (2018), Convention e-brief 406.
In order to present three-dimensional virtual sound sources via headphones, head-related transfer functions (HRTFs) can be integrated into a spatialization algorithm. However, spatial perception in binaural virtual acoustics may be limited if the applied HRTFs differ from those of the actual listener. Thus, SOFAlizer, a spatialization engine that allows listener-specific HRTFs stored in the Spatially Oriented Format for Acoustics (SOFA) to be loaded and switched on the fly, was implemented for the Unity game engine. With this plugin, virtual-reality headsets can benefit from individual HRTF-based spatial sound reproduction.
The paper evaluates the human directional resolution of virtual sound sources synthesised with the aid of a generalised head-related impulse response (HRIR) library, i.e., an HRIR library measured using a dummy head and torso. The original HRIR set is first expanded using linear interpolation, and then directional resolution measurements are performed for playback through headphones. These results are compared
The condition number of the matrix of electro-acoustic head-related transfer functions (HRTFs) in a two-channel sound reproduction system has been used as a measure of robustness of the Atal-Schroeder (1962) crosstalk canceler. A comparative study has been made using results produced by computer simulations and HRTFs measured in an anechoic chamber by means of a dummy head. It has been found that acoustic scattering by the head has a very important and beneficial influence on robustness, especially for large loudspeaker separations. For narrow loudspeaker separations of less than about 40 degrees, crosstalk cancellation is found to exhibit a large variation of alternating very low and very high robustness. Simulations and measurements have also been made of the natural channel separation under the same conditions. Scattering by the head is seen to provide a good level of natural channel separation at high frequencies and large loudspeaker angles; at low frequencies or small loudspeaker angles, natural channel separation is poor.
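The robustness measure described above can be illustrated at a single frequency: for a symmetric listener-loudspeaker geometry the plant is a 2x2 matrix of ipsilateral and contralateral responses, and its condition number is what the study tracks versus frequency and speaker angle. A minimal sketch (the function name and the symmetry assumption are mine):

```python
import numpy as np

def plant_condition_number(h_ipsi, h_contra):
    """Condition number of the 2x2 electro-acoustic plant matrix at one
    frequency, used as a robustness measure for the Atal-Schroeder
    crosstalk canceler. h_ipsi / h_contra are the (complex) ipsilateral
    and contralateral HRTF values; a symmetric setup is assumed."""
    H = np.array([[h_ipsi, h_contra],
                  [h_contra, h_ipsi]])
    return np.linalg.cond(H)
```

When the contralateral path approaches the ipsilateral one (little head shadowing, as at low frequencies or narrow spans), the matrix nears singularity and the condition number blows up, matching the poor robustness reported above.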
This paper addresses the general problem of modeling pinna-related transfer functions (PRTFs) for 3-D sound rendering. Following a structural approach, we aim to construct a model for PRTF synthesis that allows separate control of the evolution of ear resonances and spectral notches through the design of two distinct filter blocks. Taking this model as the endpoint, we propose a method based on the McAulay-Quatieri partial tracking algorithm to extract the frequencies of the most important spectral notches. Ray-tracing analysis performed on the tracks thus obtained reveals a convincing correspondence between the extracted frequencies and the pinna geometry of a set of subjects.
A mathematical model and a reverberation system implementation are presented that emulate how sound propagates in real conditions and how this propagation influences the fidelity of a three-dimensional, HRTF-based sound mix. To this end, real measurements and simulations are used: first to process the depth of the sound, and second to model the modifications that the sound undergoes before arriving at the eardrum.
Abstract: The development process of an audio plug-in in VST format is presented, using tools that simplify its creation and require only basic, general knowledge of programming and of the kind of processing to be implemented. The plug-in produces binaural mixes from monaural sources, which can be placed virtually in the horizontal plane at azimuth angles ranging from 90° to the left to 90° to the right. The resulting binaural tracks preserve the alterations produced by the head-related transfer function (HRTF). Keywords: audio plug-in, binaural mixes, HRTF, signal processing. "Audio plug-in for binaural mixes using HRTFs": mixes in the horizontal plane using the head-related transfer function (HRTF). The audio plug-in presented here was produced within the framework of a research-initiation grant from the Consejo Interuniversitario Nacional, entitled "Desarrollo de complemento de software de audio para mezclas binaurales con espacialidad". The deliverables include an explanatory text detailing the process of creating and modifying a plug-in, including the installation and configuration of the various software packages, as well as how to obtain the impulse responses needed for the convolution, taken from the IRCAM database. All of the material produced, including audio tracks for testing, is available in a Google Drive folder accessible via the following link: http://cor.to/LUOo. Both the plug-in and its source code are made available under a Creative Commons license for free use. Binaural mixes using HRTFs: audio mixing is the process that takes a set of audio recordings and generates a single sonic product.
The typical process starts from monaural recordings and yields a stereo product. The mix aims, among other things, to create an apparent distribution of the sound sources contained in the monaural recordings. To make a monaural source be perceived toward the left, for example, a copy of the file is placed in both channels with a higher level in the left channel. This method is known as level panning and was proposed by Alan Blumlein at the beginning of the 20th century (Burns, 1999).
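The level panning just described is commonly realized with a constant-power pan law; a minimal sketch (the sin/cos law is a standard choice, not necessarily the one used in the plug-in):

```python
import numpy as np

def pan_constant_power(mono, pan):
    """Constant-power level panning. pan in [-1, 1], -1 = hard left.
    Gains follow cos/sin of an angle so that gl**2 + gr**2 == 1,
    keeping total power constant across the stereo arc."""
    theta = (pan + 1) * np.pi / 4          # map [-1, 1] -> [0, pi/2]
    gl, gr = np.cos(theta), np.sin(theta)
    return gl * mono, gr * mono
```

The binaural mix described in the abstract replaces these scalar gains with convolution by the HRTF pair for the desired azimuth.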
The incredible growth of extended reality (XR) applications will lead us to a world beyond our imagination in the coming decades. Extended reality is an umbrella term that encompasses different categories of immersive technologies such as virtual reality (VR), augmented reality (AR), and mixed reality (MR). From traditional applications such as entertainment and training, XR has been spreading its wings into a large number of applications in health care, aerospace, product design and prototyping, e-commerce, workspace productivity, architecture, and the building industries. The immersion of the virtual scene in the physical world will be crucial for XR's acceptance by mainstream industries and for its future development. In addition to the visual perception of the virtual scene, spatial audio is a key feature in designing truly immersive XR. Hearing is the fastest human sense, which makes a virtual auditory display (VAD) an ineluctable part of any XR application. In this work, th...
Listener-selected HRTFs have the potential to provide the accuracy of an individualized HRTF without the time and resources required for HRTF measurements. This study tests listeners’ HRTF preference for three different sets of headphones. HRTF datasets heard over the noise-cancelling Bose Aviation headset were selected as having good externalization more often than those heard over Sennheiser HD650 open headphones or Sony MDR-7506 closed headphones. It is thought that the Bose headset’s frequency response is responsible for its superior externalization. This suggests that in systems where high quality headphones are not available, post-processing equalization should be applied to account for the effect of the headphones on HRTF reproduction.
This paper addresses the problem of modeling head-related transfer functions (HRTFs) for 3-D audio rendering in the front hemisphere. Following a structural approach, we build a model for real-time HRTF synthesis that allows separate control of the evolution of different acoustic phenomena, such as head diffraction, ear resonances, and reflections, through the design of distinct filter blocks. The parameters fed to the model are derived both from mean spectral features in a collection of measured HRTFs and from anthropometric features of the specific subject (taken from a photograph of his or her outer ear), hence allowing model customization. Visual analysis of the synthesized HRTFs reveals a convincing correspondence between original and reconstructed spectral features in the chosen spatial range. Furthermore, a possible experimental setup for dynamic psychoacoustic evaluation of the model is outlined.
Amplitude panning is the most common panning technique. Another method is time panning, in which a constant delay is applied to one channel in stereophonic listening. Time panning is typically not used in stereo imaging, but it can be used to create special effects. The maximum interaural difference in the time of arrival of an air-propagated signal is about 700 µs. In binaural listening over headphones, that delay alone is not perceived as a virtual source at 90° of azimuth; however, when the spectral modifications created by head diffraction are included, the azimuth angle can be perceived as expected. A simplified head diffraction model with possible applications in audio mixing is presented. The model is based on a bank of filters that emulates the angular position of the source in the horizontal plane. The aim of the model is to emulate a virtual position of the sound source with minimum computational effort compared to convolution with the corresponding head-related impulse responses. Our results suggest that diffraction loss can be adequately represented by shelving filters. The CIPIC public database of head-related impulse responses was used to compute the parameters of the simplified filter bank. Individuals with identical head diameter were selected from the database, and their impulse responses were convolved with a series of one-third-octave band noises in order to obtain, for each ear, a power profile as a function of frequency. The results were averaged in order to smooth out individual differences, thus neglecting other spectral disturbances that do not correspond to diffraction. The model yields the frequencies and attenuation levels of the shelving filters.
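A shelving filter of the kind proposed above can be realized, for example, with the well-known RBJ audio-EQ-cookbook high-shelf biquad. This is a generic stand-in, not the paper's filter; the actual frequency and attenuation parameters would come from the CIPIC-derived power profiles.

```python
import numpy as np

def high_shelf_coeffs(fc, gain_db, fs, S=1.0):
    """Second-order high-shelf biquad (RBJ audio-EQ-cookbook form):
    roughly unity gain below fc and gain_db of boost/cut well above fc.
    Returns normalized (b, a) coefficient arrays."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * fc / fs
    alpha = np.sin(w0) / 2 * np.sqrt((A + 1 / A) * (1 / S - 1) + 2)
    cosw = np.cos(w0)
    b = np.array([A * ((A + 1) + (A - 1) * cosw + 2 * np.sqrt(A) * alpha),
                  -2 * A * ((A - 1) + (A + 1) * cosw),
                  A * ((A + 1) + (A - 1) * cosw - 2 * np.sqrt(A) * alpha)])
    a = np.array([(A + 1) - (A - 1) * cosw + 2 * np.sqrt(A) * alpha,
                  2 * ((A - 1) - (A + 1) * cosw),
                  (A + 1) - (A - 1) * cosw - 2 * np.sqrt(A) * alpha])
    return b / a[0], a / a[0]
```

A contralateral-ear diffraction loss could then be approximated by a high shelf with a negative `gain_db`, which is far cheaper than full HRIR convolution.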
This paper proposes a method to model head-related transfer functions (HRTFs) based on the shape and size of the outer ear. Using signal processing tools, such as Prony's signal modeling method, a dynamic model of the pinna has been obtained that completes the structural model of HRTFs used for digital audio spatialization. Listening tests conducted on 10 subjects showed that HRTFs created using this pinna model were 5% more effective than generic HRTFs in the frontal plane. The model reduces the computational and storage demands of audio spatialization while preserving a sufficient number of perceptually relevant spectral cues.
One key issue in modeling head-related impulse responses (HRIRs) is how to individualize the HRIR model so that it suits a given listener. The objective of this research is to establish multiple regression models between minimum-phase HRIRs and anthropometric parameters, in order to individualize a listener's HRIRs from his or her own anthropometric parameters. We modeled the
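The regression scheme outlined above, one linear model per HRIR sample index predicted from anthropometric parameters, can be sketched with ordinary least squares. This is illustrative only; the function names and data shapes are assumptions, not the authors' implementation.

```python
import numpy as np

def fit_hrir_regression(anthro, hrirs):
    """Fit multiple regression models from anthropometric parameters to
    minimum-phase HRIR samples (one model per sample index).

    anthro: (n_subjects, n_params)
    hrirs:  (n_subjects, hrir_len)
    returns weight matrix (n_params + 1, hrir_len), first row = bias.
    """
    X = np.hstack([np.ones((anthro.shape[0], 1)), anthro])  # add intercept
    W, *_ = np.linalg.lstsq(X, hrirs, rcond=None)
    return W

def predict_hrir(anthro_row, W):
    """Predict an individualized HRIR from one subject's parameters."""
    return np.concatenate([[1.0], anthro_row]) @ W
```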
This paper describes the different stages of creation of the CHEDAR database. It comprises 3D meshes generated from a morphable model of ear, head, and torso, together with their associated diffuse-field-equalized head-related impulse responses (HRIRs). Focused on the influence of the ear, it provides 1253 different ear shapes, making it the largest available HRIR database so far. Moreover, four different evaluation grids are used for the computation, with radii ranging from 20 cm to 2 m, thus allowing study of the near field as well as the far field. The frequency range of the corresponding HRTFs goes from 100 Hz to 16 kHz in 100 Hz steps. The HRIRs are provided as .sofa files. The entire database, unique in the type of data gathered, the number of entries, and their resolution, is publicly available for academic purposes.
In this paper, experimental results on changes in the directivity patterns of an artificial head caused by three different types of head cover are presented. A mathematical model of a human head, approximated by a radially vibrating spherical cap set in a sphere, is discussed. For the purpose of this research, a physical model of a human speaker's head was constructed, and detailed far-field directivity patterns of the model with and without head covers were measured. Sound barriers such as hair (as a porous absorber), a cap, and a straw hat are discussed, as well as their influence on sound wave propagation. Detailed directivity pattern changes caused by head covers in the far field are calculated in steps of 10 degrees in spherical coordinates (polar angle and azimuth) and presented in the form of two- and three-dimensional polar plots.