
Estimation of Room Acoustic Parameters: The ACE Challenge

Published: 01 October 2016

Abstract

Reverberation time (T60) and direct-to-reverberant ratio (DRR) are important parameters which together characterize sound captured by microphones in non-anechoic rooms. These parameters are important in speech processing applications such as speech recognition and dereverberation. The values of T60 and DRR can be estimated directly from the acoustic impulse response (AIR) of the room. In practice, the AIR is not normally available, in which case these parameters must be estimated blindly from the observed speech in the microphone signal. The Acoustic Characterization of Environments (ACE) Challenge aimed to determine the state of the art in blind acoustic parameter estimation and to stimulate research in this area. A summary of the ACE Challenge and the corpus used in the challenge is presented, together with an analysis of the results. Existing algorithms were submitted alongside novel contributions, and comparative results for both are presented in this paper. The challenge showed that T60 estimation is a mature field in which analytical approaches dominate, whereas DRR estimation is a less mature field in which machine learning approaches are currently more successful.
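As a concrete illustration of the non-blind case the abstract mentions, both parameters can be computed when a measured AIR is available: T60 via Schroeder backward integration of the squared AIR (fitting the energy decay curve and extrapolating to -60 dB), and DRR by splitting the AIR energy into a direct-path window around the main peak and the remaining reverberant tail. The sketch below is illustrative only, not the ACE evaluation code; the fit range (-5 to -25 dB) and the 2.5 ms direct-path window are assumed example choices.

```python
import numpy as np

def t60_from_air(h, fs, db_start=-5.0, db_end=-25.0):
    """Estimate T60 from an AIR via Schroeder backward integration.

    Fits a line to the energy decay curve (EDC) between db_start and
    db_end dB (a T20-style fit) and extrapolates to -60 dB.
    """
    edc = np.cumsum(h[::-1] ** 2)[::-1]          # backward integral of energy
    edc_db = 10.0 * np.log10(edc / edc[0])       # normalise to 0 dB at t = 0
    t = np.arange(len(h)) / fs
    mask = (edc_db <= db_start) & (edc_db >= db_end)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # decay rate in dB/s
    return -60.0 / slope

def drr_from_air(h, fs, direct_ms=2.5):
    """Estimate DRR in dB by windowing the AIR around its main peak."""
    n0 = int(np.argmax(np.abs(h)))               # direct-path arrival sample
    nw = int(direct_ms * 1e-3 * fs)              # half-width of direct window
    direct = h[max(0, n0 - nw):n0 + nw + 1]
    reverb = np.concatenate((h[:max(0, n0 - nw)], h[n0 + nw + 1:]))
    return 10.0 * np.log10(np.sum(direct ** 2) / np.sum(reverb ** 2))

# Synthetic AIR: noise with exponential decay chosen so that T60 = 0.3 s,
# plus a strong direct-path impulse at the start.
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
rng = np.random.default_rng(0)
h = rng.standard_normal(t.size) * np.exp(-3.0 * np.log(10) * t / 0.3)
h[0] = 10.0
print(t60_from_air(h, fs), drr_from_air(h, fs))
```

On this synthetic AIR the T60 estimate recovers the 0.3 s decay built into the envelope; the DRR value depends on the direct-impulse amplitude relative to the tail energy.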



Published In

IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 24, Issue 10
October 2016, 195 pages
ISSN: 2329-9290
EISSN: 2329-9304

Publisher

IEEE Press


Qualifiers

  • Research-article
