AQP: an open modular Python platform for objective speech and audio quality metrics
Pages 191 - 196
Abstract
Audio quality assessment has been widely researched in the signal processing area. Full-reference objective metrics (e.g., POLQA, ViSQOL) have been developed to estimate the audio quality relying only on human rating experiments. To evaluate the audio quality of novel audio processing techniques, researchers constantly need to compare objective quality metrics. Testing different implementations of the same metric and evaluating new datasets are fundamental, ongoing, and iterative activities. In this paper, we present AQP, an open-source, node-based, lightweight Python pipeline for audio quality assessment. AQP allows researchers to test and compare objective quality metrics, helping to improve robustness, reproducibility, and development speed. We introduce the platform, explain the motivations, and illustrate with examples how, using AQP, objective quality metrics can be (i) compared and benchmarked; (ii) prototyped and adapted in a modular fashion; and (iii) visualised and checked for errors. The code has been shared on GitHub to encourage adoption and contributions from the community.
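To make the node-based idea concrete, the following is a minimal sketch of how a chain of nodes might pass a shared result dictionary from one metric to the next. It is not AQP's actual API: the names `Node`, `MetricNode`, and `run_pipeline` are hypothetical, and `mse` and `snr_db` are toy stand-ins for real full-reference metrics such as POLQA or ViSQOL.

```python
import math
from typing import Callable, List, Optional

Signal = List[float]


class Node:
    """Hypothetical pipeline node: updates a shared result dict."""

    def __init__(self, name: str, next_node: Optional["Node"] = None):
        self.name = name
        self.next_node = next_node

    def run(self, results: dict) -> dict:
        raise NotImplementedError


class MetricNode(Node):
    """Hypothetical node that scores a degraded signal against a reference."""

    def __init__(self, name: str, metric: Callable[[Signal, Signal], float],
                 next_node: Optional[Node] = None):
        super().__init__(name, next_node)
        self.metric = metric

    def run(self, results: dict) -> dict:
        # Store the score under the node's name so later nodes can read it.
        results[self.name] = self.metric(results["reference"], results["degraded"])
        return results


def run_pipeline(start: Node, results: dict) -> dict:
    """Visit each node in order, threading the result dict through."""
    node: Optional[Node] = start
    while node is not None:
        results = node.run(results)
        node = node.next_node
    return results


def mse(ref: Signal, deg: Signal) -> float:
    """Toy metric: mean squared error between the two signals."""
    return sum((r - d) ** 2 for r, d in zip(ref, deg)) / len(ref)


def snr_db(ref: Signal, deg: Signal) -> float:
    """Toy metric: signal-to-noise ratio in dB, treating (ref - deg) as noise."""
    signal = sum(r * r for r in ref)
    noise = sum((r - d) ** 2 for r, d in zip(ref, deg))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


# Synthetic test pair: a sine wave and an attenuated copy of it.
reference = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
degraded = [0.9 * x for x in reference]

pipeline = MetricNode("mse", mse, MetricNode("snr_db", snr_db))
scores = run_pipeline(pipeline, {"reference": reference, "degraded": degraded})
print({k: v for k, v in scores.items() if k in ("mse", "snr_db")})
```

Under this design, comparing another metric or another implementation of the same metric would only require inserting one more node into the chain, while every node's output stays in the result dictionary for later inspection.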
Published In
June 2022
432 pages
ISBN: 9781450392839
DOI: 10.1145/3524273
- General Chairs: Niall Murray, Gwendal Simon, Mylene Farias
- Program Chairs: Irene Viola, Mario Montagud
Copyright © 2022 Owner/Author.
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 05 August 2022
Qualifiers
- Research-article
Conference
MMSys '22
Acceptance Rates
Overall Acceptance Rate 176 of 530 submissions, 33%