AQP: an open modular Python platform for objective speech and audio quality metrics

Authors:

Alessandro Ragano,

Andrew Hines

MMSys '22: Proceedings of the 13th ACM Multimedia Systems Conference

Pages 191 - 196

https://doi.org/10.1145/3524273.3532885

Published: 05 August 2022

Abstract

Audio quality assessment has been widely researched in signal processing. Full-reference objective metrics (e.g., POLQA, ViSQOL) have been developed to estimate audio quality instead of relying only on human rating experiments. To evaluate the audio quality of novel audio processing techniques, researchers constantly need to compare objective quality metrics. Testing different implementations of the same metric and evaluating new datasets are fundamental, ongoing, iterative activities. In this paper, we present AQP, an open-source, node-based, lightweight Python pipeline for audio quality assessment. AQP allows researchers to test and compare objective quality metrics, helping to improve robustness, reproducibility, and development speed. We introduce the platform, explain the motivations, and illustrate with examples how, using AQP, objective quality metrics can be (i) compared and benchmarked; (ii) prototyped and adapted in a modular fashion; and (iii) visualised and checked for errors. The code has been shared on GitHub to encourage adoption and contributions from the community.
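
To make the node-based idea concrete, the following is a minimal sketch of how a pipeline of nodes might run two metrics on the same reference/degraded pair. It is not AQP's actual API: the `Node` class, the `then` and `run` methods, and the result keys are all hypothetical names chosen for illustration. Plain mean squared error and signal-to-noise ratio stand in for real metrics such as POLQA or ViSQOL.

```python
import math
from typing import Callable, Dict, List


class Node:
    """Hypothetical pipeline step (not AQP's API): reads and updates a shared results dict."""

    def __init__(self, name: str, func: Callable[[Dict], Dict]):
        self.name = name
        self.func = func
        self.children: List["Node"] = []

    def then(self, child: "Node") -> "Node":
        """Attach a child node that runs after this one; returns the child."""
        self.children.append(child)
        return child

    def run(self, results: Dict) -> Dict:
        """Run this node, merge its output, then run each child in order."""
        results.update(self.func(results))
        for child in self.children:
            child.run(results)
        return results


def make_signals(_: Dict) -> Dict:
    # Synthetic stand-in for loading audio: a sine wave and an attenuated copy.
    ref = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
    deg = [0.5 * x for x in ref]
    return {"reference": ref, "degraded": deg}


def mse_metric(results: Dict) -> Dict:
    # Toy full-reference metric: mean squared error between the two signals.
    ref, deg = results["reference"], results["degraded"]
    return {"mse": sum((a - b) ** 2 for a, b in zip(ref, deg)) / len(ref)}


def snr_metric(results: Dict) -> Dict:
    # Toy full-reference metric: signal-to-noise ratio in dB.
    ref, deg = results["reference"], results["degraded"]
    signal = sum(a * a for a in ref)
    noise = sum((a - b) ** 2 for a, b in zip(ref, deg))
    return {"snr_db": 10 * math.log10(signal / noise)}


# Build the graph: one loader node feeding two metric nodes.
root = Node("load", make_signals)
root.then(Node("mse", mse_metric))
root.then(Node("snr", snr_metric))

results = root.run({})
print({k: round(v, 3) for k, v in results.items() if k in ("mse", "snr_db")})
```

Because every metric node reads the same shared inputs, adding another metric or another implementation of the same metric in this sketch only requires attaching one more node to the loader.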

Supplementary Material

ZIP File (p191-geraghty.zip), 4.19 MB

Supplemental material.

Published In

MMSys '22: Proceedings of the 13th ACM Multimedia Systems Conference

June 2022

432 pages

ISBN: 9781450392839

DOI: 10.1145/3524273

General Chairs: Niall Murray (Technological University of the Shannon: Midlands Midwest), Gwendal Simon (Synamedia), Mylene Farias (University of Brasilia)

Program Chairs: Irene Viola (Centrum Wiskunde & Informatica), Mario Montagud (i2CAT Foundation & University of Valencia)

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Science Foundation Ireland

Conference

MMSys '22

Sponsor:

SIGMM

MMSys '22: 13th ACM Multimedia Systems Conference

June 14 - 17, 2022

Athlone, Ireland

Acceptance Rates

Overall Acceptance Rate 176 of 530 submissions, 33%
