DOI: 10.1145/3678299.3678325
research-article
Open access

On the Importance of Temporally Precise Onset Annotations for Real-Time Music Information Retrieval: Findings from the AG-PT-set Dataset

Published: 18 September 2024

Abstract

In real-time Music Information Retrieval (MIR), small analysis windows are essential for achieving low retrieval latency. In turn, event-based real-time MIR methods require precise onset detectors to correctly align with the beginning of events such as musical notes. Detectors are typically trained using ground-truth annotations from datasets of interest. Yet most MIR datasets do not prioritize the accurate timing of onset labels, and the evaluation of detectors often relies on generous tolerance windows (as wide as ±50 ms). In this paper we present AG-PT-set, a new dataset of acoustic guitar techniques with precise onset annotations. The dataset features 32,592 individual notes and over 10 hours of audio, covering eight techniques. Moreover, we assess the importance of exact onset labels across multiple real-time MIR tasks. Our results show that accurate timing of onset labels and precise detectors are crucial for real-time MIR tasks, as the performance of most algorithms degrades with imprecise onsets. On a few occasions, imprecise onset timing slightly improved results, hinting at a possible similarity to data augmentation methods. Taken together, our findings indicate that temporally precise labels and detectors are always preferable, as robustness can always be obtained via artificial augmentation, while precision cannot be obtained as easily.
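
To make the role of the tolerance window concrete, the following is a minimal sketch of how an onset detector is commonly scored: each detected onset counts as correct if it falls within a tolerance of an unmatched reference onset, and an F-measure is computed from the matches. It is an illustration under stated assumptions, not the evaluation procedure used in the paper. The function names (match_onsets, f_measure), the greedy matching strategy, and the example onset times are all hypothetical.

```python
def match_onsets(reference, detected, tolerance=0.05):
    """Count one-to-one matches between onset times given in seconds.

    A detection matches a reference onset when the two differ by at
    most `tolerance` seconds. Matching is greedy over sorted lists,
    which is an assumption of this sketch.
    """
    reference = sorted(reference)
    detected = sorted(detected)
    i = j = 0
    true_positives = 0
    while i < len(reference) and j < len(detected):
        diff = detected[j] - reference[i]
        if abs(diff) <= tolerance:
            true_positives += 1  # matched pair, consume both
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # detection too early for this reference: false positive
        else:
            i += 1  # no detection close enough: missed reference onset
    return true_positives


def f_measure(reference, detected, tolerance=0.05):
    """F-measure of detected onsets against reference onsets."""
    tp = match_onsets(reference, detected, tolerance)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: detections are late by 20 ms, 40 ms, and 80 ms.
reference_onsets = [0.50, 1.00, 1.50]
detected_onsets = [0.52, 1.04, 1.58]

generous = f_measure(reference_onsets, detected_onsets, tolerance=0.05)
strict = f_measure(reference_onsets, detected_onsets, tolerance=0.01)
```

With the generous ±50 ms window, two of the three late detections still count as correct. With a ±10 ms window none do, so the same detector goes from a moderate score to zero. A generous window can therefore hide timing errors that matter when a real-time system must align a short analysis window with the start of a note.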

        Published In

        AM '24: Proceedings of the 19th International Audio Mostly Conference: Explorations in Sonic Cultures
        September 2024
        565 pages
        ISBN:9798400709685
        DOI:10.1145/3678299
        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Audio Processing
        2. Music Information Retrieval
        3. Real-time

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        AM '24

        Acceptance Rates

        Overall Acceptance Rate 177 of 275 submissions, 64%

        Bibliometrics

        • Total Citations: 0
        • Total Downloads: 74
        • Downloads (Last 12 months): 74
        • Downloads (Last 6 weeks): 26
        Reflects downloads up to 23 Dec 2024
