Hierarchical Representation Based on Bayesian Nonparametric Tree-Structured Mixture Model for Playing Technique Classification

Authors:

Jia-Ching Wang

Thematic Workshops '17: Proceedings of the on Thematic Workshops of ACM Multimedia 2017

Pages 537 - 543

https://doi.org/10.1145/3126686.3126757

Published: 23 October 2017

Abstract

This work develops a topic-model-based hierarchical representation that identifies the latent characteristics underlying frame-level musical features. Frame-level features and music clips are treated as acoustic words and acoustic documents, respectively. A Gaussian hierarchical latent Dirichlet allocation (G-hLDA) model is proposed to find the latent topics in each acoustic document. G-hLDA models the continuous features directly instead of transforming them into discrete words, which reduces the information loss caused by vector quantization. Specifically, each latent topic identified by G-hLDA is represented as a node in an infinitely deep, infinitely branching tree. For each music clip, the number of its acoustic words assigned to each node is computed to form the hierarchical representation. The proposed representation hierarchically captures both the components shared among music clips and the components unique to individual clips, resulting in improved performance. Experimental results on a guitar playing technique database demonstrate that the proposed method outperforms the baselines.
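The counting step described in the abstract can be illustrated with a minimal sketch. It assumes that a fitted tree-structured model has already assigned every frame (acoustic word) of a clip to one tree node; the node names, the `hierarchical_representation` helper, and the toy assignments below are hypothetical and are not taken from the paper. The sketch shows only how per-node counts form a fixed-length vector for one clip, not how G-hLDA inference itself works.

```python
from collections import Counter


def hierarchical_representation(frame_nodes, node_order):
    """Count how many frames of one clip were assigned to each tree node.

    frame_nodes: one node identifier per frame (assumed output of a fitted model).
    node_order:  fixed ordering of all tree nodes, so that every clip yields a
                 vector with the same layout.
    """
    counts = Counter(frame_nodes)
    # Nodes that received no frames from this clip get a count of zero.
    return [counts.get(node, 0) for node in node_order]


# Hypothetical tree: the root has children "a" and "b"; "a" has child "a1".
node_order = ["root", "root/a", "root/a/a1", "root/b"]

# Hypothetical per-frame node assignments for a single music clip.
clip_assignments = ["root", "root", "root/a", "root/a/a1", "root/a/a1", "root"]

vector = hierarchical_representation(clip_assignments, node_order)
print(vector)
```

For the toy clip above, the resulting vector is `[3, 1, 2, 0]`: three frames on the root, one on `root/a`, two on `root/a/a1`, and none on `root/b`. In this picture, nodes near the root would tend to collect frames from many clips (shared components), while deeper nodes would collect frames from fewer clips (unique components); such a vector could then be passed to a downstream classifier.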

Cited By

K. Paraskevoudis and T. Giannakopoulos. 2021. Instrument Playing Technique Recognition: A Greek Music Use Case. Proceedings of the Worldwide Music Conference 2021, 124--136. Online publication date: 13-Apr-2021. https://doi.org/10.1007/978-3-030-74039-9_13

S. Wu, Y. Lee, S. Chen, and J. Wang. 2018. Learning a Hierarchical Latent Semantic Model for Multimedia Data. 2018 24th International Conference on Pattern Recognition (ICPR), 2995--3000. Online publication date: Aug-2018. https://doi.org/10.1109/ICPR.2018.8545305

Index Terms

Hierarchical Representation Based on Bayesian Nonparametric Tree-Structured Mixture Model for Playing Technique Classification
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Learning latent representations

Published In

Thematic Workshops '17: Proceedings of the on Thematic Workshops of ACM Multimedia 2017

October 2017

558 pages

ISBN:9781450354165

DOI:10.1145/3126686

Program Chairs:
Wanmin Wu (Google, USA)
Jianchao Yang (Snap Inc., USA)
Qi Tian (The University of Texas at San Antonio, USA)
Roger Zimmermann (National University of Singapore, Singapore)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '17

Sponsor:

SIGMM

MM '17: ACM Multimedia Conference

October 23 - 27, 2017

Mountain View, California, USA
