Understandable models of music collections based on exhaustive feature generation with temporal statistics

Authors:

Fabian Moerchen,

Alfred Ultsch

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 882 - 891

https://doi.org/10.1145/1150402.1150523

Published: 20 August 2006

Abstract

Data mining in large collections of polyphonic music has recently received increasing interest from companies along with the advent of commercial online distribution of music. Important applications include categorizing songs into genres and recommending songs according to musical similarity and the customer's musical preferences. Modeling the genre or timbre of polyphonic music is at the core of these tasks and has been recognized as a difficult problem. Many audio features have been proposed, but they do not provide easily understandable descriptions of music: they do not explain why a genre was chosen or in what way one song is similar to another. We present an approach that combines large-scale feature generation with meta-learning techniques to obtain meaningful features for musical similarity. We perform exhaustive feature generation based on temporal statistics and train regression models to summarize a subset of these features into a single descriptor of a particular notion of music. Using several such models, we produce a concise semantic description of each song. Genre classification models based on these semantic features are shown to be more understandable than traditional methods and almost as accurate.
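The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two short-term feature names (`loudness`, `brightness`), the five temporal statistics, the synthetic "dynamic vs. steady" notion, and the choice of an L2-penalized logistic regression fitted by plain gradient descent. The sketch pairs every short-term feature series with every statistic, then trains one regression model that condenses the generated features into a single score in [0, 1].

```python
import math
import random
import statistics


def lag1_autocorr(xs):
    """Lag-1 autocorrelation; defined as 0.0 for a constant series."""
    m = statistics.fmean(xs)
    den = sum((x - m) ** 2 for x in xs)
    if den == 0:
        return 0.0
    return sum((a - m) * (b - m) for a, b in zip(xs, xs[1:])) / den


# Illustrative temporal statistics (an assumption, not the paper's list).
TEMPORAL_STATS = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,
    "min": min,
    "max": max,
    "lag1_autocorr": lag1_autocorr,
}


def generate_features(series_by_name):
    """Exhaustively pair every short-term feature series with every statistic."""
    return {
        f"{stat}({name})": float(fn(xs))
        for name, xs in series_by_name.items()
        for stat, fn in TEMPORAL_STATS.items()
    }


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_descriptor(rows, labels, l2=0.01, lr=0.2, epochs=500):
    """L2-penalized logistic regression fitted by batch gradient descent."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def describe(model, row):
    """Summarize a generated feature vector into one descriptor in [0, 1]."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, row)))


# Synthetic stand-in for audio: "dynamic" songs have strongly varying loudness.
rng = random.Random(0)


def toy_song(dynamic, n_frames=50):
    spread = 1.0 if dynamic else 0.1
    return {
        "loudness": [rng.gauss(0.0, spread) for _ in range(n_frames)],
        "brightness": [rng.gauss(0.5, 0.2) for _ in range(n_frames)],
    }


def to_row(song):
    feats = generate_features(song)
    return [feats[name] for name in sorted(feats)]


labels = [1, 0] * 20
rows = [to_row(toy_song(bool(y))) for y in labels]
model = train_descriptor(rows, labels)

dynamic_score = describe(model, to_row(toy_song(True)))
steady_score = describe(model, to_row(toy_song(False)))
print(f"dynamic song: {dynamic_score:.2f}, steady song: {steady_score:.2f}")
```

Training several such models, one per notion, and concatenating their scores would give each song a short vector of named descriptors, which is the kind of concise semantic description the abstract refers to. The learned weights can also be inspected by feature name, which is where the interpretability comes from in this toy setting.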

Index Terms

1. Information systems
  1. Information systems applications

Published In

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2006

986 pages

ISBN: 1595933395

DOI: 10.1145/1150402

Conference Chair: Tina Eliassi-Rad (LLNL)

General Chair: Lyle Ungar (University of Pennsylvania)

Program Chairs: Mark Craven (University of Wisconsin), Dimitrios Gunopulos (University of California, Riverside)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD06: The 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 20-23, 2006

Philadelphia, PA, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

Müllensiefen D, Frieler K (2022). Statistical Methods in Music Corpus Studies. The Oxford Handbook of Music and Corpus Studies. Online publication date: 18-Aug-2022.
https://doi.org/10.1093/oxfordhb/9780190945442.013.8
Singh Y, Biswas A (2022). Robustness of musical features on deep learning models for music genre classification. Expert Systems with Applications: An International Journal, 199:C. Online publication date: 1-Aug-2022.
https://dl.acm.org/doi/10.1016/j.eswa.2022.116879
Medhat F, Chesmore D, Robinson J (2020). Masked Conditional Neural Networks for sound classification. Applied Soft Computing (106073). Online publication date: Jan-2020.
https://doi.org/10.1016/j.asoc.2020.106073
Datta A, Solanki S, Sengupta R, Chakraborty S, Mahto K, Patranabis A (2017). Music Information Retrieval. Signal Analysis of Hindustani Classical Music (17-33). Online publication date: 11-Mar-2017.
https://doi.org/10.1007/978-981-10-3959-1_2
Medhat F, Chesmore D, Robinson J (2017). Music Genre Classification Using Masked Conditional Neural Networks. Neural Information Processing (470-481). Online publication date: 26-Oct-2017.
https://doi.org/10.1007/978-3-319-70096-0_49
Wei Y, Jiao L, Wang S, Chen Y, Liu D (2015). Time Series Classification with Max-Correlation and Min-Redundancy Shapelets Transformation. Proceedings of the 2015 International Conference on Identification, Information, and Knowledge in the Internet of Things (IIKI) (7-12). Online publication date: 22-Oct-2015.
https://dl.acm.org/doi/10.1109/IIKI.2015.9
Sturm B (2014). The State of the Art Ten Years After a State of the Art: Future Research in Music Information Retrieval. Journal of New Music Research, 43:2 (147-172). Online publication date: 9-May-2014.
https://doi.org/10.1080/09298215.2014.894533
Sturm B (2014). A Survey of Evaluation in Music Genre Recognition. Adaptive Multimedia Retrieval: Semantics, Context, and Adaptation (29-66). Online publication date: 29-Oct-2014.
https://doi.org/10.1007/978-3-319-12093-5_2
Sturm B (2013). Classification accuracy is not enough. Journal of Intelligent Information Systems, 41:3 (371-406). Online publication date: 1-Dec-2013.
https://dl.acm.org/doi/10.1007/s10844-013-0250-y
Lines J, Davis L, Hills J, Bagnall A, Yang Q, Agarwal D, Pei J (2012). A shapelet transform for time series classification. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (289-297). Online publication date: 12-Aug-2012.
https://dl.acm.org/doi/10.1145/2339530.2339579
