The “Horse” Inside: Seeking Causes Behind the Behaviors of Music Content Analysis Systems

Published: 10 January 2017

Abstract

Building systems that possess the sensitivity and intelligence to identify and describe high-level attributes in music audio signals continues to be an elusive goal but one that surely has broad and deep implications for a wide variety of applications. Hundreds of articles have so far been published toward this goal, and great progress appears to have been made. Some systems produce remarkable accuracies at recognizing high-level semantic concepts, such as music style, genre, and mood. However, it might be that these numbers do not mean what they seem. In this article, we take a state-of-the-art music content analysis system and investigate what causes it to achieve exceptionally high performance in a benchmark music audio dataset. We dissect the system to understand its operation, determine its sensitivities and limitations, and predict the kinds of knowledge it could and could not possess about music. We perform a series of experiments to illuminate what the system has actually learned to do and to what extent it is performing the intended music listening task. Our results demonstrate how the initial manifestation of music intelligence in this state of the art can be deceptive. Our work provides constructive directions toward developing music content analysis systems that can address the music information and creation needs of real-world users.




Published In

Computers in Entertainment, Volume 14, Issue 2: Special Issue on Musical Metacreation, Part I. Summer 2016, 135 pages. EISSN: 1544-3574. DOI: 10.1145/3023311.

Publisher

Association for Computing Machinery, New York, NY, United States

        Publication History

        Published: 10 January 2017
        Accepted: 01 June 2016
        Revised: 01 November 2015
        Received: 01 May 2015
        Published in CIE Volume 14, Issue 2


        Author Tags

        1. Deep learning
        2. empiricism
        3. music genre and style



        Cited By

        • (2023) The Clever Hans Effect in Voice Spoofing Detection. 2022 IEEE Spoken Language Technology Workshop (SLT), 577-584. DOI: 10.1109/SLT54892.2023.10022624. Online publication date: 9-Jan-2023.
        • (2021) "We are Not Groupies⋯ We are Band Aids": Assessment Reliability in the AI Song Contest. Transactions of the International Society for Music Information Retrieval 4:1 (236). DOI: 10.5334/tismir.102. Online publication date: 3-Dec-2021.
        • (2021) How to Design a Relevant Corpus for Sleepiness Detection Through Voice? Frontiers in Digital Health 3. DOI: 10.3389/fdgth.2021.686068. Online publication date: 22-Sep-2021.
        • (2021) Sociocultural and Design Perspectives on AI-Based Music Production: Why Do We Make Music and What Changes if AI Makes It for Us? Handbook of Artificial Intelligence for Music, 1-20. DOI: 10.1007/978-3-030-72116-9_1. Online publication date: 3-Jul-2021.
        • (2020) Dataset Artefacts in Anti-Spoofing Systems: A Case Study on the ASVspoof 2017 Benchmark. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28, 3018-3028. DOI: 10.1109/TASLP.2020.3036777. Online publication date: 2020.
        • (2020) One deep music representation to rule them all? A comparative analysis of different representation learning strategies. Neural Computing and Applications 32:4, 1067-1093. DOI: 10.1007/s00521-019-04076-1. Online publication date: 1-Feb-2020.
        • (2019) Are Nearby Neighbors Relatives? Testing Deep Music Embeddings. Frontiers in Applied Mathematics and Statistics 5. DOI: 10.3389/fams.2019.00053. Online publication date: 8-Nov-2019.
