The “Horse” Inside: Seeking Causes Behind the Behaviors of Music Content Analysis Systems

Published: 10 January 2017

Abstract

Building systems that possess the sensitivity and intelligence to identify and describe high-level attributes in music audio signals continues to be an elusive goal but one that surely has broad and deep implications for a wide variety of applications. Hundreds of articles have so far been published toward this goal, and great progress appears to have been made. Some systems produce remarkable accuracies at recognizing high-level semantic concepts, such as music style, genre, and mood. However, it might be that these numbers do not mean what they seem. In this article, we take a state-of-the-art music content analysis system and investigate what causes it to achieve exceptionally high performance in a benchmark music audio dataset. We dissect the system to understand its operation, determine its sensitivities and limitations, and predict the kinds of knowledge it could and could not possess about music. We perform a series of experiments to illuminate what the system has actually learned to do and to what extent it is performing the intended music listening task. Our results demonstrate how the initial manifestation of music intelligence in this state of the art can be deceptive. Our work provides constructive directions toward developing music content analysis systems that can address the music information and creation needs of real-world users.




Published In

Computers in Entertainment, Volume 14, Issue 2: Special Issue on Musical Metacreation, Part I. Summer 2016, 135 pages. EISSN: 1544-3574. DOI: 10.1145/3023311.

Publisher

Association for Computing Machinery, New York, NY, United States

        Publication History

        Published: 10 January 2017
        Accepted: 01 June 2016
        Revised: 01 November 2015
        Received: 01 May 2015
        Published in CIE Volume 14, Issue 2


        Author Tags

        1. Deep learning
        2. empiricism
        3. music genre and style



        Cited By

        • (2023) The Clever Hans Effect in Voice Spoofing Detection. 2022 IEEE Spoken Language Technology Workshop (SLT), 577-584. DOI: 10.1109/SLT54892.2023.10022624. Online publication date: 9-Jan-2023.
        • (2021) "We are Not Groupies⋯ We are Band Aids": Assessment Reliability in the AI Song Contest. Transactions of the International Society for Music Information Retrieval 4:1 (236). DOI: 10.5334/tismir.102. Online publication date: 3-Dec-2021.
        • (2021) How to Design a Relevant Corpus for Sleepiness Detection Through Voice? Frontiers in Digital Health 3. DOI: 10.3389/fdgth.2021.686068. Online publication date: 22-Sep-2021.
        • (2021) Sociocultural and Design Perspectives on AI-Based Music Production: Why Do We Make Music and What Changes if AI Makes It for Us? Handbook of Artificial Intelligence for Music, 1-20. DOI: 10.1007/978-3-030-72116-9_1. Online publication date: 3-Jul-2021.
        • (2020) Dataset Artefacts in Anti-Spoofing Systems: A Case Study on the ASVspoof 2017 Benchmark. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28, 3018-3028. DOI: 10.1109/TASLP.2020.3036777. Online publication date: 2020.
        • (2020) One deep music representation to rule them all? A comparative analysis of different representation learning strategies. Neural Computing and Applications 32:4, 1067-1093. DOI: 10.1007/s00521-019-04076-1. Online publication date: 1-Feb-2020.
        • (2019) Are Nearby Neighbors Relatives? Testing Deep Music Embeddings. Frontiers in Applied Mathematics and Statistics 5. DOI: 10.3389/fams.2019.00053. Online publication date: 8-Nov-2019.
