Abstract
The melody of a musical piece—informally, the part you would hum along with—is a useful and compact summary of a full audio recording. The extraction of melodic content has practical applications ranging from content-based audio retrieval to the analysis of musical structure. Whereas previous systems generate transcriptions based on a model of the harmonic (or periodic) structure of musical pitches, we present a classification-based system for performing automatic melody transcription that makes no assumptions beyond what is learned from its training data. We evaluate the success of our algorithm by predicting the melody of the ADC 2004 Melody Competition evaluation set, and we show that a simple frame-level note classifier, temporally smoothed by post-processing with a hidden Markov model, produces results comparable to state-of-the-art model-based transcription systems.
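The temporal smoothing described above can be sketched as Viterbi decoding over the classifier's per-frame note posteriors, using an HMM transition matrix that favors staying on the same note between frames. This is only an illustrative reconstruction: the function name, the uniform off-diagonal transitions, and the `self_prob` value are assumptions for the sketch (the paper's system learns its parameters from training data).

```python
import numpy as np

def viterbi_smooth(posteriors, self_prob=0.9):
    """Smooth frame-level note posteriors with a simple HMM.

    posteriors: (n_frames, n_notes) array of per-frame class probabilities.
    self_prob: hypothetical probability of holding the same note between
    frames; remaining mass is spread uniformly over the other notes.
    Returns the Viterbi path of note indices, one per frame.
    """
    n_frames, n_notes = posteriors.shape
    # Transition matrix: strong self-transitions, uniform otherwise.
    trans = np.full((n_notes, n_notes), (1.0 - self_prob) / (n_notes - 1))
    np.fill_diagonal(trans, self_prob)
    log_trans = np.log(trans)
    log_obs = np.log(posteriors + 1e-12)  # avoid log(0)

    delta = log_obs[0].copy()             # best log-prob ending in each note
    back = np.zeros((n_frames, n_notes), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_trans   # indexed (from_note, to_note)
        back[t] = scores.argmax(axis=0)       # best predecessor per note
        delta = scores.max(axis=0) + log_obs[t]

    # Trace back the highest-probability note sequence.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n_frames - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```

In use, a single-frame misclassification is overridden because switching notes and switching back costs two low-probability transitions, which outweighs the momentary evidence for the wrong note.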
Editor: Gerhard Widmer
Cite this article
Ellis, D.P.W., Poliner, G.E. Classification-based melody transcription. Mach Learn 65, 439–456 (2006). https://doi.org/10.1007/s10994-006-8373-9