Abstract
This paper presents a scheme for converting singing voice into tuba sound. First, the method estimates the fundamental frequency (F0) and the aperiodicity of a monophonic audio signal to obtain the pitch and volume variations of the human voice. The extracted parameters are then used to generate a musical excerpt that emulates a specific instrument (the tuba), so that the resulting melody resembles the one originally sung. Two generation approaches are devised for this purpose. The first is based on additive signal synthesis from harmonic amplitudes. The second converts the F0 curve into a MIDI stream that can be played back with a virtual tuba.
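The two generation approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame layout (one F0 value and one gain per frame), the sample rate, the hop size, and the harmonic amplitudes are all assumptions chosen for illustration, and the amplitudes are not measured tuba data. The sketch assumes an F0 track has already been estimated, with 0 marking unvoiced frames.

```python
import math


def hz_to_midi(f0_hz):
    """Map a frequency in Hz to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return int(round(69 + 12 * math.log2(f0_hz / 440.0)))


def additive_synth(f0_track, amp_track, harmonic_amps, sample_rate=8000, hop=80):
    """Render a frame-wise F0/gain track as a sum of harmonics.

    f0_track      -- F0 in Hz per frame; 0 marks an unvoiced frame (silence)
    amp_track     -- overall gain per frame, expected in [0, 1]
    harmonic_amps -- non-negative relative amplitude of each harmonic
                     (hypothetical values, not measured from a tuba)
    """
    norm = sum(harmonic_amps)
    nyquist = sample_rate / 2
    phase = 0.0  # running phase of the fundamental, continuous across frames
    out = []
    for f0, gain in zip(f0_track, amp_track):
        step = 2 * math.pi * f0 / sample_rate
        for _ in range(hop):
            if f0 > 0:
                # Harmonic k uses k times the fundamental phase; harmonics
                # above the Nyquist frequency are skipped to avoid aliasing.
                s = sum(a * math.sin((k + 1) * phase)
                        for k, a in enumerate(harmonic_amps)
                        if (k + 1) * f0 < nyquist)
                out.append(gain * s / norm)
            else:
                out.append(0.0)
            phase = (phase + step) % (2 * math.pi)
    return out


# Usage: a short sung phrase at A2, a pause, then A#2 (values are made up).
f0_track = [110.0] * 10 + [0.0] * 5 + [116.54] * 10
amp_track = [0.8] * 25
harmonic_amps = [1.0, 0.6, 0.4, 0.25, 0.15]  # illustrative spectrum only

samples = additive_synth(f0_track, amp_track, harmonic_amps)
notes = [hz_to_midi(f) for f in f0_track if f > 0]
```

In the MIDI route, consecutive frames that map to the same note number would be merged into a single note event before being written to a stream. That segmentation step is omitted here.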
Acknowledgments
This work has been funded by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2013-47276-C6-2-R. This work has been done at Universidad de Málaga, Campus de Excelencia Internacional (CEI) Andalucía TECH.
Cite this article
Santacruz, J.L., Tardón, L.J., Barbancho, I. et al. VOICE2TUBA: transforming singing voice into a musical instrument. Multimed Tools Appl 76, 9855–9875 (2017). https://doi.org/10.1007/s11042-016-3582-0
DOI: https://doi.org/10.1007/s11042-016-3582-0