Abstract
In this paper, a novel pitch detection algorithm (PDA) is proposed. Pitch detection is a classical problem that has been investigated since the very beginning of speech processing. The novelty of the proposed method lies in establishing an empirical relationship between the fundamental frequency (\(f_{0}\)) and the instantaneous frequency (\(f_{i}\)), which serves as the basis of the proposed PDA. Although \(f_{0}\) and \(f_{i}\) are defined as attributes of two different transforms, namely the Fourier transform and the Hilbert transform, respectively, the relationship proposed in this paper shows some interaction between them, at least empirically. The first step of this work consists in validating the proposed relationship on a large set of speech signals. The relationship is then leveraged to develop an algorithm capable of (a) detecting the voiced/unvoiced parts of speech and (b) extracting the \(f_{0}\) contour from the \(f_{i}\) values in the voiced parts. For evaluation purposes, the resulting \(f_{0}\) contour is compared with those of several well-rated state-of-the-art PDAs. The main findings show that the pitch detection quality of the proposed technique is comparable to that of some of the top PDAs, in both clean and simulated noisy speech. In addition, one of its main advantages is that it bypasses the traditional short-time analysis, which is required to assume local stationarity of the speech signal.
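The abstract builds on the instantaneous frequency \(f_{i}\), defined through the Hilbert transform. The Python sketch below is not the authors' PDA, nor their empirical \(f_{0}\)–\(f_{i}\) relationship. It only shows how \(f_{i}\) can be computed over a whole signal, without framing, as the derivative of the unwrapped phase of the analytic signal. The analytic signal uses the standard FFT construction, the same one `scipy.signal.hilbert` uses, here written with NumPy alone. The test signals and their parameters (sampling rate, \(f_{0}\) = 125 Hz, harmonic amplitudes) are hypothetical. The example hints at why mapping \(f_{i}\) to \(f_{0}\) is not trivial. When the fundamental dominates, the mean \(f_{i}\) over whole periods matches \(f_{0}\). When the second harmonic dominates, \(f_{i}\) follows that harmonic instead.

```python
import numpy as np


def analytic_signal(x):
    """Return x + j*H{x}, built by zeroing the negative-frequency half of the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        h[n // 2] = 1.0             # keep the Nyquist bin once
        h[1:n // 2] = 2.0           # double positive frequencies
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)


def instantaneous_frequency(x, fs):
    """f_i in Hz: derivative of the unwrapped analytic phase (len(x) - 1 values)."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)


# Hypothetical synthetic "voiced" signals: 0.2 s at 8 kHz = exactly 25 periods of f0.
fs = 8000
f0 = 125.0
t = np.arange(int(0.2 * fs)) / fs

# Case 1: the fundamental dominates (1.0 > 0.4 + 0.2).
x = (1.0 * np.cos(2 * np.pi * f0 * t)
     + 0.4 * np.cos(2 * np.pi * 2 * f0 * t)
     + 0.2 * np.cos(2 * np.pi * 3 * f0 * t))

# Case 2: the second harmonic dominates (1.0 > 0.3 + 0.2).
x2 = (0.3 * np.cos(2 * np.pi * f0 * t)
      + 1.0 * np.cos(2 * np.pi * 2 * f0 * t)
      + 0.2 * np.cos(2 * np.pi * 3 * f0 * t))

if __name__ == "__main__":
    print(f"mean f_i, fundamental dominant: {np.mean(instantaneous_frequency(x, fs)):.1f} Hz")
    print(f"mean f_i, 2nd harmonic dominant: {np.mean(instantaneous_frequency(x2, fs)):.1f} Hz")
```

Running the script should print means close to 125 Hz and 250 Hz, respectively. The sample-wise \(f_{i}\) fluctuates within each period in both cases. This is one reason a PDA built on \(f_{i}\) needs an explicit model linking \(f_{i}\) to \(f_{0}\), rather than simply reading \(f_{0}\) off the \(f_{i}\) track.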
Data Availability
The datasets analyzed during the current study are available in the PTDB-TUG repository [29]. These datasets were derived from the following public domain resources: https://www2.spsc.tugraz.at/databases/PTDB-TUG/
Notes
MATLAB code is available at [25].
References
[1] T. Abe, T. Kobayashi, S. Imai, Harmonics tracking and pitch extraction based on instantaneous frequency, in 1995 International Conference on Acoustics, Speech, and Signal Processing, IEEE, vol. 1, pp. 756–759 (1995)
[2] T. Abe, T. Kobayashi, S. Imai, Robust pitch estimation with harmonics enhancement in noisy environments based on instantaneous frequency, in Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP’96). IEEE, vol. 2, pp. 1277–1280 (1996)
[3] Y. Agiomyrgiannakis, Yang: Yet-another-generalized vocoder. https://github.com/google/yang_vocoder/, last accessed: 31-05-2022 (2017)
[4] E. Azarov, M. Vashkevich, A. Petrovsky, Instantaneous pitch estimation based on RAPT framework, in 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO). IEEE, pp. 2787–2791 (2012)
[5] H. Ba, N. Yang, I. Demirkol, W. Heinzelman, BaNa: a hybrid approach for noise resilient pitch detection, in 2012 IEEE Statistical Signal Processing Workshop (SSP). IEEE, pp. 369–372 (2012)
[6] B. Boashash, Estimating and interpreting the instantaneous frequency of a signal. II. Algorithms and applications. Proc. IEEE 80(4), 540–568 (1992)
[7] P. Boersma, D. Weenink, Praat: doing phonetics by computer. https://www.fon.hum.uva.nl/praat/, last accessed: 31-05-2022 (2006)
[8] A. Camacho, J.G. Harris, A sawtooth waveform inspired pitch estimator for speech and music. J. Acoust. Soc. Am. 124(3), 1638–1652 (2008)
[9] W. Chu, A. Alwan, Reducing f0 frame error of f0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend, in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, pp. 3969–3972 (2009)
[10] A. De Cheveigné, H. Kawahara, YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am. 111(4), 1917–1930 (2002)
[11] A. De Cheveigné, H. Kawahara, YIN algorithm. https://labrosa.ee.columbia.edu/doc/yin.html, last accessed: 31-05-2022 (2002)
[12] T. Drugman, A. Alwan, Joint robust voicing detection and pitch estimation based on residual harmonics, in Proceedings of Interspeech 2011, Florence, Italy. IEEE, pp. 1973–1976 (2011)
[13] D. Gabor, Theory of communication. Part 1. The analysis of information. J. Inst. Electr. Eng. Part III Radio Commun. Eng. 93(26), 429–441 (1946)
[14] S. Gonzalez, M. Brookes, PEFAC: a pitch estimation algorithm robust to high levels of noise. IEEE/ACM Trans. Audio Speech Lang. Process. 22(2), 518–530 (2014)
[15] S.W. Group et al., Speech signal processing toolkit (SPTK) version 3.3, https://sourceforge.net/projects/sp-tk//, last accessed: 31-05-2022 (2009)
[16] D.J. Hermes, Measurement of pitch by subharmonic summation. J. Acoust. Soc. Am. 83(1), 257–264 (1988)
[17] W. Hess, Manual and instrumental pitch determination, voicing determination, in Pitch Determination of Speech Signals. Springer, pp. 92–151 (1983)
[18] H. Huang, J. Pan, Speech pitch determination based on Hilbert–Huang transform. Signal Process. 86(4), 792–803 (2006)
[19] N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.C. Yen, C.C. Tung, H.H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 454(1971), 903–995 (1998)
[20] D. Jouvet, Y. Laprie, Performance analysis of several pitch detection algorithms on simulated and real noisy speech data, in 2017 25th European Signal Processing Conference (EUSIPCO). IEEE, pp. 1614–1618 (2017)
[21] S. Kadambe, G.F. Boudreaux-Bartels, Application of the wavelet transform for pitch detection of speech signals. IEEE Trans. Inf. Theory 38(2), 917–924 (1992)
[22] H. Kawahara, Y. Agiomyrgiannakis, H. Zen, Using instantaneous frequency and aperiodicity detection to estimate f0 for high-quality speech synthesis, in 9th ISCA Speech Synthesis Workshop (SSW9), ISCA, pp. 221–228 (2016)
[23] A. Kissling, R. Kompe, N. Niemann, A. Batliner, DP-based determination of f0 contours from speech signals, in Acoustics, Speech, and Signal Processing, 1992. Proceedings (ICASSP’92), IEEE, vol. 1, pp. 1–4 (1992)
[24] E. Liflyand, Interaction between the Fourier transform and the Hilbert transform. Acta et Commentationes Universitatis Tartuensis de Mathematica 18(1), 19–32 (2014)
[25] Z. Mnasri, Proposed algorithm, https://github.com/zied-mnasri/f0_IF_model, last accessed: 31-05-2022 (2021)
[26] Z. Mnasri, H. Amiri, On the relationship between instantaneous frequency and pitch in speech signals, in Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2018, ed. by A. Berton, U. Haiber, W. Minker (TUDpress, Dresden, 2018), pp. 23–29
[27] Z. Mnasri, S. Rovetta, F. Masulli, A novel pitch detection algorithm based on instantaneous frequency, in 2021 29th European Signal Processing Conference (EUSIPCO), IEEE, pp. 16–20 (2021). http://doi.org/10.23919/EUSIPCO54536.2021.9616047
[28] A.M. Noll, Cepstrum pitch determination. J. Acoust. Soc. Am. 41(2), 293–309 (1967)
[29] G. Pirker, M. Wohlmayr, S. Petrik, F. Pernkopf, A pitch tracking corpus with evaluation on multipitch tracking scenario, in Twelfth Annual Conference of the International Speech Communication Association. http://doi.org/10.21437/Interspeech.2011 (2011)
[30] B. Van der Pol, The fundamental principles of frequency modulation. J. Inst. Electr. Eng. Part III Radio Commun. Eng. 93(23), 153–158 (1946)
[31] L. Qiu, H. Yang, S.N. Koh, Fundamental frequency determination based on instantaneous frequency estimation. Signal Process. 44(2), 233–241 (1995)
[32] L. Rabiner, On the use of autocorrelation analysis for pitch detection. IEEE Trans. Acoust. Speech Signal Process. 25(1), 24–33 (1977)
[33] P. Rengaswamy, K.S. Rao, P. Dasgupta, SongF0: a spectrum-based fundamental frequency estimation for monophonic songs. Circuits Syst. Signal Process. 40(2), 772–797 (2021)
[34] M. Ross, H. Shaffer, A. Cohen, R. Freudberg, H. Manley, Average magnitude difference function pitch extractor. IEEE Trans. Acoust. Speech Signal Process. 22(5), 353–362 (1974)
[35] S. Shimauchi, S. Kudo, Y. Koizumi, K. Furuya, On relationships between amplitude and phase of short-time Fourier transform, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 676–680 (2017)
[36] Y. Stylianou, Modeling speech based on harmonic plus noise models, in International School on Neural Networks, Initiated by IIASS and EMFCSC (Springer, 2004), pp. 244–260
[37] L. Sukhostat, Y. Imamverdiyev, A comparative analysis of pitch detection methods under the influence of different noise conditions. J. Voice 29(4), 410–417 (2015)
[38] X. Sun, A pitch determination algorithm based on subharmonic-to-harmonic ratio, in Sixth International Conference on Spoken Language Processing (2000)
[39] X. Sun, Pitch determination algorithm. https://www.mathworks.com/matlabcentral/fileexchange/1230-pitch-determination-algorithm, last accessed: 31-05-2022 (2002)
[40] J. Tabrikian, S. Dubnov, Y. Dickalov, Maximum a-posteriori probability pitch tracking in noisy environments using harmonic model. IEEE Trans. Speech Audio Process. 12(1), 76–87 (2004)
[41] D. Talkin, W.B. Kleijn, A robust algorithm for pitch tracking (RAPT). Speech Coding Synth., 495–518 (1995)
[42] L.N. Tan, A. Alwan, Multi-band summary correlogram-based pitch detection for noisy speech. Speech Commun. 55(7–8), 841–856 (2013)
[43] J. Ville, Theorie et application de la notion de signal analytique. Câbles et transmissions 2(1), 61–74 (1948)
[44] K. Wu, D. Zhang, G. Lu, IPEEH: improving pitch estimation by enhancing harmonics. Expert Syst. Appl. 64, 317–329 (2016)
[45] S.A. Zahorian, H. Hu, A spectral/temporal method for robust fundamental frequency tracking. J. Acoust. Soc. Am. 123(6), 4559–4571 (2008)
Acknowledgements
This work was supported by the University of Tunis El Manar, Tunisia, and by the University of Genoa, Italy.
Cite this article
Mnasri, Z., Rovetta, S. & Masulli, F. A Novel Pitch Detection Algorithm Based on Instantaneous Frequency for Clean and Noisy Speech. Circuits Syst Signal Process 41, 6266–6294 (2022). https://doi.org/10.1007/s00034-022-02082-8
DOI: https://doi.org/10.1007/s00034-022-02082-8