Journal of Computational and Applied Mathematics 227 (2009) 288–293
Wavelet-based cepstrum calculation
Fabrício Lopes Sanchez a , Sylvio Barbon Júnior b,∗ , Lucimar Sasso Vieira b ,
Rodrigo Capobianco Guido b , Everthon Silva Fonseca c , Paulo Rogério Scalassara c ,
Carlos Dias Maciel c , José Carlos Pereira c , Shi-Huang Chen d
a Department of Bioengineering, University of São Paulo, USP, São Carlos, São Paulo, 13566-590, Brazil
b Institute of Physics of São Carlos - University of São Paulo, USP, São Carlos, São Paulo, 13566-590, Brazil
c Department of Electrical Engineering, School of Engineering at São Carlos - University of São Paulo, USP, São Carlos, São Paulo 13566-590, Brazil
d Department of Computer Science and Information Engineering, Shu-Te University, Kaohsiung County, 824, Taiwan, ROC
Article history: Received 4 April 2006; received in revised form 30 September 2007.
Keywords:
Wavelet transform
Cepstrum
Pitch period
Speech analysis
Abstract
In this paper we present a new wavelet-based algorithm for low-cost computation of the
cepstrum. It can be used for real time precise pitch determination in automatic speech and
speaker recognition systems. Many wavelet families are examined to determine the one
that works best. The results confirm the efficacy and accuracy of the proposed technique
for pitch extraction.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Cepstral analysis is fairly well discussed in [1,2] and many other references. With it, a voiced speech segment can be
separated into two main parts: the resonances of the vocal tract, and the harmonic peaks which come from the source
pulmonary excitation ([1], p. 30; [2], p. 79). Cepstral analysis has been used intensively for speech recognition, speaker
recognition, and speaker verification and identification [1,3,4].
The cepstrum, which is the signal derived from the cepstral analysis of a speech utterance, is traditionally obtained by
means of the Discrete Fourier Transform (DFT); see Section 2. This approach gives precise results, but with O(L²) cost, i.e.,
the computational complexity is quadratic in the signal length, L. If the Fast Fourier Transform (FFT) is used,
the complexity drops to O(L · log(L)). The proposed algorithm has a lower order of complexity, since its basic principle
is the convolution of the input signal with a wavelet filter of small support. Furthermore, the accuracy that we obtained
with the proposed algorithm is comparable to the one obtained using the DFT or FFT. For many real time applications, the
proposed approach is a valuable alternative to the existing algorithms based on the DFT or FFT for pitch extraction.
This paper is organized as follows. Section 2 presents a brief review on the cepstrum and its calculation based on the
DFT/FFT. The proposed algorithm is described in Section 3. A study on wavelets and their important characteristics for
precise cepstrum calculation appears in Section 4. Section 5 describes the tests and results, and, lastly, Section 6 presents
the conclusions.
∗ Corresponding address: Universidade de São Paulo, IFSC, Caixa Postal 369, CEP 13560-970, São Carlos, São Paulo, Brazil.
E-mail address: sbarbonjr@yahoo.com.br (S. Barbon Júnior).
doi:10.1016/j.cam.2008.03.016
Fig. 1. [above]: log spectrum of a voiced speech signal; [below]: corresponding cepstrum obtained with the Fourier approach, where the pitch period,
p, appears as a spike.
Table 1
The algorithm for obtaining the cepstrum, based on the Fourier approach

BEGINNING
STEP S1: Input speech utterance, s[n], with length L;
STEP S2: s[n] ← DFT(s[n]);
STEP S3: For (int i = 0; i < L; i++)
             s_i ← log10(|s_i|²);
STEP S4: s[n] ← IDFT(s[n]), where IDFT is the inverse DFT;
STEP S5: For (int i = 0; i < L; i++)
             s_i ← (s_i)²;
END.
2. A brief review: The cepstrum
According to [1], if E is the excitation, and H represents the vocal tract resonance, then, the speech spectrum is S = E · H.
Cepstral analysis converts this product into a sum of spectra as follows:
S = E · H ⇒ log(S) = log(E · H) ⇒ log(S) = log(E) + log(H).
The spectrum of H varies slowly with frequency, because it consists mostly of smooth formant curves, and the spectrum
of E is more irregular, since it relates to the harmonics of the excitation. Therefore, cepstral analysis linearly separates
the contributions given by E and H to form S. One particular application of cepstral analysis is determining the pitch
period, p, which corresponds to the time interval between two consecutive glottal closures that control the flow of air from E.
Another application is the determination of the formant frequencies, which are related to H. Quefrency, in seconds, is the scale
used to measure the cepstrum. Fig. 1 illustrates the explanations above, and the algorithm for obtaining the cepstrum based
on the DFT/FFT appears in Table 1.
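The Fourier-based procedure of Table 1 can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code: the naive O(L²) DFT mirrors the complexity discussion in Section 1, and the small eps floor inside the logarithm is our addition to avoid taking the log of a zero-energy bin.

```python
import cmath
import math

def dft(x):
    # Naive discrete Fourier transform: O(L^2), as discussed in Section 1.
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / L) for n in range(L))
            for k in range(L)]

def idft(X):
    # Inverse DFT (IDFT), also naive.
    L = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / L) for k in range(L)) / L
            for n in range(L)]

def fourier_cepstrum(s, eps=1e-12):
    # Steps S2-S5 of Table 1: DFT -> log power spectrum -> IDFT -> square.
    S = dft(s)                                              # step S2
    log_power = [math.log10(abs(v) ** 2 + eps) for v in S]  # step S3
    c = idft(log_power)                                     # step S4
    return [abs(v) ** 2 for v in c]                         # step S5

# A pulse train of period 8 produces a cepstral spike at quefrency 8.
pulses = [1.0 if n % 8 == 0 else 0.0 for n in range(64)]
cep = fourier_cepstrum(pulses)
```

In a real-time pitch detector the FFT would replace dft/idft, bringing the cost down to O(L · log L).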
3. The proposed algorithm
The proposed algorithm appears in Table 2, and further explanation follows. In steps S2 and S4, the full Discrete Wavelet-Packet Transform (DWPT) tree [5] is needed. For this algorithm, the DWPT requires the natural frequency ordering (NFO),
instead of the filter bank ordering (FBO). For each level of decomposition, the NFO is obtained by alternating the order in
which the pair of filters, low pass (h[n]) and high pass (g[n]), is applied to decompose each sub-band. A detailed explanation
of this procedure can be found in [5], pp. 107–111. Unlike for the Fourier-based cepstrum, the proposed algorithm does not
require an inverse DWPT (IDWPT).
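The FBO-to-NFO relabeling described above can be expressed compactly: the sub-band occupying position i in natural frequency order is the tree node whose low/high filter path is the binary-reflected Gray code of i. A small sketch (our illustration of the ordering discussed in [5]):

```python
def nfo_permutation(level):
    # Position i in natural frequency order (NFO) corresponds to the node
    # with filter-bank-order (FBO) index gray(i) at this level: alternating
    # the low/high order under high-pass bands yields exactly the
    # binary-reflected Gray code.
    def gray(n):
        return n ^ (n >> 1)
    return [gray(i) for i in range(2 ** level)]
```

At level 2, for example, the FBO bands (LL, LH, HL, HH) appear in NFO as (LL, LH, HH, HL), i.e., the permutation [0, 1, 3, 2].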
Table 2
Pseudo-code for the proposed algorithm
BEGINNING
STEP S1: Input the speech utterance, s[n], with length L, which must be a power of 2;
STEP S2: s[n] ← DWPT(s[n], m), that is, turn s[n] into its full mth-level Discrete
         Wavelet-Packet Transform (DWPT), where m = log(L)/log(2);
STEP S3: For (int i = 0; i < L; i++)
             s_i ← log10((s_i)²);
STEP S4: repeat step S2, i.e., apply the DWPT again to s[n];
STEP S5: For (int i = 0; i < L; i++)
             s_i ← (s_i)²;
END.
After carrying this out, the signal s[n] will become the cepstrum of its original version.
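As a concrete sketch of Table 2, the Python code below builds the full-depth DWPT recursively, swapping the low/high output order along odd-parity paths to obtain the NFO. For brevity it hard-codes the Haar filter pair; as Section 4 argues, a Symmlet or Coiflet pair would replace it in practice. The eps floor in step S3 is our addition to avoid the log of zero.

```python
import math

SQRT2 = math.sqrt(2.0)

def analysis_step(x):
    # One level of filtering and downsampling with the Haar pair
    # (h = low pass, g = high pass); illustrative choice only.
    low = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return low, high

def dwpt_nfo(x, parity=0):
    # Full wavelet-packet tree down to single-sample leaves, in natural
    # frequency ordering: the low/high output order is swapped whenever an
    # odd number of high-pass filters has been applied along the path,
    # which yields the Gray-code band ordering.
    if len(x) == 1:
        return list(x)
    low, high = analysis_step(x)
    if parity == 0:
        return dwpt_nfo(low, 0) + dwpt_nfo(high, 1)
    return dwpt_nfo(high, 0) + dwpt_nfo(low, 1)

def wavelet_cepstrum(s, eps=1e-12):
    # Steps S2-S5 of Table 2: DWPT -> log power -> DWPT again -> square.
    c = dwpt_nfo(s)                            # step S2
    c = [math.log10(v * v + eps) for v in c]   # step S3
    c = dwpt_nfo(c)                            # step S4
    return [v * v for v in c]                  # step S5
```

Note that no inverse transform appears anywhere, in contrast with the Fourier-based algorithm of Table 1.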
To obtain the period of pitch, p, of a speech utterance, a search can be performed within the range [a–b] to find the highest
value, i.e., the spike that represents the pitch. The values of a and b that represent samples of the cepstrum can be determined
as follows:
a = round(sampling_rate / 400.0),
b = round(sampling_rate / 60.0),

where 60 and 400 are the limits of frequencies, in hertz, for searching for the pitch. The kth sample between a and b that
contains the highest value is related to the pitch frequency, f = 1/p, according to the following:

f = sampling_rate / k  Hz.
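The search over [a–b] can be written directly from the two expressions above. A minimal sketch (the function name is ours):

```python
def pitch_from_cepstrum(cep, sampling_rate):
    # Search the quefrency band corresponding to 60-400 Hz for the
    # highest cepstral spike; k is the index of that spike.
    a = round(sampling_rate / 400.0)
    b = round(sampling_rate / 60.0)
    k = max(range(a, b + 1), key=lambda i: cep[i])
    return sampling_rate / k  # pitch frequency f = 1/p, in hertz

# Example: a spike at sample 128 of a 16 000 Hz cepstrum gives 125.0 Hz.
cep = [0.0] * 300
cep[128] = 5.0
f = pitch_from_cepstrum(cep, 16000)
```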
It is interesting to note that the Fourier-based approach converts the input signal from the time domain to the frequency
domain, scales it using the logarithmic function, and then returns the scaled signal to the time domain. On the other hand,
the proposed approach filters the input signal, scales it, filters again, and then calculates the energy of each sub-band. Since
the input signal is decomposed until the maximum level is reached, the leaves of the decomposition tree contain only one
sample each and the sample with the highest value within the range [a–b] is the one related to the pitch period.
For long signals, computing the Fourier-based cepstrum is expensive compared to the proposed method. The order of
complexity of the proposed approach is N · K · O(L), where N is a constant corresponding to the adopted wavelet filter
length and K is the number of decomposition levels of the wavelet-packet tree.
4. The choice of the wavelet
The proposed algorithm requires the analysis of s[n] twice, according to the steps S2 and S4 of the previous section.
The process by which s[n] is analyzed is Mallat’s pyramidal algorithm [6]. It consists of low-pass and high-pass filtering, by
discrete convolutions [7], followed by down-samplings by 2. A generalization of Mallat’s algorithm is well described in the
literature, and can be found in [5,7]; therefore, this topic is not included here.
During the analysis of s[n], the length N of h[k] and g[k] determines the frequency selectivity, Q, and the time resolution,
R. As N increases, Q increases and R decreases; therefore, a balanced value for N is needed [8]. Let us call this
requirement 1. Another important consideration, say requirement 2, is that (almost) linear phase filters, h[k] and g[k],
are desirable to avoid distortion in the filtered signal, i.e., (almost) symmetrical or anti-symmetrical impulse responses are
preferable. Lastly, we can define requirement 3, that is, a flat frequency response in the pass and in the stop bands of the
filters is desirable to attenuate the Gibbs phenomenon [9].
The proposed algorithm does not require any synthesis of the speech signal, i.e., no inverse DWT (IDWT) is performed.
Therefore, it is sufficient to consider only requirements 1, 2, and 3, which are related to the filters h[n] and g[n], and
disregard particular characteristics of the scaling and wavelet functions [7], φ(n) = Σ_k h_k φ(2n − k) and
ψ(n) = Σ_k g_k φ(2n − k), respectively.
In order to satisfy requirement 1, we have to find N in such a way that it balances the constraints Q and R. Imposing
limits on N, say c and d (c < d), a search can be performed within this range to obtain optimal results in one particular
respect: R or Q.
Requirement 2 can only be satisfied by using finite impulse response (FIR) wavelet filters, since infinite impulse response
(IIR) ones, say Shannon, Meyer, and so on, do not exhibit linear phase [7]. Therefore, the only alternative among the
well-known FIR wavelet filters [7] is the Haar wavelet. Unfortunately, we also have to discard this last wavelet, since the
support of its filters is N = 2, and therefore it fails to satisfy requirement 1. The only alternative is the use of a wavelet
family whose filters exhibit almost linear phase, i.e., have almost symmetrical or anti-symmetrical impulse responses. Fig. 2,
which gives an intuition of the impulse response shapes of such low-pass filters, shows that Symmlet or Coiflet filters are
the most appropriate for satisfying this particular requirement.
Fig. 2. From left to right: impulse response shapes of the wavelet filters Haar, Daubechies, Vaidyanathan, Beylkin, Coiflet, and Symmlet.
Fig. 3. [above]: Fourier-based cepstrum of /a/; [below]: time-aligned wavelet-based cepstrum of the same speech signal; [horizontal axis]: cepstrum signal
samples; [vertical axis]: amplitude. The highest spike close to the end of the horizontal axis corresponds to the pitch.
Lastly, requirement 3 can be satisfied by using Daubechies’ filters, since they have a maximally flat frequency response [6].
This family of filters, however, fails to satisfy requirement 2, because its filters do not have a linear phase response. Therefore, the
only alternative is, again, to use a family of wavelets whose filters present an almost flat response in the pass and stop bands.
Since Symmlets and Coiflets present this characteristic, and they also satisfy requirements 1 and 2, they can be used.
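Requirement 2 can be checked numerically. A simple measure (ours, for illustration) is the normalized distance between a filter's impulse response and its time-reverse, which is zero for an exactly symmetrical filter. The sketch below contrasts the symmetric Haar low-pass filter with the clearly asymmetric Daubechies low-pass filter of support N = 4; Symmlet and Coiflet coefficients, not hard-coded here, would give small but nonzero values.

```python
import math

def symmetry_defect(h):
    # ||h - reverse(h)|| / ||h||: zero for an exactly symmetric impulse
    # response, large for a strongly asymmetric one.
    rev = h[::-1]
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(h, rev)))
    den = math.sqrt(sum(a * a for a in h))
    return num / den

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)

haar = [1.0 / SQRT2, 1.0 / SQRT2]  # exactly symmetric impulse response
daubechies4 = [(1 + SQRT3) / (4 * SQRT2), (3 + SQRT3) / (4 * SQRT2),
               (3 - SQRT3) / (4 * SQRT2), (1 - SQRT3) / (4 * SQRT2)]
```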
5. Tests and results
For evaluation of the proposed algorithm, synthetic utterances were used, particularly the following vowel sounds: /a/
as in the word dogma, /e/ as in the word men, and /i/ as in the word ship, with pitch frequencies of, respectively,
116.79 Hz, 125.00 Hz, and 207.79 Hz. The signals were sampled at 16 000 Hz, 16-bit, PCM.
Figs. 3–5 show the cepstra of the utterances above, obtained by the Fourier method described in Section 2 and by the
proposed method. Table 3 contains the pitch frequency for the synthetic /e/ obtained with the proposed method, varying the
wavelet bases and filter lengths. Similar results, in terms of precision, were obtained with the vowels /a/ and /i/, according
to each wavelet used.
All the tests above used L = 2048 samples. With that, the maximum level of the DWPT is j = log(2048)/log(2) = 11, which
gives a frequency resolution s = (16 000/2)/2^11 ≈ 3.9 Hz after the first filtering step (S2) of the proposed algorithm. If L
increases, j increases and s decreases, i.e., a better frequency resolution is obtained. On the other hand, if L decreases, j
decreases and s increases, i.e., a worse frequency resolution is obtained.
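The relation between L, the maximum level j, and the resolution s can be sketched directly (the function name is ours):

```python
import math

def dwpt_resolution(L, sampling_rate):
    # Maximum DWPT level j = log(L)/log(2); splitting the Nyquist band
    # (sampling_rate / 2) into 2^j sub-bands gives the width s of each.
    j = round(math.log(L) / math.log(2))
    s = (sampling_rate / 2.0) / (2 ** j)
    return j, s

j, s = dwpt_resolution(2048, 16000)  # j = 11, s = 3.90625 Hz
```

Doubling L to 4096 adds one level (j = 12) and halves s, at the price of a longer buffering delay in a real time setting.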
Fig. 4. [above]: Fourier-based cepstrum of /e/; [below]: time-aligned wavelet-based cepstrum of the same speech signal; [horizontal axis]: cepstrum signal
samples; [vertical axis]: amplitude. The highest spike close to the end of the horizontal axis corresponds to the pitch.
Fig. 5. [above]: Fourier-based cepstrum of /i/; [below]: time-aligned wavelet-based cepstrum of the same speech signal; [horizontal axis]: cepstrum signal
samples; [vertical axis]: amplitude. The highest spike close to the middle of the horizontal axis corresponds to the pitch.
Table 3
Results obtained for the synthetic vowel /e/, created with a pitch frequency of 125.00 Hz

W               S     P
Haar            2     125.98
Daubechies      4     125.98
Daubechies      8     125.98
Daubechies      12    125.98
Daubechies      24    125.98
Daubechies      36    125.98
Coiflet         6     125.98
Coiflet         12    125.98
Coiflet         18    125.00
Coiflet         24    125.00
Coiflet         30    125.98
Symmlet         8     125.98
Symmlet         16    125.00
Symmlet         24    125.98
Beylkin         18    125.98
Vaidyanathan    24    125.98

[W]: wavelet family; [S]: filter support; [P]: pitch frequency, in hertz, obtained using the proposed algorithm.
6. Conclusions
We presented a new wavelet-based algorithm for cepstrum calculation, which has lower computational complexity than
the traditional method based on the DFT or FFT, and can be used for pitch period extraction in real time. On the basis of our
theoretical assumptions, confirmed in practice, we concluded that a family of wavelets whose filters exhibit almost linear
phase responses, i.e., Symmlets or Coiflets, with a balanced value for N, i.e., 12 ≤ N ≤ 24, are the best candidates. This choice
takes into account the frequency selectivity and the time resolution obtained with the use of such filters. The length, L, of the
input speech signal also influences the results, 2048 being a reasonable choice in terms of both precision and convenience
for small delay in real time applications.
Acknowledgment
We wish to thank the State of São Paulo Research Foundation for the grants given for this work under process number
2005/00015-1.
References
[1] L. Deng, D. O'Shaughnessy, Speech Processing: A Dynamic and Optimization-Oriented Approach, Marcel Dekker Inc., New York, USA, 2003.
[2] J. Coleman, Introducing Speech and Language Processing, Cambridge University Press, Cambridge, UK, 2005.
[3] W. Chou, B.H. Juang, Pattern Recognition in Speech and Language Processing, CRC Press, Boca Raton, USA, 2003.
[4] T.F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Prentice Hall, Upper Saddle River, NJ, USA, 2002.
[5] A. Jensen, A. la Cour-Harbo, Ripples in Mathematics: The Discrete Wavelet Transform, Springer, Berlin, Germany, 2001.
[6] P.S. Addison, The Illustrated Wavelet Transform Handbook: Introductory Theory and Applications in Science, Engineering, Medicine and Finance,
Institute of Physics Publishing, Bristol, 2002.
[7] G. Strang, T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley, MA, USA, 1997.
[8] R.C. Guido, C.D. Maciel, M. Monteiro, E.S. Fonseca, S. Panchapagesan, J.C. Pereira, L.S. Vieira, S. Barbon Jr., F.L. Sanchez, M.B.A. Guilherme, K.I.C. Sergio,
T.L. Scarpa, P.C. Fantinato, E.J.R. Moura, A study on the best wavelet for audio compression, in: 40th IEEE Asilomar Int. Conference on Signals, Systems
and Computers, Pacific-Grove, CA, USA, vol. 1, 2006, pp. 2115–2118.
[9] R.W. Hamming, Digital Filters, third ed., Dover Publications, New York, USA, 1998.