Linear Predictive Coding
The speech signal is filtered to no more than one half the system sampling frequency and then A/D conversion is performed. The speech is processed on a frame-by-frame basis, where the analysis frame length can be variable. For each frame a pitch period estimate is made along with a voicing decision. A linear predictive coefficient analysis is performed to obtain an inverse model of the speech spectrum, $A(z)$. In addition, a gain parameter $G$, representing some function of the speech energy, is computed. An encoding procedure is then applied to transform the analyzed parameters into an efficient set of transmission parameters, with the goal of minimizing the degradation in the synthesized speech for a specified number of bits. Knowing the transmission frame rate and the number of bits used for each transmission parameter, one can compute a noise-free channel transmission bit rate.
At the receiver, the transmitted parameters are decoded into quantized versions of the coefficient analysis and pitch estimation parameters. An excitation signal for synthesis is then constructed from the transmitted pitch and voicing parameters. The excitation signal then drives a synthesis filter $1/A(z)$ corresponding to the analysis model $A(z)$. The digital samples $\hat{s}(n)$ are then passed through a D/A converter and low-pass filtered to generate the synthetic speech $s(t)$. Either before or after synthesis, the gain is used to match the synthetic speech energy to the actual speech energy. The digital samples are then converted to an analog signal and passed through a filter similar to the one at the input of the system.
Linear predictive coding (LPC) of speech
The linear predictive coding (LPC) method for speech analysis and synthesis is based on modeling the vocal tract as a linear all-pole (IIR) filter having the system transfer function:

$$H(z) = \frac{G}{1 + \sum_{k=1}^{p} a_p(k)\, z^{-k}}$$
where $p$ is the number of poles, $G$ is the filter gain, and $a_p(k)$ are the parameters that determine the poles. There are two mutually exclusive excitation functions, used to model voiced and unvoiced speech sounds. For short-time analysis, voiced speech is considered periodic with a fundamental frequency $F_0$ and a pitch period $1/F_0$, which depends on the speaker. Hence, voiced speech is generated by exciting the all-pole filter model with a periodic impulse train. Unvoiced sounds, on the other hand, are generated by exciting the all-pole filter with the output of a random noise generator.
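The source-filter model described above can be sketched in a few lines of MATLAB. This is only an illustration, not the vocoder's synthesis routine: the coefficient values, gain, pitch period and frame length are made-up assumptions, and the filter is assumed to have the form $H(z) = G / (1 + \sum_k a(k) z^{-k})$.

```matlab
% Minimal sketch of the two excitation types driving an all-pole filter.
% All numeric values are illustrative assumptions, not taken from the report.
fsHz     = 8000;                 % sampling rate in Hz
frameLen = 160;                  % 20 ms frame at 8 kHz
a        = [1 -1.3 0.8];         % [1 a(1) a(2)], a stable 2-pole example
G        = 0.5;                  % filter gain
pitchPer = 80;                   % pitch period in samples (10 ms, F0 = 100 Hz)

% Voiced excitation: periodic impulse train, one impulse per pitch period.
voicedExc = zeros(frameLen, 1);
voicedExc(1:pitchPer:end) = 1;

% Unvoiced excitation: output of a random noise generator.
unvoicedExc = randn(frameLen, 1);

% Apply H(z) = G / A(z) to each excitation.
voicedSpeech   = filter(G, a, voicedExc);
unvoicedSpeech = filter(G, a, unvoicedExc);
```

Only the excitation changes between the two cases; the filter, which stands in for the vocal tract, is the same.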
The fundamental difference between these two types of speech sounds comes from the way they are produced. The vibrations of the vocal cords produce voiced sounds. The rate at which the vocal cords vibrate dictates the pitch of the sound. Unvoiced sounds, on the other hand, do not rely on the vibration of the vocal cords. They are created by the constriction of the vocal tract: the vocal cords remain open and the constrictions of the vocal tract force air out to produce the unvoiced sounds.
Given a short segment of a speech signal, say about 20 ms or 160 samples at a sampling rate of 8 kHz, the speech encoder at the transmitter must determine the proper excitation function, the pitch period for voiced speech, the gain, and the coefficients $a_p(k)$.
[Figure: the encoder/decoder for linear predictive coding]
The parameters of the model are determined adaptively from the data, encoded into a binary sequence, and transmitted to the receiver. At the receiver, the speech signal is then synthesized from the model and the excitation signal.
The parameters of the all-pole filter model are determined from the speech samples by means of linear prediction. To be specific, the output of the linear prediction filter is

$$\hat{s}(n) = -\sum_{k=1}^{p} a_p(k)\, s(n-k)$$

and the corresponding error between the observed sample $s(n)$ and the predicted value $\hat{s}(n)$ is

$$e(n) = s(n) - \hat{s}(n)$$

By minimizing the sum of the squared error we can determine the pole parameters $a_p(k)$ of the model. Differentiating this sum with respect to each of the parameters and equating the result to zero gives a set of $p$ linear equations

$$\sum_{k=1}^{p} a_p(k)\, r_{ss}(m-k) = -r_{ss}(m), \qquad m = 1, 2, \ldots, p$$

where

$$r_{ss}(m) = \sum_{n=0}^{N-1} s(n)\, s(n+m)$$

with $s(n)$ taken as zero outside the $N$-sample frame. In matrix form, $R_{ss}\, a = -r_{ss}$, where $R_{ss}$ is a $p \times p$ autocorrelation matrix, $r_{ss}$ is a $p \times 1$ autocorrelation vector, and $a$ is a $p \times 1$ vector of model parameters.
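As a small check on the normal equations, the sketch below solves $R_{ss}\, a = -r_{ss}$ directly with the backslash operator. The test signal is an assumption chosen for illustration: the impulse response of a known two-pole filter, for which the autocorrelation method should recover the filter coefficients almost exactly.

```matlab
% Sketch: solve the autocorrelation normal equations  Rss * a = -rss  directly.
% The test signal is an assumption: the impulse response of a known 2-pole filter.
p     = 2;                               % model order
aTrue = [1 -1.3 0.8];                    % A(z) = 1 - 1.3 z^-1 + 0.8 z^-2
N     = 400;                             % frame length in samples
s     = filter(1, aTrue, [1; zeros(N-1, 1)]);

% rss(m) = sum_n s(n) s(n+m) for m = 0..p  (stored at index m+1)
rss = zeros(p+1, 1);
for m = 0:p
    rss(m+1) = sum(s(1:N-m) .* s(1+m:N));
end

Rss  = toeplitz(rss(1:p));               % p-by-p autocorrelation matrix
aEst = Rss \ (-rss(2:p+1));              % p-by-1 vector of model parameters

disp(aEst.');                            % close to [-1.3 0.8]
```

A direct solve costs on the order of $p^3$ operations. Because $R_{ss}$ is Toeplitz, a recursive solution can exploit that structure instead.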
    [row col] = size(data);
    if col==1 data=data'; end
    nframe = 0;
    msfr = round(sr/1000*fr);                   % Convert ms to samples
    msfs = round(sr/1000*fs);                   % Convert ms to samples
    duration = length(data);
    speech = filter([1 -preemp], 1, data)';     % Preemphasize speech
    msoverlap = msfs - msfr;
    ramp = [0:1/(msoverlap-1):1]';              % Compute part of window
    for frameIndex=1:msfr:duration-msfs+1       % frame rate=20ms
        frameData = speech(frameIndex:(frameIndex+msfs-1));   % frame size=30ms
        nframe = nframe+1;
        autoCor = xcorr(frameData);             % Compute the autocorrelation
        autoCorVec = autoCor(msfs+[0:L]);
These equations can be solved in MATLAB by using the Levinson-Durbin algorithm.
    % Levinson's method
    err(1) = autoCorVec(1);
    k(1) = 0;
    A = [];
    for index=1:L
        numerator = [1 A.']*autoCorVec(index+1:-1:2);
        denominator = -1*err(index);
        k(index) = numerator/denominator;       % PARCOR coeffs
        A = [A+k(index)*flipud(A); k(index)];
        err(index+1) = (1-k(index)^2)*err(index);
    end
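The recursion can also be tried in isolation. The sketch below uses the same update as the excerpt above but runs on its own; the three autocorrelation values are made up for illustration, and the variable names other than `A`, `k` and `err` are assumptions.

```matlab
% Sketch: Levinson-Durbin recursion on an assumed autocorrelation vector.
rss = [1; 0.5; 0.2];                 % lags 0..p (illustrative values)
p   = numel(rss) - 1;                % model order

A   = [];                            % predictor coefficients a(1..index)
k   = zeros(p, 1);                   % PARCOR (reflection) coefficients
err = rss(1);                        % prediction error energy, order 0

for index = 1:p
    % Reflection coefficient for this order.
    k(index) = -([1 A.'] * rss(index+1:-1:2)) / err;
    % Order update of the predictor, then of the error energy.
    A   = [A + k(index)*flipud(A); k(index)];
    err = (1 - k(index)^2) * err;
end

aCoeff = [1; A];                     % coefficients of A(z), leading 1 included
```

Each pass raises the model order by one, so the result for order `p` costs on the order of $p^2$ operations. A value of `abs(k(index))` below 1 at every order corresponds to a stable synthesis filter.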
The gain parameter of the filter can be obtained from the input-output relationship as follows:

$$s(n) = -\sum_{k=1}^{p} a_p(k)\, s(n-k) + G\, x(n)$$

where $x(n)$ represents the input sequence. Rearranging this equation in terms of the error sequence, we have

$$G\, x(n) = s(n) + \sum_{k=1}^{p} a_p(k)\, s(n-k) = e(n)$$

and then

$$G^2 \sum_{n=0}^{N-1} x^2(n) = \sum_{n=0}^{N-1} e^2(n)$$

When the excitation is normalized to unit energy, $G^2$ is set equal to the residual energy resulting from the least-squares optimization.
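The gain relation can be checked numerically. The sketch below assumes a unit-energy excitation and uses the standard identity $G^2 = r_{ss}(0) + \sum_k a_p(k)\, r_{ss}(k)$ for the minimum residual energy, then compares it with the energy of the prediction error computed directly. The test frame (the impulse response of a known two-pole filter) is an assumption for illustration.

```matlab
% Sketch: gain from the residual energy, assuming unit-energy excitation.
aTrue = [1 -1.3 0.8];                        % assumed 2-pole example filter
N     = 400;
s     = filter(1, aTrue, [1; zeros(N-1, 1)]);   % assumed test frame
p     = 2;

% Autocorrelation lags 0..p and the LPC coefficients from the normal equations.
rss = zeros(p+1, 1);
for m = 0:p
    rss(m+1) = sum(s(1:N-m) .* s(1+m:N));
end
a = toeplitz(rss(1:p)) \ (-rss(2:p+1));

% G^2 = rss(0) + sum_k a(k) rss(k)
G2fromR = rss(1) + a.' * rss(2:p+1);

% Same quantity measured from e(n) = s(n) + sum_k a(k) s(n-k)
e       = filter([1 a.'], 1, s);
G2fromE = sum(e.^2);

G = sqrt(G2fromR);
```

For this decaying test frame the two values agree closely. On a real speech frame they differ slightly, because the autocorrelation method also counts the error past the end of the frame.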
    % filter response
    if 0
        gain=0;
        cft=0:(1/255):1;
        for index=1:L
            gain = gain + aCoeff(index,nframe)*exp(-i*2*pi*cft).^index;
        end
        gain = abs(1./gain);
        spec(:,nframe) = 20*log10(gain(1:128))';
        plot(20*log10(gain));
        title(nframe);
        drawnow;
    end
    if 0
        impulseResponse = filter(1, aCoeff(:,nframe), [1 zeros(1,255)]);
        freqResp = 20*log10(abs(fft(impulseResponse)));
        plot(freqResp);
    end
Once the LPC coefficients are computed, we can determine whether the input speech frame is voiced and, if it is, what the pitch is. We can determine the pitch by computing the following sequence in MATLAB:

$$r_e(n) = \sum_{k=1}^{p} r_a(k)\, r_{ss}(n-k)$$

where $r_a(k)$ is the autocorrelation sequence of the prediction coefficients. The pitch is detected by finding the peak of the normalized sequence $r_e(n)/r_e(0)$ in the time interval corresponding to 3 to 15 ms in the 20 ms sampling frame. If the value of this peak is at least 0.25, the frame of speech is considered voiced, with a pitch period equal to the value of $n = N_p$ at which $r_e(N_p)/r_e(0)$ is a maximum.
If the peak value is less than 0.25, the frame of speech is considered unvoiced and the pitch is set to zero.

    errSig = filter([1 A'],1,frameData);        % find excitation noise
    G(nframe) = sqrt(err(L+1));                 % gain
    autoCorErr = xcorr(errSig);                 % calculate pitch & voicing information
    [B,I] = sort(autoCorErr);
    num = length(I);
    if B(num-1) > .01*B(num)
        pitch(nframe) = abs(I(num) - I(num-1));
    else
        pitch(nframe) = 0;
    end
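The voicing rule stated in the text (peak of the normalized autocorrelation within 3 to 15 ms, threshold 0.25) can be sketched as follows. This is not the sort-based test used in the listing. It applies the rule directly to the autocorrelation of a residual frame instead of forming $r_e(n)$ from $r_a(k)$ and $r_{ss}(n)$. The frame, sampling rate and variable names are assumptions for illustration.

```matlab
% Sketch of the text's voicing/pitch rule on an assumed residual frame.
fsHz  = 8000;                      % sampling rate in Hz
frame = zeros(240, 1);             % 30 ms frame at 8 kHz
frame(1:80:end) = 1;               % impulses every 80 samples (10 ms)

N      = numel(frame);
minLag = round(0.003 * fsHz);      % 3 ms  -> 24 samples
maxLag = round(0.015 * fsHz);      % 15 ms -> 120 samples

% re(n) for lags 0..maxLag (stored at index n+1)
re = zeros(maxLag+1, 1);
for lag = 0:maxLag
    re(lag+1) = sum(frame(1:N-lag) .* frame(1+lag:N));
end

pitch = 0;                         % 0 marks an unvoiced frame
if re(1) > 0
    % Peak of the normalized sequence inside the 3..15 ms search window.
    [peakVal, idx] = max(re(minLag+1:maxLag+1) / re(1));
    if peakVal >= 0.25
        pitch = minLag + idx - 1;  % pitch period Np in samples
    end
end
```

Restricting the search to the 3 to 15 ms window keeps the zero-lag peak out of the comparison.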
The values of the LPC coefficients, the pitch period, and the type of excitation are then transmitted to the receiver. The decoder synthesizes the speech signal by passing the proper excitation through the all-pole filter model of the vocal tract.
Typically the pitch period requires 6 bits, the gain parameter is represented in 5 bits after its dynamic range is compressed logarithmically, and the prediction coefficients normally require 8-10 bits each for accuracy reasons. Accuracy is very important in LPC because any small change in the prediction coefficients results in a large change in the pole positions of the filter model, which can make the model unstable. This is overcome by using the PARCOR method.
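With these bit allocations the noise-free channel bit rate follows from simple arithmetic. The sketch below assumes a model order of 10 and a 20 ms frame increment (the defaults in the MATLAB listing), and assumes no separate voicing bit, since a pitch period of zero already marks an unvoiced frame.

```matlab
% Sketch: noise-free channel bit rate from the bit allocation quoted above.
% Assumptions: order 10, 20 ms frame increment, no separate voicing bit.
order      = 10;
frameIncMs = 20;
pitchBits  = 6;
gainBits   = 5;
coeffBits  = [8 10];                            % low and high end of the 8-10 bit range

framesPerSec = 1000 / frameIncMs;               % 50 frames per second
bitsPerFrame = pitchBits + gainBits + order * coeffBits;   % [91 111]
bitRate      = bitsPerFrame * framesPerSec;     % [4550 5550] bits per second
```

Under these assumptions the coefficients account for most of the rate, which is why their quantization matters most.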
Once the LPC coefficients are computed, we can determine whether the input speech frame is voiced and, if so, what the pitch is. If the speech frame is judged to be voiced, an impulse train is employed to represent it, with nonzero taps occurring every pitch period. A pitch-detection algorithm is used to determine the correct pitch period (or frequency). The autocorrelation function is used to estimate the pitch period. If the frame is unvoiced, however, white noise is used to represent it and a pitch period of T=0 is transmitted. Therefore, either white noise or an impulse train becomes the excitation of the LPC synthesis filter.
Two types of LPC vocoders were implemented in MATLAB.
Plain LPC vocoder (the diagram is shown below):
[Figure: LPC vocoder]
    function [ outspeech ] = speechcoder1( inspeech )
    % Parameters:
    %   inspeech  : wave data with sampling rate Fs
    %               (Fs can be changed underneath if necessary)
    % Returns:
    %   outspeech : wave data with sampling rate Fs
    %               (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 16000;     % sampling rate in Hertz (Hz)
    Order = 10;     % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % decode/synthesize speech using LPC and impulse-trains as excitation
    outspeech = synlpc(aCoeff, pitch, Fs, G);
Results:
[Figure: residual plot, followed by further result plots (images not recovered)]
Voice-excited LPC vocoder (utilizing DCT for high compression rate / low bits)
The input speech signal in each frame is filtered with the estimated transfer function of the LPC analyzer. This filtered signal is called the residual.
To achieve a high compression rate, the discrete cosine transform (DCT) of the residual signal can be employed. The DCT concentrates most of the energy of the signal in the first few coefficients. Thus one way to compress the signal is to transmit only the coefficients that contain most of the energy.

    function [ outspeech ] = speechcoder2( inspeech )
    % Parameters:
    %   inspeech  : wave data with sampling rate Fs
    %               (Fs can be changed underneath if necessary)
    % Returns:
    %   outspeech : wave data with sampling rate Fs
    %               (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 16000;     % sampling rate in Hertz (Hz)
    Order = 10;     % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % perform a discrete cosine transform on the residual
    resid = dct(resid);
    [a,b] = size(resid);
    % only use the first 50 DCT-coefficients; this can be done
    % because most of the energy of the signal is conserved in these coeffs
    resid = [ resid(1:50,:); zeros(430,b) ];
    % quantize the data
    resid = uencode(resid,4);
    resid = udecode(resid,4);
    % perform an inverse DCT
    resid = idct(resid);
    % add some noise to the signal to make it sound better
    noise = [ zeros(50,b); 0.01*randn(430,b) ];
    resid = resid + noise;
    % decode/synthesize speech using LPC and the compressed residual as excitation
    outspeech = synlpc2(aCoeff, resid, Fs, G);
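The energy-compaction idea behind keeping only the first 50 DCT coefficients can be illustrated without the `dct` and `idct` calls used in the listing. The sketch below builds an orthonormal DCT-II matrix explicitly so that it runs with base functions only. The test frame is an assumed smooth signal, not a real LPC residual, so the fraction of energy retained here says nothing about real speech.

```matlab
% Sketch: energy compaction with an explicitly built orthonormal DCT-II matrix.
N    = 480;                                  % frame length (30 ms at 16 kHz)
keep = 50;                                   % number of coefficients retained
n    = 0:N-1;                                % sample index (row)
k    = (0:N-1).';                            % coefficient index (column)

C = sqrt(2/N) * cos(pi * k * (2*n + 1) / (2*N));   % N-by-N DCT-II matrix
C(1,:) = C(1,:) / sqrt(2);                          % scale the DC row

x      = exp(-n.'/40) .* cos(2*pi*n.'/200);  % assumed smooth test frame
X      = C * x;                              % forward DCT
Xtrunc = [X(1:keep); zeros(N-keep, 1)];      % zero all but the first 50 coefficients
xHat   = C.' * Xtrunc;                       % inverse DCT of the truncated set

energyKept = sum(Xtrunc.^2) / sum(X.^2);     % fraction of energy retained
```

Because the matrix is orthonormal, the energy in the coefficients equals the energy in the frame, so `energyKept` measures directly how much of the signal survives the truncation.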
Results:
[Figure: noise added to the signal to make it sound better, followed by further result plots (images not recovered)]
MATLAB files:

    clear all;
    % osama saraireh
    % speech processing
    % Dr. Veton Kepuska
    % FIT Fall 2005
    a = input('please load the speech signal as a .wav file ', 's');
    Inputsoundfile = a;
    % read the wave file (newer MATLAB releases replace wavread with audioread)
    [inspeech, Fs, bits] = wavread(Inputsoundfile);
    outspeech1 = speechcoder1(inspeech);    % plain LPC vocoder
    outspeech2 = speechcoder2(inspeech);    % voice-excited LPC vocoder
    % plot results
    figure(1);
    subplot(3,1,1);
    plot(inspeech);
    grid;
    subplot(3,1,2);
    plot(outspeech1);
    grid;
    subplot(3,1,3);
    plot(outspeech2);
    grid;
    disp('Press any key to play the original sound file');
    pause;
    soundsc(inspeech, Fs);
    disp('Press any key to play the LPC compressed file!');
    pause;
    soundsc(outspeech1, Fs);
    disp('Press a key to play the voice-excited LPC compressed sound!');
    pause;
    soundsc(outspeech2, Fs);
    function [aCoeff,resid,pitch,G,parcor,stream] = proclpc(data,sr,L,fr,fs,preemp)
    % L      - The order of the analysis.
    % fr     - Frame time increment, in ms. Defaults to 20ms
    % fs     - Frame size in ms.
    % preemp - default 0.9378
    % aCoeff - The LPC analysis results,
    % resid  - The LPC residual,
    % pitch  - calculated by finding the peak in the residual's autocorrelation
    %          for each frame.
    % G      - The LPC gain for each frame.
    % parcor - The parcor coefficients.
    % stream - The LPC analysis' residual or excitation signal as one long vector.
    if (nargin<3), L = 10; end
    if (nargin<4), fr = 20; end
    if (nargin<5), fs = 30; end
    if (nargin<6), preemp = .9378; end
    [row col] = size(data);
    if col==1 data=data'; end
    nframe = 0;
    msfr = round(sr/1000*fr);                   % Convert ms to samples
    msfs = round(sr/1000*fs);                   % Convert ms to samples
    duration = length(data);
    speech = filter([1 -preemp], 1, data)';     % Preemphasize speech
    msoverlap = msfs - msfr;
    ramp = [0:1/(msoverlap-1):1]';              % Compute part of window
    for frameIndex=1:msfr:duration-msfs+1       % frame rate=20ms
        frameData = speech(frameIndex:(frameIndex+msfs-1));   % frame size=30ms
        nframe = nframe+1;
        autoCor = xcorr(frameData);             % Compute the autocorrelation
        autoCorVec = autoCor(msfs+[0:L]);
        % Levinson's method
        err(1) = autoCorVec(1);
        k(1) = 0;
        A = [];
        for index=1:L
            numerator = [1 A.']*autoCorVec(index+1:-1:2);
            denominator = -1*err(index);
            k(index) = numerator/denominator;   % PARCOR coeffs
            A = [A+k(index)*flipud(A); k(index)];
            err(index+1) = (1-k(index)^2)*err(index);
        end
        aCoeff(:,nframe) = [1; A];
        parcor(:,nframe) = k';
        % filter response
        if 0
            gain=0;
            cft=0:(1/255):1;
            for index=1:L
                gain = gain + aCoeff(index,nframe)*exp(-i*2*pi*cft).^index;
            end
            gain = abs(1./gain);
            spec(:,nframe) = 20*log10(gain(1:128))';
            plot(20*log10(gain));
            title(nframe);
            drawnow;
        end
        % Calculate the filter response
        % from the filter's impulse
        % response (to check above).
        if 0
            impulseResponse = filter(1, aCoeff(:,nframe), [1 zeros(1,255)]);
            freqResponse = 20*log10(abs(fft(impulseResponse)));
            plot(freqResponse);
        end
        errSig = filter([1 A'],1,frameData);    % find excitation noise
        G(nframe) = sqrt(err(L+1));             % gain
        autoCorErr = xcorr(errSig);             % calculate pitch & voicing information
        [B,I] = sort(autoCorErr);
        num = length(I);
        if B(num-1) > .01*B(num)
            pitch(nframe) = abs(I(num) - I(num-1));
        else
            pitch(nframe) = 0;
        end
        % improve the compressed sound quality
        resid(:,nframe) = errSig/G(nframe);
        if(frameIndex==1)   % add residual frames using a trapezoidal window
            stream = resid(1:msfr,nframe);
        else
            stream = [stream;
                      overlap+resid(1:msoverlap,nframe).*ramp;
                      resid(msoverlap+1:msfr,nframe)];
        end
        if(frameIndex+msfr+msfs-1 > duration)
            stream = [stream; resid(msfr+1:msfs,nframe)];
        else
            overlap = resid(msfr+1:msfs,nframe).*flipud(ramp);
        end
    end
    stream = filter(1, [1 -preemp], stream)';
Speech Model one LPC Vocoder:

    function [ outspeech ] = speechcoder1( inspeech )
    % Parameters:
    %   inspeech  : wave data with sampling rate Fs
    % outputs:
    %   outspeech : wave data with sampling rate Fs
    %               (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 8000;      % sampling rate in Hertz (Hz)
    Order = 10;     % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % decode/synthesize speech using LPC and impulse-trains as excitation
    outspeech = synlpc(aCoeff, pitch, Fs, G);
    function [ outspeech ] = speechcoder2( inspeech )
    % Parameters:
    %   inspeech  : wave data with sampling rate Fs
    %               (Fs can be changed underneath if necessary)
    % output:
    %   outspeech : wave data with sampling rate Fs
    %               (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 16000;     % sampling rate in Hertz (Hz)
    Order = 10;     % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % perform a discrete cosine transform on the residual
    resid = dct(resid);
    [a,b] = size(resid);
    % only use the first 50 DCT-coefficients; this can be done
    % because most of the energy of the signal is conserved in these coeffs
    resid = [ resid(1:50,:); zeros(430,b) ];
    % quantize the data
    resid = uencode(resid,4);
    resid = udecode(resid,4);
    % perform an inverse DCT
    resid = idct(resid);
    % add some noise to the signal to make it sound better
    noise = [ zeros(50,b); 0.01*randn(430,b) ];
    resid = resid + noise;
    % decode/synthesize speech using LPC and the compressed residual as excitation
    outspeech = synlpc2(aCoeff, resid, Fs, G);
References
Linear Prediction of Speech, J. Markel, A. H. Gray, Jr. Pages 10-96, 190-158
Digital Signal Processing, Alan V. Oppenheim / Ronald W. Schafer
Digital Signal Processing Using MATLAB, Vinay K. Ingle, John Proakis
http://www.data-compression.com/speech.html