Speech Coding Techniques
4/7/2003
Introduction
Voice Quality
Excellent 5
Good 4
Fair 3
Poor 2
Bad 1
A minimum of 30 listeners
Listeners rate voice samples or take part in conversations
ITU-T Recommendation P.800
Toll quality
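The five-point scale above can be turned into a single score by averaging the listeners' ratings. The sketch below assumes a plain arithmetic mean (the MOS quoted later in these slides) and uses a made-up panel of 30 ratings; the function name and the panel are illustrative only.

```python
from statistics import mean

# Rating scale from the slide: Excellent 5 ... Bad 1
SCALE = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}

def mean_opinion_score(ratings, min_listeners=30):
    """Average the listeners' ratings (labels from SCALE) into one score."""
    if len(ratings) < min_listeners:
        raise ValueError("need at least %d listeners" % min_listeners)
    return mean(SCALE[label] for label in ratings)

# Hypothetical panel of 30 listeners (not data from the slides)
panel = ["Good"] * 18 + ["Excellent"] * 6 + ["Fair"] * 6
mos = mean_opinion_score(panel)
print(mos)
```

A panel smaller than the 30-listener minimum is rejected rather than scored.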
About Speech
Speech
Speech sounds
Voiced sound
Unvoiced sounds
Plosive sounds
Voice Sampling
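$x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n-k]$
$y[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k]$
[Figure: example sequences h[n] and x[n] and their convolution y[n], each plotted against n]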
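The convolution sum $y[n] = \sum_k x[k]\,h[n-k]$ can be evaluated directly for finite sequences. The following is a minimal sketch; the values chosen for `h` and `x` are assumptions for illustration and are not confirmed by the figure.

```python
def convolve(x, h):
    """Direct evaluation of y[n] = sum_k x[k] * h[n - k] for finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            # x[k] contributes a scaled, shifted copy of h to the output
            y[k + m] += xk * hm
    return y

h = [1.0, 1.0, 1.0]  # assumed impulse response
x = [0.5, 2.0]       # assumed input sequence
y = convolve(x, h)
print(y)
```

Each input sample produces a scaled and delayed copy of the impulse response, and the output is the sum of those copies.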
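Frequency-Domain Representation of Sampling
$s(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)$
$x_s(t) = x_c(t)\,s(t) = x_c(t) \sum_{n=-\infty}^{\infty} \delta(t - nT)$
$S(j\Omega) = \dfrac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta(\Omega - k\Omega_S)$
[Figure: spectrum $X_c(j\Omega)$ band-limited to $\pm\Omega_N$, with copies repeated at multiples of $\Omega_S$; the copies do not overlap when $\Omega_S - \Omega_N > \Omega_N$]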
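The no-overlap condition on the sampled spectrum can be checked numerically. The sketch below assumes an 8 kHz sampling rate and two test tones chosen for illustration: a 5 kHz tone, which lies above half the sampling rate, yields exactly the same samples as a 3 kHz tone.

```python
import math

def sample_cosine(freq_hz, fs_hz, count):
    """Samples of cos(2*pi*f*t) taken at t = n / fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]

def is_alias_free(max_freq_hz, fs_hz):
    """True when the spectral copies do not overlap (fs - f_max > f_max)."""
    return fs_hz - max_freq_hz > max_freq_hz

FS = 8000  # assumed sampling rate in Hz
tone_3k = sample_cosine(3000, FS, 16)
tone_5k = sample_cosine(5000, FS, 16)

# Largest sample-by-sample difference between the two tones
max_diff = max(abs(a - b) for a, b in zip(tone_3k, tone_5k))
print(is_alias_free(3000, FS), is_alias_free(5000, FS), max_diff)
```

Once sampled, the two tones cannot be told apart, which is why the input must be band-limited before sampling.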
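Quantization (Scalar Quantization)
[Figure: the range $[-A, A]$ split by the boundaries $m_0 = -A, m_1, m_2, \dots, m_k, m_{k+1}, \dots, m_{L-1}, m_L = A$, with one output value $v_k$ inside each interval]
Assume $|x[n]| \le A$
Divide the range $[-A, A]$ into $L$ quantization levels $\{J_1, J_2, \dots, J_k, \dots, J_L\}$
$J_k : [m_{k-1}, m_k]$
$L = 2^R$
$S = \bigcup_k J_k$, $V = \{v_1, v_2, \dots, v_k, \dots, v_L\}$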
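As a concrete case of the definitions above, a uniform quantizer places the boundaries $m_k$ at equal spacing. The sketch assumes that $R$ is the number of bits per sample and that each output value $v_k$ is the midpoint of its interval; neither choice is stated on the slide.

```python
def uniform_quantizer(A, R):
    """Boundaries m_0..m_L and output values v_1..v_L for L = 2**R equal intervals."""
    L = 2 ** R
    step = 2 * A / L
    boundaries = [-A + k * step for k in range(L + 1)]
    # Assumption: each output value is the midpoint of its interval
    values = [(boundaries[k] + boundaries[k + 1]) / 2 for k in range(L)]
    return boundaries, values

def quantize(x, A, R):
    """Return (interval index, output value) for a sample x, clipped to [-A, A]."""
    L = 2 ** R
    step = 2 * A / L
    x = max(-A, min(A, x))
    index = min(int((x + A) / step), L - 1)  # x == A falls in the last interval
    return index, -A + (index + 0.5) * step

boundaries, values = uniform_quantizer(1.0, 2)
print(boundaries, values, quantize(0.3, 1.0, 2))
```

Only the index needs to be transmitted; the decoder looks up the matching output value.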
Non-Uniform Quantization
[Figure: the range $[-A, A]$ split by unequally spaced boundaries $m_0 = -A, m_1, m_2, \dots, m_L = A$]
Companding
$x[n]$ → Compressor $F(x)$ → Uniform Quantization → 11011101 → Uniform Decoder → Expandor $F^{-1}(x)$ → $\hat{x}[n]$
Compressor + Expandor = Compandor
$F(x)$ specifies the non-uniform quantization characteristics
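Non-Uniform Quantization
μ-law
$F(x) = \operatorname{sgn}(x)\,\dfrac{\log(1 + \mu|x|)}{\log(1 + \mu)}, \quad 0 \le |x| \le 1$
A-law
$F(x) = \operatorname{sgn}(x)\,\dfrac{A|x|}{1 + \ln A}, \quad 0 \le |x| < \dfrac{1}{A}$
$F(x) = \operatorname{sgn}(x)\,\dfrac{1 + \ln(A|x|)}{1 + \ln A}, \quad \dfrac{1}{A} \le |x| \le 1$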
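The compressor curves and their inverses (the expandors) can be sketched as follows, for input normalized to $|x| \le 1$. The parameter values $\mu = 255$ and $A = 87.6$ are the ones commonly used with G.711; the slides do not state them, so treat them as assumptions here.

```python
import math

MU = 255.0    # assumed mu-law parameter
A_LAW = 87.6  # assumed A-law parameter

def mu_compress(x, mu=MU):
    """mu-law compressor F(x) for |x| <= 1."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def a_compress(x, a=A_LAW):
    """A-law compressor F(x) for |x| <= 1: linear near zero, logarithmic above 1/A."""
    ax = abs(x)
    if ax < 1 / a:
        y = a * ax / (1 + math.log(a))
    else:
        y = (1 + math.log(a * ax)) / (1 + math.log(a))
    return math.copysign(y, x)

def a_expand(y, a=A_LAW):
    """Inverse of a_compress."""
    ay = abs(y)
    if ay < 1 / (1 + math.log(a)):
        x = ay * (1 + math.log(a)) / a
    else:
        x = math.exp(ay * (1 + math.log(a)) - 1) / a
    return math.copysign(x, y)

for sample in (-1.0, -0.3, 0.0, 0.01, 0.5, 1.0):
    print(sample, mu_compress(sample), a_compress(sample))
```

Both curves expand small amplitudes and compress large ones, so a uniform quantizer applied after the compressor gives finer resolution to quiet samples.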
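[Figure: LPC speech production model. A periodic pulse train generator (voiced, period N) or a random sequence generator (unvoiced) is selected by the v/u switch and scaled by the gain G to form the excitation u[n], which drives the filter G(z)]
Excitation parameters: v/u, N, G
$G(z) = \dfrac{X(z)}{U(z)} = \dfrac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}}$
A good approximation, though not precise enough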
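The source-filter model can be sketched in a few lines: an excitation u[n] (pulse train or noise, scaled by the gain) drives an all-pole filter. The sketch assumes the sign convention $G(z) = 1/(1 - \sum_k a_k z^{-k})$, which gives the recursion $x[n] = u[n] + \sum_k a_k x[n-k]$; the coefficient and excitation values are hypothetical.

```python
import random

def excitation(voiced, length, pitch_period, gain, seed=0):
    """Pulse train with period N (voiced) or white noise (unvoiced), scaled by G."""
    if voiced:
        return [gain if n % pitch_period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(length)]

def all_pole_filter(u, a):
    """Compute x[n] = u[n] + sum_{k=1..P} a[k-1] * x[n-k] with zero initial state."""
    x = []
    for n, un in enumerate(u):
        acc = un
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * x[n - k]
        x.append(acc)
    return x

a = [0.5]  # hypothetical first-order predictor coefficient (stable)
u = excitation(voiced=True, length=8, pitch_period=4, gain=1.0)
x = all_pole_filter(u, a)
print(x)
```

Switching `voiced` to `False` replaces the pulse train with noise while the filter stays the same.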
[Figure: LPC vocoder]
LPC Analysis → $\{a_k\}$, N, G, v/u → Encoder → 11011 → receiver
N by pitch detection
v/u by voicing detection
11011 → Decoder → $\{a_k\}$, N, G, v/u → Ex (excitation) → $G(z)$, impulse response $g[n]$ → $x[n]$
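The slides do not say how pitch and voicing detection are carried out. One common approach, shown here only as an assumption, is to look for the lag with the largest autocorrelation and to call the frame voiced when that peak is large relative to the frame energy. The threshold and the lag range are illustrative values.

```python
def autocorrelation(frame, lag):
    """Sum of frame[n] * frame[n - lag] over the samples where both exist."""
    return sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))

def estimate_pitch(frame, min_lag, max_lag, voicing_threshold=0.3):
    """Return (voiced, pitch period N in samples); N is None for unvoiced frames."""
    energy = autocorrelation(frame, 0)
    if energy == 0:
        return False, None  # silent frame
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(frame, lag))
    # Assumed voicing rule: normalized autocorrelation peak above a threshold
    voiced = autocorrelation(frame, best_lag) / energy >= voicing_threshold
    return voiced, (best_lag if voiced else None)

# Hypothetical test frame: pulse train with a period of 40 samples
pulse_frame = [1.0 if n % 40 == 0 else 0.0 for n in range(240)]
print(estimate_pitch(pulse_frame, min_lag=20, max_lag=147))
```

With an 8 kHz sampling rate, a period of 40 samples corresponds to a 200 Hz pitch.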
G.711
North America
A-law
ADPCM (adaptive differential PCM)
16, 24, 32, 40 kbps
MOS 4.0 at 32 kbps
Analysis-by-Synthesis (AbS) Codecs
Hybrid codec
G.728 LD-CELP
CELP codecs
Backward-adaptive coder
Use previous samples to determine filter coefficients
Operates on five samples at a time
Delay < 1 ms
Only the pointer is transmitted
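The sub-millisecond delay follows from the five-sample block size. The arithmetic below assumes 8 kHz sampling and the 16 kbps rate of G.728, neither of which appears on this slide, and derives how many bits the transmitted pointer (codebook index) can occupy per block.

```python
SAMPLE_RATE_HZ = 8000  # assumed narrowband sampling rate
VECTOR_SAMPLES = 5     # "five samples at a time"
BIT_RATE_BPS = 16000   # G.728 bit rate (not stated on the slide)

# Duration of one five-sample block in milliseconds
vector_ms = 1000 * VECTOR_SAMPLES / SAMPLE_RATE_HZ

# Bits available for the pointer in each block
bits_per_vector = BIT_RATE_BPS * VECTOR_SAMPLES // SAMPLE_RATE_HZ

# Number of distinct pointer values those bits can address
codebook_entries = 2 ** bits_per_vector

print(vector_ms, bits_per_vector, codebook_entries)
```

Because the filter coefficients are derived from past output at both ends, these pointer bits are the only information sent per block.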
LD-CELP encoder
LD-CELP decoder
G.723.1 ACELP
Both bit rates (5.3 and 6.3 kbps) are mandatory
The rate can change from one to the other during a conversation
The coder
G.723.1 Annex A
6.3 kbps: 24 octets/frame
5.3 kbps: 20 octets/frame
SID frame: 4 octets/frame
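The frame sizes can be related to the bit rates with a short calculation. It assumes the 30 ms frame duration of G.723.1, which is not stated on this slide; the function name is illustrative.

```python
FRAME_MS = 30  # G.723.1 frame duration (assumed, not stated on the slide)

def payload_rate_bps(octets_per_frame, frame_ms=FRAME_MS):
    """Bit rate implied by sending one frame of the given size every frame_ms."""
    return octets_per_frame * 8 * 1000 / frame_ms

frame_octets = {"6.3 kbps": 24, "5.3 kbps": 20, "SID": 4}
rates = {name: payload_rate_bps(size) for name, size in frame_octets.items()}
for name, rate in rates.items():
    print(name, round(rate))
```

The computed figures come out slightly above the nominal rates, presumably because frames are packed into whole octets.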
G.729
8 kbps
Input frames of 10 ms (80 samples at an 8 kHz sampling rate)
5 ms look-ahead
Algorithmic delay of 15 ms
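The figures above are tied together by simple arithmetic, sketched here with the values from the slide. The octets-per-frame line is a derived quantity, not something the slide states.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 10
LOOKAHEAD_MS = 5
BIT_RATE_BPS = 8000

# 10 ms of speech at 8 kHz
samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000

# The coder must wait for one full frame plus the look-ahead
algorithmic_delay_ms = FRAME_MS + LOOKAHEAD_MS

# Coded size of each frame at 8 kbps
bits_per_frame = BIT_RATE_BPS * FRAME_MS // 1000
octets_per_frame = bits_per_frame // 8

print(samples_per_frame, algorithmic_delay_ms, bits_per_frame, octets_per_frame)
```

The algorithmic delay excludes processing and network delay, so the end-to-end delay is larger.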
G.729.B
Other Codecs
Variable-rate coder
Two most common rates
Silence suppression
For use with RTP, RFC 2658
GSM 06.60
An enhanced version of GSM Full-Rate
ACELP-based codec
The same bit rate and the same overall packing structure
12.2 kbps
GSM 06.90
Eight different modes
4.75 kbps to 12.2 kbps
12.2 kbps, GSM EFR
7.4 kbps, IS-641 (TDMA cellular systems)
Change the mode at any time
Offer discontinuous transmission
Processing Power
Cascaded Codecs
human speech
G.711 is OK
G.723.1 and G.729 can be unintelligible
The ingress gateway needs to intercept
An Internet Draft
Both methods described before
A large number of tones and events
Payload format
Finis