Speech Compression (2)
Air is pushed from your lungs through your vocal tract, and out of your mouth
comes speech.
For voiced sounds, your vocal cords vibrate (open and close). The rate at
which the vocal cords vibrate determines the pitch of your voice.
Women and young children tend to have high pitch (fast vibration) while adult
males tend to have low pitch (slow vibration).
For fricative and plosive (unvoiced) sounds, your vocal cords do not
vibrate but remain open.
The shape of your vocal tract determines the sound that you make.
As you speak, your vocal tract changes its shape, producing different sounds.
The shape of the vocal tract changes relatively slowly (on the scale of 10 msec
to 100 msec).
The amount of air coming from your lungs determines the loudness of your voice.
VOWELS
Diphthongs
SEMIVOWELS
A semivowel is a sound that is intermediate between a vowel and a consonant.
Semivowels are quite difficult to characterize; they have a vowel-like nature.
Examples: /w/, /l/, /r/ and /y/
CONSONANT
A nasal consonant is a type of consonant produced with a lowered velum in the
mouth, allowing air to come out through the nose while the air is not allowed
to pass through the mouth.
The air flows through the nasal tract, with sound being radiated at the nostrils.
The three nasal consonants are distinguished by the place along the oral tract
at which a total constriction is made:
o For /m/ the constriction is at the lips
o For /n/ the constriction is behind the teeth
o For /ŋ/ the constriction is just forward of the velum itself
PCM
• Sample Rate
– Nyquist Criterion
– Bandwidth Limitation
• Sample Size
– Quantization Levels
– Quantization Noise / Distortion
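As a rough illustration of how the sample size sets the number of quantization levels and the resulting quantization noise, here is a minimal sketch (the 8 kHz rate, 8-bit sample size, and 1 kHz test tone are assumptions, not values from the slides):

```python
import numpy as np

# Minimal sketch (assumed values): uniform PCM quantization of a 1 kHz tone
# sampled at 8 kHz. The sample size (bits) fixes the number of quantization
# levels, and the rounding error shows up as quantization noise.
fs = 8000                      # sample rate; must exceed twice the signal bandwidth (Nyquist)
bits = 8                       # sample size
levels = 2 ** bits             # quantization levels

t = np.arange(fs) / fs
x = 0.9 * np.sin(2 * np.pi * 1000 * t)                # band-limited test signal in [-1, 1]

step = 2.0 / levels                                   # quantizer step size over [-1, 1)
xq = np.clip(np.round(x / step) * step, -1.0, 1.0 - step)

noise = x - xq                                        # quantization noise / distortion
snr_db = 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))
print(f"{bits}-bit PCM: {levels} levels, SNR ~ {snr_db:.1f} dB")
```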
Compression
• Lossy
• Lossless
• Source Independent
• Source Dependent
Voice Coding (Compression)
• Waveform
• Frequency Domain
• Vocoder
Waveform Coding
• DSI - Digital Speech Interpolation
• µ-Law / A-Law
• Differential PCM
– ADPCM – Adaptive Differential PCM
• Delta Modulation
– CVSD – Continuously Variable Slope Delta modulation
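A minimal sketch of µ-law companding (µ = 255 is the G.711 convention; this is an illustration, not a bit-exact G.711 encoder):

```python
import numpy as np

# Minimal sketch of mu-law companding. The compressor boosts small amplitudes
# before uniform quantization; the expander inverts it at the receiver.
MU = 255.0

def mu_law_compress(x):
    """Map x in [-1, 1] to the companded domain [-1, 1]."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

x = np.linspace(-1.0, 1.0, 9)
y = mu_law_compress(x)
print(np.allclose(mu_law_expand(y), x))   # True: expansion undoes compression
```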
Frequency Domain
• SBC – Sub Band Coding
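As a small illustration of the sub-band idea, here is a two-band split (the Haar-like sum/difference filters are an assumption chosen for simplicity; practical SBC coders use longer quadrature-mirror filter banks):

```python
import numpy as np

# Minimal sketch of a two-band sub-band split. Each band is decimated by 2, so
# the sample count is preserved; in a real SBC coder each band would then get
# its own quantizer and bit allocation.
def two_band_split(x):
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # keep an even length for 2:1 decimation
        x = x[:-1]
    low = (x[0::2] + x[1::2]) / 2.0     # low-pass band, decimated by 2
    high = (x[0::2] - x[1::2]) / 2.0    # high-pass band, decimated by 2
    return low, high

def two_band_merge(low, high):
    x = np.empty(2 * len(low))
    x[0::2] = low + high
    x[1::2] = low - high
    return x

x = np.random.randn(16)
low, high = two_band_split(x)
print(np.allclose(two_band_merge(low, high), x))   # perfect reconstruction
```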
Vocoder
• LPC – Linear Predictive Coding
– CELP – Code Excited Linear Prediction
– VSELP – Vector Sum Excited Linear Prediction
At the transmitter, the speech is divided into segments. Each segment is
analyzed to determine an excitation signal and the parameters of the
vocal tract filter.
The excitation signal is then synthesized at the receiver and used to drive
the vocal tract filter. In other schemes,
the excitation signal itself is obtained using an analysis-by-synthesis
approach.
This signal is then used by the vocal tract filter to generate the speech
signal.
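A minimal sketch of the receiver side (assumptions: the simple two-state excitation model, a fixed pitch period in samples, and an example filter; this is not an analysis-by-synthesis scheme). The excitation drives the all-pole vocal tract filter built from the received parameters:

```python
import numpy as np
from scipy.signal import lfilter

# Minimal sketch: the voiced/unvoiced decision selects the excitation, which
# then drives the vocal tract filter G / A(z).
def synthesize_segment(a, gain, n_samples, voiced, pitch_period=80):
    if voiced:
        excitation = np.zeros(n_samples)
        excitation[::pitch_period] = 1.0          # periodic pulse generator
    else:
        excitation = np.random.randn(n_samples)   # random noise generator
    return lfilter([gain], a, excitation)         # vocal tract filter G / A(z)

# Example with an assumed 2nd-order filter A(z) = 1 - 1.3 z^-1 + 0.5 z^-2
a = np.array([1.0, -1.3, 0.5])
speech = synthesize_segment(a, gain=0.1, n_samples=180, voiced=True)
```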
In the channel vocoder, each segment of input speech is analyzed using a bank
of band-pass filters called the analysis filters. The energy at the output of
each filter is estimated at fixed intervals and transmitted to the receiver.
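A minimal sketch of such an analysis filter bank (the band edges, filter order, and Butterworth design are assumptions for illustration):

```python
import numpy as np
from scipy.signal import butter, lfilter

# Minimal sketch of a channel vocoder analysis filter bank. Only the per-band
# energies, together with the voicing and pitch information, would be sent to
# the receiver.
def channel_energies(segment, fs=8000, edges=(100, 400, 800, 1600, 3200)):
    energies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [lo, hi], btype="band", fs=fs)   # one analysis filter
        y = lfilter(b, a, segment)
        energies.append(np.mean(y ** 2))                  # energy in this band
    return np.array(energies)

segment = np.random.randn(180)          # one 22.5 ms segment at 8 kHz (dummy data)
print(channel_energies(segment))
```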
Phonemes
10 LPC coefficients
If we give each coefficient 16 bits:
10 × 16 = 160 bits per segment
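Worked out against the 22.5 ms segment length used by LPC-10 (given below), this illustrative 16-bit allocation implies the following bit rate for the coefficients alone:

```python
# Worked arithmetic: 10 coefficients at 16 bits each, one set per 22.5 ms segment.
coeffs_per_segment = 10
bits_per_coeff = 16
segment_ms = 22.5

bits_per_segment = coeffs_per_segment * bits_per_coeff   # 10 * 16 = 160 bits
bit_rate = bits_per_segment * (1000.0 / segment_ms)      # about 7111 bit/s
print(bits_per_segment, bit_rate)
```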
Here G is called the gain of the vocal tract filter. As in the case of the
channel vocoder, the input to the vocal tract filter is either the output of a
random noise generator or a periodic pulse generator.
At the transmitter, a segment of speech is analyzed. The parameters obtained
include a decision as to whether the segment of speech is voiced or unvoiced,
the pitch period if the segment is declared voiced, and the parameters of the
vocal tract filter.
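A minimal sketch of the transmitter-side analysis of one segment, assuming the vocal tract filter parameters are found by linear prediction with the autocorrelation method (Levinson-Durbin recursion); the order, windowing, and segment handling are illustrative, not the LPC-10 reference procedure:

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                               # prediction error energy
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err                       # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def lpc_analysis(segment, order=10):
    windowed = segment * np.hamming(len(segment))
    r = np.correlate(windowed, windowed, mode="full")[len(windowed) - 1:]
    a, err = levinson_durbin(r, order)
    return a, np.sqrt(err)                   # filter coefficients and a gain estimate

a, gain = lpc_analysis(np.random.randn(180))  # one 180-sample segment (dummy data)
```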
The input speech is generally sampled at 8000 samples per second. In the
LPC-10 standard, the speech is broken into 180 sample segments,
corresponding to 22.5 milliseconds of speech per segment.
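A small sketch of this framing (the helper name and the dummy signal are illustrative only): 8000 samples/s × 0.0225 s = 180 samples per segment.

```python
import numpy as np

# Split speech sampled at 8000 samples/s into the 180-sample (22.5 ms) segments
# used by the LPC-10 standard.
def segment_speech(x, frame_len=180):
    n_frames = len(x) // frame_len
    return np.reshape(x[:n_frames * frame_len], (n_frames, frame_len))

fs = 8000
x = np.random.randn(fs)                  # one second of (dummy) speech
frames = segment_speech(x)
print(frames.shape, 180 / fs * 1000)     # (44, 180), 22.5 ms per segment
```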
The Voiced/Unvoiced Decision
The samples of the voiced speech have larger amplitude; that is,
there is more energy in the voiced speech.
The unvoiced speech contains higher frequencies.
Both speech segments have average values close to zero; combined with its higher
frequency content, this means that the unvoiced speech waveform crosses the
x = 0 line more often than the voiced speech waveform.
A segment is classified as voiced or unvoiced based on its energy relative to the
background noise and the number of zero crossings within a specified window.
In the LPC-10 algorithm, the speech segment is first low-pass filtered using a filter with a
bandwidth of 1 kHz. The energy at the output relative to the background noise is used to obtain
a tentative decision about whether the signal in the segment should be declared voiced or
unvoiced.
The estimate of the background noise is basically the energy in the unvoiced speech segments.
This tentative decision is further refined by counting the number of zero crossings and
checking the magnitude of the coefficients of the vocal tract filter.
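A minimal sketch of a tentative decision from the two cues described above, segment energy and zero-crossing count (the thresholds are placeholders, not the actual LPC-10 values):

```python
import numpy as np

def voiced_unvoiced(segment, noise_energy, zc_rate_threshold=0.25):
    energy = np.mean(segment ** 2)
    # count crossings of the x = 0 line as sign changes between samples
    zero_crossings = np.sum(np.abs(np.diff(np.signbit(segment).astype(int))))
    zc_rate = zero_crossings / len(segment)
    # voiced: high energy relative to background noise and few zero crossings
    return bool(energy > 10 * noise_energy and zc_rate < zc_rate_threshold)

segment = np.random.randn(180)                        # one 22.5 ms segment (dummy data)
print(voiced_unvoiced(segment, noise_energy=1e-3))    # noise-like -> False (unvoiced)
```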
Estimating the Pitch Period