Unit 5: LPC Modeling and Speech Compression
1. SPEECH COMPRESSION
Speech compression involves the compression of audio data in the form of speech. Speech is a
unique form of audio data, and a number of factors must be considered during compression to
ensure that the result is intelligible and reasonably pleasant to listen to.
The aim of speech compression is to produce a compact representation of speech sounds such
that when reconstructed it is perceived to be close to the original. The two main measures of
closeness are intelligibility and naturalness.
Need for compression:
Raw audio data can take up a great deal of memory. During compression, the data is compressed
so that it will occupy less space. This frees up room in storage, and it also becomes important
when data is being transmitted over a network. On a mobile phone network, for example, if
speech compression is used, more users can be accommodated at a given time because less
bandwidth is needed. Likewise, speech compression becomes important with teleconferencing
and other applications; sending data is expensive, and anything which reduces the volume of data
which needs to be sent can help to cut costs.
TYPES:
1. The μ-law algorithm (often written u-law or mu-law) is a companding algorithm,
primarily used in the digital telecommunication systems of North America and Japan.
Companding algorithms reduce the dynamic range of an audio signal. In analog systems, this can
increase the signal-to-noise ratio (SNR) achieved during transmission, and in the digital domain,
it can reduce the quantization error (hence increasing signal to quantization noise ratio). These
SNR increases can be traded instead for reduced bandwidth for equivalent SNR. μ-law coding
does not exploit the (normally large) sample-to-sample correlations found in speech.
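As a rough sketch (Python/NumPy is assumed throughout these notes; μ = 255 is the standard value for 8-bit telephony, and the function names are illustrative), the compressor and its inverse can be written as:

import numpy as np

MU = 255.0  # standard value for 8-bit telephony

def mulaw_compress(x):
    # Compress a signal normalised to [-1, 1]; quantising the result to
    # 8 bits per sample gives the familiar 64 kbps telephone rate.
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mulaw_expand(y):
    # Inverse operation, applied at the receiver.
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

x = np.linspace(-1.0, 1.0, 9)
print(np.allclose(mulaw_expand(mulaw_compress(x)), x))  # True: companding is invertible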
2. ADPCM is the next family of speech coding techniques; it exploits this redundancy by
using a simple linear filter to predict the next sample of speech. The resulting prediction error is
typically quantised to 4 bits, giving a bit rate of 32 kbps. The advantages of ADPCM are that it is
simple to implement and has very low delay.
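A minimal sketch of the idea, assuming a fixed first-order predictor and a uniform 4-bit quantiser (real ADPCM coders such as G.726 adapt both the predictor and the step size, which is not shown here):

import numpy as np

def adpcm_encode(x, step=0.02, a=0.9):
    # Fixed predictor x_hat(n) = a * x_rec(n-1); only the 4-bit residual codes
    # are transmitted.
    codes, x_rec = [], 0.0
    for s in x:
        e = s - a * x_rec                          # prediction error
        q = int(np.clip(round(e / step), -8, 7))   # 4-bit code in [-8, 7]
        codes.append(q)
        x_rec = a * x_rec + q * step               # track the decoder's state
    return codes

def adpcm_decode(codes, step=0.02, a=0.9):
    out, x_rec = [], 0.0
    for q in codes:
        x_rec = a * x_rec + q * step               # same reconstruction rule
        out.append(x_rec)
    return np.array(out)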
3. LPC: To obtain more compression, specific properties of the speech signal must be modelled.
The main assumption is that a source (voicing or fricative excitation) is passed through a filter (the
vocal tract response) to produce the speech. The simplest implementation of this is known as an
LPC synthesiser (e.g. LPC10e). At every frame, the speech is analysed to compute the filter
coefficients, the energy of the excitation, a voicing decision, and a pitch value if voiced. At the
decoder, a regular set of pulses for voiced speech or white noise for unvoiced speech is passed
through the linear filter and multiplied by the gain to produce the speech. This is a very efficient
system and typically produces speech coded at 1200-2400 bps. With clever acoustic vector
prediction this can be reduced to 300-600 bps. The disadvantages are a loss of naturalness over
most of the speech and occasionally a loss of intelligibility.
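A sketch of the decoder step just described, with an illustrative frame length and parameter names (the all-pole filter is applied with scipy.signal.lfilter; nothing here is specific to LPC10e):

import numpy as np
from scipy.signal import lfilter

def lpc_synthesise_frame(a, gain, voiced, pitch, n=180, state=None):
    # Build the excitation: a regular pulse train at the pitch period for
    # voiced frames, white noise for unvoiced frames.
    if voiced:
        exc = np.zeros(n)
        exc[::pitch] = 1.0
    else:
        exc = np.random.randn(n)
    # All-pole LPC filter 1 / (1 - a1 z^-1 - ... - ap z^-p), scaled by the gain.
    den = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    if state is None:
        state = np.zeros(len(a))
    frame, state = lfilter([gain], den, exc, zi=state)
    return frame, state  # carry the filter state into the next frame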
4. CELP (Code-Excited Linear Prediction): The CELP family of coders compensates for the lack
of quality of the simple LPC model by using more information in the excitation. Each vector in a
codebook of excitation vectors is tried, and the index of the one that best matches the original
speech is transmitted. This results in an increase in the bit rate, typically to 4800-9600 bps. Most
speech coding research is currently directed towards CELP coders.
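A toy sketch of this analysis-by-synthesis search over a small random codebook (real CELP coders use structured and adaptive codebooks plus a perceptually weighted error measure, none of which are shown):

import numpy as np
from scipy.signal import lfilter

def celp_search(target, codebook, a):
    # Pass every codebook vector through the LPC synthesis filter and keep
    # the index (and its optimal gain) giving the smallest squared error.
    den = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    best_i, best_g, best_err = 0, 0.0, np.inf
    for i, c in enumerate(codebook):
        y = lfilter([1.0], den, c)
        g = np.dot(y, target) / np.dot(y, y)       # least-squares gain
        err = np.sum((target - g * y) ** 2)
        if err < best_err:
            best_i, best_g, best_err = i, g, err
    return best_i, best_g                          # transmitted to the decoder

codebook = np.random.randn(64, 40)                 # 64 excitation vectors of 40 samples
target = np.random.randn(40)                       # stand-in for a speech frame
print(celp_search(target, codebook, a=[0.9]))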
LPC Modeling
Digital speech signals are sampled at a rate of 8000 samples/sec. Typically, each sample
is represented by 8 bits (using mu-law). This corresponds to an uncompressed rate of 64 kbps
(kbits/sec). With current compression techniques (all of which are lossy), it is possible to reduce
the rate to 8 kbps with almost no perceptible loss in quality. Further compression is possible at a
cost of lower quality. All of the current low-rate speech coders are based on the principle of
linear predictive coding (LPC) which is presented in the following sections.
A. Physical Model:
B. Mathematical Model:
The above model is often called the LPC Model.
The model says that the digital speech signal is the output of a digital filter (called the
LPC filter) whose input is either a train of impulses or a white noise sequence.
The relationship between the physical and the mathematical models:

    Vocal Tract                  <->  LPC Filter
    Air                          <->  Innovations u(n)
    Vocal Cord Vibration         <->  Voiced (V)
    Vocal Cord Vibration Period  <->  Pitch Period (T)
    Fricatives and Plosives      <->  Unvoiced (UV)
    Air Volume                   <->  Gain (G)

The LPC filter is given by:

    H(z) = 1 / (1 - Σ_{k=1}^{p} a_k z^(-k))

which is equivalent to saying that the input-output relationship of the filter is given by the
linear difference equation:

    x(n) = Σ_{k=1}^{p} a_k x(n-k) + G u(n)

where x(n) is the speech sample, u(n) the excitation (an impulse train for voiced speech, white
noise for unvoiced speech), and G the gain.
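A direct rendering of this difference equation, assuming zero initial conditions (the function name is illustrative):

import numpy as np

def lpc_filter(u, a, G):
    # x(n) = a_1 x(n-1) + ... + a_p x(n-p) + G u(n)
    p = len(a)
    x = np.zeros(len(u) + p)              # p leading zeros hold the past samples
    for n in range(len(u)):
        past = x[n:n + p][::-1]           # x(n-1), ..., x(n-p)
        x[n + p] = np.dot(a, past) + G * u[n]
    return x[p:]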
2. Adaptive filter
An adaptive filter is a filter that self-adjusts its transfer function according to an optimizing
algorithm. Because of the complexity of the optimizing algorithms, most adaptive filters are
digital filters that perform digital signal processing and adapt their performance based on the
input signal.
Need: When some parameters of the desired processing operation (for instance, the properties of
some noise signal) are not known in advance, adaptive filters are used. An adaptive filter uses
feedback to refine the values of its filter coefficients and hence its frequency response.
As the power of digital signal processors has increased, adaptive filters have become much more
common and are now routinely used in devices such as mobile phones and other communication
devices, camcorders and digital cameras, and medical monitoring equipment.
Example
Suppose a hospital is recording a heart beat (an ECG) which is being corrupted by 50 Hz noise
(the mains power supply frequency in many countries).
One way to remove the noise is to filter the signal with a notch filter at 50 Hz. However, due to
slight variations in the power supply to the hospital, the exact frequency of the power supply
might (hypothetically) wander between 47 Hz and 53 Hz. A static filter would need to remove all
the frequencies between 47 and 53 Hz, which could excessively degrade the quality of the ECG
since the heart beat would also likely have frequency components in the rejected range.
To circumvent this potential loss of information, an adaptive filter could be used. The adaptive
filter would take input both from the patient and from the power supply directly and would thus
be able to track the actual frequency of the noise as it fluctuates. Such an adaptive technique
generally allows for a filter with a smaller rejection range, which in this case means that the
output signal is of higher quality and more useful for medical diagnosis.
Block diagram
The block diagram, shown in the following figure, serves as a foundation for particular adaptive
filter realisations, such as Least Mean Squares (LMS) and Recursive Least Squares (RLS). The
idea behind the block diagram is that a variable filter extracts an estimate of the desired signal.
To start the discussion of the block diagram we make the following assumptions:
The input signal is the sum of a desired signal d(n) and interfering noise v(n):
    x(n) = d(n) + v(n)
The variable filter has a Finite Impulse Response (FIR) structure. For such structures the
impulse response is equal to the filter coefficients. The coefficients for a filter of order p
are defined as:
    w_n = [w_n(0), w_n(1), ..., w_n(p)]^T
The error signal or cost function is the difference between the desired and the estimated
signal:
    e(n) = d(n) - d̂(n)
The variable filter estimates the desired signal by convolving the input signal with the impulse
response. In vector notation this is expressed as:
    d̂(n) = w_n^T x(n)
where
    x(n) = [x(n), x(n-1), ..., x(n-p)]^T
is an input signal vector. Moreover, the variable filter updates the filter coefficients at every time
instant:
    w_{n+1} = w_n + Δw_n
where Δw_n is a correction factor for the filter coefficients. The adaptive algorithm generates this
correction factor based on the input and error signals. LMS and RLS define two different
coefficient update algorithms.
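A minimal sketch of the LMS version of this loop, where the correction factor is Δw_n = μ e(n) x(n) (the step size μ and the filter order are illustrative choices):

import numpy as np

def lms(x, d, p=8, mu=0.01):
    # Adapt an order-p FIR filter so that its output tracks d(n) from x(n).
    w = np.zeros(p + 1)                    # coefficients w_n(0), ..., w_n(p)
    e = np.zeros(len(x))
    for n in range(p, len(x)):
        xv = x[n - p:n + 1][::-1]          # input vector [x(n), ..., x(n-p)]
        y = np.dot(w, xv)                  # variable-filter estimate of d(n)
        e[n] = d[n] - y                    # error signal
        w = w + mu * e[n] * xv             # update: w_{n+1} = w_n + mu e(n) x(n)
    return e, w

In the ECG example above, x would be the reference taken directly from the power supply, d the corrupted ECG, and the returned error signal e the ECG with the 50 Hz interference removed.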
Applications of adaptive filters
Noise cancellation
Signal prediction
Adaptive feedback cancellation
Echo cancellation
Filter implementations
Least mean squares filter
Recursive least squares filter
1. Noise cancellation
Active noise control (ANC) (also known as noise cancellation, active noise reduction (ANR)
or antinoise) is a method for reducing unwanted sound. A noise-cancellation speaker emits a
sound wave with the same amplitude but with inverted phase (also known as antiphase) relative
to the original sound; the two waves combine and effectively cancel each other out.
A noise-cancellation speaker may be co-located with the sound source to be attenuated. In this
case it must have the same audio power level as the source of the unwanted sound. Alternatively,
the transducer emitting the cancellation signal may be located at the location where sound
attenuation is wanted (e.g. the user's ear). This requires a much lower power level for
cancellation but is effective only for a single user. Noise cancellation at other locations is more
difficult.
The advantages of active noise control methods compared to passive ones are that they are
generally:
More effective at low frequencies.
Less bulky.
Able to block noise selectively.
2. Linear prediction is a mathematical operation where future values of a discrete-time signal
are estimated as a linear function of previous samples. In digital signal processing, linear
prediction is often called linear predictive coding (LPC).
The prediction model
The most common representation is
    x̂(n) = Σ_{i=1}^{p} a_i x(n-i)
where x̂(n) is the predicted signal value, x(n-i) the previous observed values, and a_i the
predictor coefficients. The error generated by this estimate is
    e(n) = x(n) - x̂(n)
where x(n) is the true signal value.
These equations are valid for all types of (one-dimensional) linear prediction. The differences are
found in the way the parameters a_i are chosen.
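For the common least-squares choice, the a_i satisfy the autocorrelation normal equations R a = r; a sketch of solving them directly (the sinusoidal test signal is illustrative):

import numpy as np
from scipy.linalg import toeplitz

def lpc_coefficients(x, p):
    # Autocorrelation method: build R[i, j] = r(|i - j|) and solve R a = r.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(p + 1)])
    R = toeplitz(r[:p])
    return np.linalg.solve(R, r[1:])

x = np.sin(0.3 * np.arange(200))          # a perfectly predictable signal
print(lpc_coefficients(x, p=2))           # approx [2 cos 0.3, -1] for a sinusoid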
For multi-dimensional signals the error metric is often defined as
    e(n) = ||x(n) - x̂(n)||
where ||·|| is a suitably chosen vector norm.
3. Echo cancellation
Echo cancellation is done using either echo suppressors or echo cancellers, or in some cases
both.
The Acoustic Echo Cancellation (AEC) process works as follows (steps 4 and 5 are sketched in
code after the list):
1. A far-end signal is delivered to the system.
2. The far-end signal is reproduced by the speaker in the room.
3. A microphone also in the room picks up the resulting direct path sound, and consequent
reverberant sound as a near-end signal.
4. The far-end signal is filtered and delayed to resemble the near-end signal.
5. The filtered far-end signal is subtracted from the near-end signal.
6. The resultant signal represents sounds present in the room excluding any direct or
reverberated sound produced by the speaker.
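A sketch of steps 4 and 5, reusing the LMS update from above (the filter order, step size, and variable names are illustrative assumptions):

import numpy as np

def aec(far_end, mic, p=32, mu=0.005):
    # Adaptively model the room's echo path, then subtract the echo estimate.
    w = np.zeros(p)
    out = np.zeros(len(mic))
    for n in range(p - 1, len(mic)):
        xv = far_end[n - p + 1:n + 1][::-1]   # recent far-end samples
        echo_est = np.dot(w, xv)              # step 4: filtered far-end signal
        out[n] = mic[n] - echo_est            # step 5: subtract from near end
        w = w + mu * out[n] * xv              # adapt towards the echo path
    return out                                # room sounds minus speaker echo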
3. Perceptual audio coding
Perceptual audio coders quantise the transform (e.g. DCT) coefficients of the audio signal under
the control of a psychoacoustic model whose masking thresholds are based on the critical bands
of hearing. If the quantisation noise energy can be kept below the masking threshold then the
compressed signal will have the same transparent perceptual audio quality as the original signal.
Quantisation and coding processes aim to distribute the available bits among the DCT
coefficients such that the quantisation noise remains masked. This is achieved through an
iterative two-stage optimisation loop. A power-law quantiser is used so that large spectral values
are coded with a larger quantisation step size, as higher signal energy masks more quantisation
noise. The quantised values are then Huffman coded. To adapt the coder to the local statistics of
the input audio signal, the best Huffman coding table is selected from a number of choices.
The Huffman coder is a probabilistic coding method that achieves coding efficiency through
assigning shorter length codewords to more probable (i.e. more frequent) signal values and
longer length codewords to less frequent values. Consequently, for audio signals smaller
quantised values, which are more frequent, are assigned shorter length codewords and larger
values, which are less frequent, are assigned longer length codewords.
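A compact sketch of Huffman code construction over a list of quantised values (a standard heapq-based tree build; note that real MP3-style coders select among predefined tables rather than building a tree per frame):

import heapq
from collections import Counter

def huffman_code(values):
    # Repeatedly merge the two least frequent nodes; frequent symbols end up
    # near the root and so receive shorter codewords.
    heap = [[f, i, {v: ""}] for i, (v, f) in enumerate(Counter(values).items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {v: "0" + w for v, w in c1.items()}
        merged.update({v: "1" + w for v, w in c2.items()})
        heapq.heappush(heap, [f1 + f2, tiebreak, merged])
        tiebreak += 1
    return heap[0][2]

print(huffman_code([0, 0, 0, 0, 1, 1, -1, 2]))  # 0, most frequent, gets the shortest codeword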
Quantisation consists of two loops: an inner loop that adjusts the rate to keep the overall bit rate
within the required limit, and an outer loop that aims to keep the distortion in each critical band
masked.
4. Image enhancement
The aim of image enhancement is to improve the interpretability or perception of information in
images for human viewers, or to provide 'better' input for other automated image processing
techniques.
Image enhancement techniques can be divided into two broad categories:
1. Spatial domain methods, which operate directly on pixels, and
2. Frequency domain methods, which operate on the Fourier transform of an image.
4.1 Spatial domain methods
Suppose we have a digital image which can be represented by a two-dimensional random field
f(x,y). An image processing operator in the spatial domain may be expressed as a mathematical
function T[·] applied to the image f(x,y) to produce a new image:
    g(x,y) = T[f(x,y)]
The operator T may operate on:
(i) a single pixel (x,y): in this case T is a grey level transformation (or mapping) function of
the form s = T(r), where r and s denote the grey levels of f(x,y) and g(x,y) respectively, both
in the range [0, L-1];
(ii) the neighbourhood of the pixel (x,y); or
(iii) the whole image.
Contrast Stretching
Low contrast images often occur due to:
(i) poor or non-uniform lighting conditions,
(ii) nonlinearity of the imaging sensor, or
(iii) the small dynamic range of the imaging sensor.
Contrast stretching applies a transformation s = T(r) that darkens the levels below some value m
and brightens the levels above m in the original image.
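One common choice of stretching function is s = 255 / (1 + (m/r)^E); a sketch for 8-bit greyscale images (the midpoint m = 128 and slope E = 4 are illustrative choices, and this curve is only one of several used for contrast stretching):

import numpy as np

def contrast_stretch(img, m=128.0, E=4.0):
    # Levels below m are darkened, levels above m are brightened;
    # a larger E gives a steeper stretch around the midpoint m.
    r = img.astype(np.float64) + 1e-6     # avoid division by zero at r = 0
    s = 255.0 / (1.0 + (m / r) ** E)
    return s.astype(np.uint8)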
b) Histogram processing
The histogram represents the frequency of occurrence of the various grey levels in the
image. A plot of this function for all values of k provides a global description of the appearance
of the image. By processing (modifying) the histogram of an image we can create a new image
with specific desired properties.
Suppose we have a digital image of size N x N with grey levels in the range [0, L-1]. The
histogram of the image is defined as the following discrete function:
    p(r_k) = n_k / N^2
where r_k is the k-th grey level and n_k is the number of pixels in the image with grey level r_k.
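Computing this normalised histogram directly (the random test image is illustrative):

import numpy as np

def grey_histogram(img, L=256):
    # p(r_k) = n_k / N^2: np.bincount counts the pixels n_k at each level.
    n_k = np.bincount(img.ravel(), minlength=L)
    return n_k / img.size

img = np.random.randint(0, 256, (64, 64))
print(grey_histogram(img).sum())          # 1.0: the histogram is a probability distribution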
c) Enhancement using spatial masks
Consider a 3x3 neighbourhood mask with weights:
    w1 w2 w3
    w4 w5 w6
    w7 w8 w9
If we replace each pixel by a weighted average of its neighbourhood pixels, then the response of
the linear mask for the centre pixel z5 is
    Σ_{i=1}^{9} w_i z_i
We may repeat the same process for the whole image.
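This weighted-average operation is a 2-D linear filtering; a sketch using scipy.ndimage.convolve, assuming a uniform averaging mask with all w_i = 1/9:

import numpy as np
from scipy.ndimage import convolve

w = np.full((3, 3), 1.0 / 9.0)               # w1..w9: a simple averaging mask
img = np.random.rand(64, 64)                 # stand-in for an input image
smoothed = convolve(img, w, mode="nearest")  # response sum_i w_i z_i at each pixel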
d) Enhancement in the case of multiple images: image averaging
Suppose that we have an image f(x,y) of size M x N pixels corrupted by noise n(x,y), so
we obtain a noisy image as follows:
    g(x,y) = f(x,y) + n(x,y)
For the noise process n(x,y) the following assumptions are made:
(i) the noise process n(x,y) is ergodic;
(ii) it is zero mean;
(iii) its autocorrelation function is zero for non-zero shifts, i.e. the noise values at different
pixels are uncorrelated.
Suppose now that we have L different noisy realisations of the same image f(x,y):
    g_i(x,y) = f(x,y) + n_i(x,y),  i = 1, 2, ..., L
where each noise process n_i(x,y) satisfies properties (i)-(iii) above. A new image ḡ(x,y) is
formed by averaging these L noisy images:
    ḡ(x,y) = (1/L) Σ_{i=1}^{L} g_i(x,y)
Image averaging produces an image ḡ(x,y) corrupted by noise whose variance is only 1/L times
the variance of the noise in each original noisy image.
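A quick numerical check of this variance reduction, assuming a constant test image and unit-variance Gaussian noise:

import numpy as np

L_IMAGES = 16
f = np.zeros((64, 64))                          # the underlying noise-free image
noisy = [f + np.random.randn(64, 64) for _ in range(L_IMAGES)]  # variance 1 each
g_bar = np.mean(noisy, axis=0)                  # the averaged image
print(np.var(noisy[0]), np.var(g_bar))          # roughly 1 versus 1/L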
4.2 Frequency domain methods
In the frequency domain, a linear spatially-invariant operation corresponds to multiplication of
transforms: if F(u,v) is the Fourier transform of the input image and H(u,v) is the transfer
function of the filter, the transform of the enhanced image is
    G(u,v) = H(u,v) F(u,v)
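A sketch of this relationship using the 2-D FFT (the Gaussian low-pass H is purely illustrative):

import numpy as np

img = np.random.rand(64, 64)                # stand-in for an input image
F = np.fft.fft2(img)
u = np.fft.fftfreq(64)[:, None]             # vertical spatial frequencies
v = np.fft.fftfreq(64)[None, :]             # horizontal spatial frequencies
H = np.exp(-(u**2 + v**2) / (2 * 0.05**2))  # Gaussian low-pass transfer function
G = H * F                                   # G(u,v) = H(u,v) F(u,v)
g = np.real(np.fft.ifft2(G))                # the filtered (smoothed) image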