
UNIT II

SOURCE CODING: TEXT, AUDIO AND SPEECH

SOURCE CODING - AUDIO


Perceptual Coding (PC)
Perceptual coders are designed for the compression of general audio, such as that associated with a digital television broadcast. Sampled segments of the source audio waveform are analysed, but only those features that are perceptible to the ear are transmitted. For example, although the human ear is sensitive to signals in the range 15 Hz to 20 kHz, the level of sensitivity to each signal is nonlinear; that is, the ear is more sensitive to some signals than to others.

When multiple signals are present, as in general audio, a strong signal may reduce the ear's sensitivity to other signals that are near to it in frequency. This effect is known as frequency masking.
When the ear hears a loud sound, a short but finite time passes before it can hear a quieter sound. This effect is known as temporal masking.

Sensitivity of the ear


The dynamic range of the ear is defined as the ratio of the maximum signal amplitude to the minimum amplitude, that is, the ratio of the loudest sound it can hear to the quietest, and is measured in decibels (dB). The sensitivity of the ear varies with the frequency of the signal: it is most sensitive to signals in the range 2-5 kHz, so signals in this band are the quietest the ear can detect. In the sensitivity curve, the vertical axis gives all other signal amplitudes relative to this 2-5 kHz signal; signal A is above the hearing threshold while signal B is below it.
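The decibel relation above can be computed directly. A minimal sketch, in which the amplitude ratio used is illustrative rather than a figure from the text:

```python
import math

def dynamic_range_db(max_amplitude, min_amplitude):
    # Dynamic range: ratio of loudest to quietest amplitude, in decibels.
    return 20 * math.log10(max_amplitude / min_amplitude)

# An amplitude ratio of one million corresponds to 120 dB,
# roughly the dynamic range commonly quoted for human hearing.
print(dynamic_range_db(1_000_000, 1))
```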

Audio Compression: Perceptual properties of the human ear

Perceptual encoders have been designed for the compression of general audio, such as that associated with a digital television broadcast.


When an audio sound consisting of multiple frequency signals is present, the sensitivity of the ear changes and varies with the relative amplitudes of the signals.

If signal B is larger than signal A, the basic sensitivity curve of the ear is distorted in the region of signal B. Signal A will no longer be heard, since it falls within the distortion band.

Variation with frequency of effect of frequency masking

The width of each curve at a particular signal level is known as the critical bandwidth for that frequency

It has been observed that for frequencies below 500 Hz the critical bandwidth is around 100 Hz; for frequencies above 500 Hz it increases linearly in multiples of 100 Hz. Hence, if the magnitudes of the frequency components that make up an audio sound can be determined, it becomes possible to identify those frequencies that will be masked and therefore need not be transmitted.
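The rule of thumb above can be sketched as a small helper. The exact scaling above 500 Hz is an assumption for illustration only:

```python
def critical_bandwidth(frequency_hz):
    # Below 500 Hz the critical bandwidth is roughly constant at 100 Hz.
    if frequency_hz < 500:
        return 100
    # Above 500 Hz it grows linearly in multiples of 100 Hz; the
    # proportionality (100 Hz per 500 Hz of centre frequency) is assumed.
    return 100 * round(frequency_hz / 500)

print(critical_bandwidth(200))   # 100
print(critical_bandwidth(2000))  # 400
```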

Temporal masking
After the ear hears a loud sound, a further short time passes before it can hear a quieter sound. This is known as temporal masking. After the loud sound ceases, the signal amplitude takes a short period of time to decay. During this time, signals whose amplitudes are less than the decay envelope will not be heard and hence need not be transmitted. To exploit this, the input audio waveform must be processed over a time period comparable with that associated with temporal masking.
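A minimal sketch of the decay-envelope test, assuming an exponential decay of the masking level; the time constant is illustrative, not a value from the text:

```python
import math

def is_temporally_masked(quiet_db, loud_db, elapsed_ms, decay_ms=100.0):
    # Level of the decay envelope left behind by the loud sound.
    envelope_db = loud_db * math.exp(-elapsed_ms / decay_ms)
    # A quieter sound under the envelope is masked and need not be sent.
    return quiet_db < envelope_db

print(is_temporally_masked(20, 80, 10))    # shortly after the loud sound
print(is_temporally_masked(20, 80, 1000))  # long after it has decayed
```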

Audio Compression: Temporal masking caused by a loud signal

After the ear hears a loud signal, it takes a further short time before it can hear a quieter sound (temporal masking)

Audio Compression: MPEG perceptual coder schematic

MPEG audio coder


The audio input signal is first sampled and quantized using PCM. The bandwidth available for transmission is divided into a number of frequency subbands using a bank of analysis filters, which maps each set of 32 (time-related) PCM samples into an equivalent set of 32 frequency samples. Processing associated with both frequency and temporal masking is carried out by the psychoacoustic model. In the basic encoder, the time duration of each sampled segment of the audio input is the time taken to accumulate 12 successive sets of 32 PCM samples (384 samples); these are converted into frequency components using the discrete Fourier transform (DFT).
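The mapping of 12 sets of 32 PCM samples into frequency samples can be sketched with a plain DFT standing in for the analysis filter bank; this is a simplification for illustration, not the filter bank MPEG actually specifies:

```python
import cmath

SUBBANDS = 32
SETS_PER_SEGMENT = 12  # 12 successive sets of 32 PCM samples = 384 samples

def analyse_segment(segment):
    # Map each set of 32 time-domain samples to 32 frequency samples
    # using a direct DFT.
    assert len(segment) == SETS_PER_SEGMENT * SUBBANDS
    spectra = []
    for s in range(SETS_PER_SEGMENT):
        block = segment[s * SUBBANDS:(s + 1) * SUBBANDS]
        spectra.append([
            sum(block[n] * cmath.exp(-2j * cmath.pi * k * n / SUBBANDS)
                for n in range(SUBBANDS))
            for k in range(SUBBANDS)
        ])
    return spectra  # 12 sets of 32 frequency samples

spectra = analyse_segment([1.0] * 384)
# A constant input puts all its energy in the DC bin of each set.
print(abs(spectra[0][0]))
```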

MPEG audio coder


The output of the psychoacoustic model is a set of what are known as signal-to-mask ratios (SMRs), which indicate the frequency components whose amplitude falls below the audible level. This allows more bits to be allocated to the regions of highest sensitivity than to the less sensitive regions. In the encoder, all the frequency components are carried in a frame.
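One way to picture how SMRs steer the bit allocation is a greedy loop in which the subband that is currently least well protected receives the next bit. This is an illustrative sketch only, not the standardised MPEG allocation procedure:

```python
import bisect

def allocate_bits(smrs_db, bit_pool):
    # Subbands with SMR <= 0 dB are fully masked: they get no bits.
    bits = [0] * len(smrs_db)
    audible = sorted((smr, i) for i, smr in enumerate(smrs_db) if smr > 0)
    while bit_pool > 0 and audible:
        smr, i = audible.pop()      # subband with the highest remaining SMR
        bits[i] += 1
        bit_pool -= 1
        # Each extra bit buys roughly 6 dB of quantization SNR.
        if smr - 6 > 0:
            bisect.insort(audible, (smr - 6, i))
    return bits

print(allocate_bits([12, -3, 7], 3))  # [2, 0, 1]
```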


MPEG audio coder frame format


The header contains information such as the sampling frequency used. Quantization is performed in two stages using a form of companding: the peak amplitude level in each subband is first quantized using 6 bits, and a further 4 bits are then used to quantize the 12 frequency components in the subband relative to this level. Collectively this is known as the subband sample (SBS) format. The ancillary data field at the end of the frame is optional; it is used to carry additional coded samples associated with the surround sound present in some digital video broadcasts.
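The two-stage companded quantization can be sketched as follows; the uniform 4-bit mapping relative to the peak is an assumption for illustration:

```python
def quantize_subband(samples):
    # 12 frequency components per subband, per the frame format above.
    assert len(samples) == 12
    peak = max(abs(s) for s in samples) or 1.0
    # Stage 1: the peak level is coded with 6 bits (values 0..63).
    scale = min(63, max(0, round(peak)))
    # Stage 2: each component gets a 4-bit signed code (-8..7)
    # expressed relative to the peak level.
    codes = [max(-8, min(7, round(7 * s / peak))) for s in samples]
    return scale, codes

print(quantize_subband([1.0] * 12))
```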

MPEG audio decoder


In the decoder, the dequantizers determine the magnitude of each signal, and the synthesis filter bank produces the output PCM samples.
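A matching decoder-side sketch, assuming the 6-bit scale factor and 4-bit relative codes described for the frame format; the uniform inverse mapping is likewise an assumption:

```python
def dequantize_subband(scale, codes):
    # Rebuild each component's magnitude from its 4-bit code (-8..7)
    # and the subband's 6-bit scale factor.
    return [scale * c / 7 for c in codes]

print(dequantize_subband(2, [7, -7, 0]))  # [2.0, -2.0, 0.0]
```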
