Lesson 4 - Coding of Text, Voice, Image, and Video
4.0
The information that has to be exchanged between two entities (persons or machines) in a
communication system can be in one of the following formats:
Text
Voice
Image
Video
In an electrical communication system, the information is first converted into an electrical
signal. For instance,
A microphone is the transducer that converts the human voice into an analog signal.
Similarly, the video camera converts the real-life scenery into an analog signal.
In a digital communication system, the first step is to convert the analog signal into digital
format using analog-to-digital conversion techniques. This digital signal representation for
various types of information is the topic of this lesson.
4.1
Text Messages
Text messages are generally represented in ASCII (American Standard Code for Information
Interchange), in which a 7-bit code is used to represent each character. Another code form
called EBCDIC (Extended Binary Coded Decimal Interchange Code) is also used. ASCII is
the most widely used coding scheme for representation of text in computers.
Using ASCII, the number of characters that can be represented is limited to 128 because only
a 7-bit code is used. Out of these 128 characters, 33 are non-printing control characters (many
now obsolete) that affect how text and space are processed. The other 95 are printable
characters, including the space (which is considered an invisible graphic). Because these
characters cover only the unaccented Latin alphabet, many European languages are
represented using 8-bit extensions of ASCII.
To transmit text messages, the text is first encoded using one of the character-encoding
schemes (such as ASCII), and then the resulting bit stream is converted into an electrical signal.
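The first of these two steps can be sketched in a few lines of Python. This is a minimal illustration, assuming each character is sent as a bare 7-bit code with no parity or framing bits; the function names are hypothetical and chosen only for this example.

```python
def text_to_ascii_bits(text):
    """Return the 7-bit ASCII code of each character, concatenated as a bit string."""
    codes = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    return "".join(format(code, "07b") for code in codes)


def ascii_bits_to_text(bits):
    """Inverse operation: split the stream into 7-bit groups and decode them."""
    if len(bits) % 7 != 0:
        raise ValueError("bit string length must be a multiple of 7")
    codes = bytes(int(bits[i:i + 7], 2) for i in range(0, len(bits), 7))
    return codes.decode("ascii")


if __name__ == "__main__":
    stream = text_to_ascii_bits("Hi")
    print(stream)                      # 10010001101001
    print(ascii_bits_to_text(stream))  # Hi
```

Here "H" (decimal 72) becomes 1001000 and "i" (decimal 105) becomes 1101001, so the two-character message occupies 14 bits.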
Note: In extended ASCII, each character is represented by 8 bits. Using 8 bits, a number of
graphic characters and control characters can be represented.
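To represent all the world languages, Unicode has been developed. Unicode was originally
designed as a 16-bit code; the current standard defines code points up to U+10FFFF, which
are stored using encoding forms such as UTF-8 (1 to 4 bytes per character) and UTF-16
(2 or 4 bytes per character). Unicode can be used to encode the characters of any recognized
language in the world. Modern programming languages such as Java and markup languages
such as XML support Unicode.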
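As a rough illustration of how many bits a Unicode character occupies in practice, the following sketch reports the code point of a character and its size in two common encoding forms, UTF-8 and UTF-16. The helper name `encoded_sizes` and the sample characters are assumptions made for this example only.

```python
def encoded_sizes(ch):
    """Return (code point, bytes in UTF-8, bytes in UTF-16) for one character."""
    return ord(ch), len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


if __name__ == "__main__":
    # Latin letter, accented letter, Devanagari letter, emoji
    for ch in ["A", "\u00e9", "\u0928", "\U0001F600"]:
        code_point, utf8_len, utf16_len = encoded_sizes(ch)
        print(f"U+{code_point:04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

The letter "A" needs one byte in UTF-8 and two in UTF-16, whereas a character outside the first 65,536 code points needs four bytes in either form.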
It is important to note that fixed-length codes such as ASCII and Unicode are not the most
efficient representation from the standpoint of Shannon's information theory. If we consider
the frequency of occurrence of the letters of a language and use short codewords for
frequently occurring letters, fewer bits are needed on average. However, more processing
will be required, and more delay will result.
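One well-known way to assign short codewords to frequent letters is Huffman coding. The lesson does not name a specific technique, so the sketch below is only one possible illustration; the function names are hypothetical, and the letter frequencies are taken from the message itself rather than from statistics of the language.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix code giving shorter codewords to more frequent symbols."""
    if not text:
        raise ValueError("text must not be empty")
    freq = Counter(text)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: codeword built so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


def encode(text, code):
    """Concatenate the codewords of all characters in the text."""
    return "".join(code[ch] for ch in text)


if __name__ == "__main__":
    message = "this is an example of text coding"
    code = huffman_code(message)
    coded = encode(message, code)
    print("fixed 7-bit ASCII:", 7 * len(message), "bits")
    print("variable-length  :", len(coded), "bits")
```

The saving comes at the cost noted above: the transmitter must build or look up the code table, and the receiver must decode bit by bit because codeword boundaries are not fixed.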
4.2
Voice
To transmit voice from one place to another, the speech (acoustic signal) is first converted
into an electrical signal using a transducer, the microphone. This electrical signal is an analog
signal. The voice signal corresponding to the speech "how are you" is shown in Figure 4.1.
4.2.1
Waveform Coding
Waveform coding is done in such a way that the analog electrical signal can be reproduced at
the receiving end with minimum distortion. Hundreds of waveform coding techniques have
been proposed by many researchers. We will study two important waveform coding
techniques: pulse code modulation (PCM) and adaptive differential pulse code modulation
(ADPCM).
4.2.2
Vocoding
A radically different method of coding speech signals was proposed by H. Dudley in 1939.
He named his coder vocoder, a term derived from VOice CODER. In a vocoder, the electrical
model for speech production seen in Figure 4.4 is used.
Linear Prediction
The basic concept of linear prediction is that the sample of a voice signal can be
approximated as a linear combination of the past samples of the signal.
If $S_n$ is the $n$th speech sample, then
$$S_n = \sum_{k=1}^{P} a_k S_{n-k} + G U_n$$
where
$a_k$ ($k = 1, \ldots, P$) are the linear prediction coefficients
$G$ is the gain of the vocal tract filter
$U_n$ is the excitation to the filter.
Linear prediction coefficients (generally 8 to 12) represent the vocal tract filter coefficients.
Calculating the linear prediction coefficients involves solving P linear equations. One of the
most widely used methods for solving these equations is the Levinson-Durbin algorithm.
Coding of the voice signal using linear prediction analysis involves the following steps:
At the transmitting end, divide the voice signal into frames, each of 20 msec
duration. For each frame, calculate the linear prediction coefficients and the pitch, and
determine whether the frame is voiced or unvoiced. Convert these values into code words
and send them to the receiving end.
At the receiver, using these parameters and the speech production model, reconstruct the
voice signal.
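The coefficient calculation for a single frame can be sketched as follows. This is a minimal illustration using the autocorrelation method with the Levinson-Durbin recursion; it omits windowing, pitch estimation, and the voiced/unvoiced decision. The 8 kHz sampling rate (160 samples per 20 msec frame) and the synthetic test signal are assumptions of this example, not values given in the lesson.

```python
import random


def autocorrelation(frame, max_lag):
    """Return R(0)..R(max_lag) for one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the P normal equations for a_1..a_P, where S_n ~ sum_k a_k S_{n-k}.

    Returns (coefficients, residual_energy); coefficients[k-1] holds a_k.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0:               # silent or perfectly predictable frame
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error              # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], error


if __name__ == "__main__":
    # Synthetic frame: noise passed through a 2-pole filter (assumed test signal)
    rng = random.Random(0)
    frame = [0.0, 0.0]
    for _ in range(160):             # 20 msec at an assumed 8 kHz sampling rate
        frame.append(1.3 * frame[-1] - 0.6 * frame[-2] + rng.uniform(-1, 1))
    frame = frame[2:]
    coeffs, energy = levinson_durbin(autocorrelation(frame, 10), 10)
    print([round(c, 3) for c in coeffs])
```

Only the coefficients, gain, pitch, and voicing flag would be sent for each frame, which is why this approach needs far fewer bits than transmitting the waveform samples themselves.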
4.3
Image
To transmit an image, the image is divided into a grid of small elements called pixels (or
picture elements). The higher the number of pixels, the higher the resolution. Grid sizes
such as 1024 × 768 and 800 × 600 are generally used in computer graphics.
For black-and-white pictures, each pixel is given a certain gray-scale value. If there are 256
gray-scale levels, each pixel is represented by 8 bits. So, to represent a picture with a grid size
of 400 × 600 pixels with each pixel of 8 bits, 240 kbytes of storage is required.
To represent color, the levels of the three fundamental colors (red, blue, and green) are
combined together. The more levels used for each color, the more shades can be
represented.
In image coding, the image is divided into small grids called pixels, and each pixel is
quantized. The higher the number of pixels, the higher will be the quality of the reconstructed
image.
For example, if an image is coded with a resolution of 352 × 240 pixels, and each pixel is
represented by 24 bits, the size of the image is 352 × 240 × 24/8 = 253,440 bytes, or
247.5 kilobytes (taking 1 kilobyte as 1024 bytes).
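The storage figures in this section follow from one multiplication, sketched below. The function name is hypothetical; the two calls reproduce the gray-scale and color examples from the text.

```python
def raw_image_bytes(width, height, bits_per_pixel):
    """Uncompressed image size in bytes, rounded up to a whole byte."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


if __name__ == "__main__":
    gray = raw_image_bytes(400, 600, 8)
    color = raw_image_bytes(352, 240, 24)
    print(gray)           # 240000 bytes, i.e. 240 kbytes
    print(color)          # 253440 bytes
    print(color / 1024)   # 247.5 kilobytes of 1024 bytes each
```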
To store images as well as to send them through a communication medium, the image
needs to be compressed. A compressed image occupies less space when stored on a
medium such as a hard disk or CD-ROM. If the image is sent through a communication
medium, the compressed image can be transmitted in less time.
One of the most widely used image coding formats is the JPEG format. The Joint
Photographic Experts Group (JPEG) proposed this standard for coding of images. The block
diagram of JPEG image compression is shown in Figure 4.5.
4.4
Video
A video signal occupies a bandwidth of 5 MHz. By the Nyquist sampling theorem, we
need to sample the video signal at a minimum of 10 million samples per second (10
samples/µsec). If we use 8-bit PCM, the video signal requires a data rate of 80 Mbps. This
is a very high data rate, and this coding technique is not suitable for digital transmission of
video. A number of video coding techniques have been proposed to reduce the data rate.
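The arithmetic behind the 80 Mbps figure can be checked with a short sketch: the Nyquist rate is twice the signal bandwidth, and the PCM data rate is the sampling rate multiplied by the number of bits per sample. The function name is hypothetical.

```python
def pcm_bit_rate(bandwidth_hz, bits_per_sample):
    """Data rate in bits per second when sampling at the Nyquist rate."""
    sampling_rate = 2 * bandwidth_hz      # minimum rate from the Nyquist theorem
    return sampling_rate * bits_per_sample


if __name__ == "__main__":
    rate = pcm_bit_rate(5_000_000, 8)     # 5 MHz video signal, 8-bit PCM
    print(rate)                           # 80000000 bits per second = 80 Mbps
```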
For video coding, the video is considered a series of frames. At least 16 frames per second
are required to get the perception of moving video. Each frame is compressed using the
image compression techniques and transmitted. Using this technique, video can be
compressed to 64 kbps, though the quality will not be very good.
Video encoding is an extension of image encoding. As shown in Figure 4.6, a series of
images or frames, typically 16 to 30 per second, is transmitted. Due to the persistence
of vision, these discrete images are perceived as continuous motion.
Accordingly, the data rate for transmission of video is the number of frames per second
multiplied by the number of bits per frame. The data rate is reduced to about 64 kbps in
desktop video conferencing systems, where the resolution of the image and the number of
frames per second are reduced considerably. The resulting video is generally acceptable for
conducting business meetings over the Internet or corporate intranets, but not for
transmission of, say, dance programs, because the motion will appear jerky.
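To get a feel for how much reduction 64 kbps implies, the sketch below computes the uncompressed data rate for a sequence of frames and the ratio between that rate and a target rate. The frame size of 352 × 240 pixels at 24 bits per pixel is borrowed from the image example in Section 4.3, and 30 frames per second is the upper end of the range mentioned above; treating them as the video format is an assumption of this example, and the function names are hypothetical.

```python
def raw_video_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video data rate in bits per second."""
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame * frames_per_second


def required_compression_ratio(raw_bps, target_bps):
    """How many times smaller the coded stream must be than the raw stream."""
    return raw_bps / target_bps


if __name__ == "__main__":
    raw = raw_video_bit_rate(352, 240, 24, 30)
    print(raw)                                               # 60825600
    print(f"{required_compression_ratio(raw, 64_000):.1f}")  # 950.4
```

A reduction of roughly 950 to 1 cannot come from frame-by-frame image compression alone, which is why such systems also lower the resolution and the frame rate.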