
Unit 2


Source Coding

Source Coding:
Definition: The conversion of the output of a discrete memoryless source (DMS) into a
sequence of binary symbols, i.e. binary code words, is called Source Coding.
Source coding is the process of encoding the information source message into a binary
code word such that the original information source message can be decoded losslessly
and faithfully from that code word.
The device that performs this conversion is called the Source Encoder.
Objective of Source Coding: The objective of source coding is to minimize the average
bit rate required to represent the source by reducing the redundancy of the
information source.
 A source code is non-singular if all of its code words are distinct.
 A code is a block code of length 𝑛 if its code words are all of fixed length 𝑛.
 All practical codes must be uniquely decodable.
 A code is uniquely decodable if, and only if, the 𝑛th extension of the code is
non-singular for every finite value of 𝑛.
 A non-singular block code of length 𝑛 is uniquely decodable.
Codes
Codes
 Prefix code: A code is a prefix code (or prefix-free code) if it has the prefix
property, which requires that no code word is a proper prefix of any other code word.
Every prefix code is uniquely decodable, but, as the examples below show, not every
uniquely decodable code is a prefix code.
Example: Not uniquely decodable code: (0, 10, 010, 101)
Example: Uniquely decodable code: (00, 10, 110, 11)
Instantaneous Code: A uniquely decodable code is said to be instantaneous if it
is possible to decode each codeword in a sequence without reference to
succeeding codewords. A necessary and sufficient condition for a code to be
instantaneous is that no codeword is a prefix of some other codeword.
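The prefix condition above is mechanical to check. A minimal sketch in Python (the helper name is illustrative, not from the source):

```python
def is_prefix_free(codewords):
    """True iff no code word is a proper prefix of another, i.e. the code is instantaneous."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "010", "101"]))    # False: 0 is a prefix of 010
print(is_prefix_free(["00", "10", "110", "11"]))    # False: 11 is a prefix of 110
print(is_prefix_free(["0", "10", "110", "111"]))    # True: an instantaneous code
```

Note that the second code is uniquely decodable yet fails the prefix test, matching the distinction drawn above between uniquely decodable and instantaneous codes.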
Examples
 Design an instantaneous binary code with code word lengths 3, 2, 3, 2, 2.
 Design an instantaneous ternary code with code word lengths 2, 3, 1, 1, 2.
 Design an instantaneous binary code with code word lengths 2, 3, 2, 2, 2.
 Construction of instantaneous codes for source symbol , length for , and all
other symbols of length five.
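One way to solve such design exercises is the canonical construction: sort the lengths, then form each codeword by adding one to the previous codeword and left-shifting it to the new length. A binary sketch (function name illustrative); it succeeds whenever the lengths satisfy the Kraft inequality discussed later:

```python
def canonical_code(lengths):
    """Build a binary instantaneous (prefix-free) code with the given code word lengths."""
    assert sum(2.0 ** -l for l in lengths) <= 1.0, "lengths violate the Kraft inequality"
    words, code, prev = [], 0, 0
    for l in sorted(lengths):
        code <<= (l - prev)               # extend the running codeword to length l
        words.append(format(code, "0%db" % l))
        code += 1                         # next codeword = previous + 1
        prev = l
    return words

print(canonical_code([3, 2, 3, 2, 2]))   # ['00', '01', '10', '110', '111']
```

For the first exercise this yields 00, 01, 10, 110, 111. For the lengths 2, 3, 2, 2, 2 the assertion fires, since Σ 2^(−li) = 1.125 > 1: no binary instantaneous code exists with those lengths.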


Properties of Instantaneous Codes
 Property 1: It is easy to prove whether a code is instantaneous by inspecting whether it
satisfies the prefix condition.
 Property 2: The prefix condition permits a systematic design of instantaneous codes with
specified code word lengths.
 Property 3: Decoding based on a decoding tree is fast and requires no memory storage.
 Property 4: Instantaneous codes are uniquely decodable codes. Where the length of a
code word is the main consideration in the design and selection of codes, there is no
advantage in ever considering the general class of uniquely decodable codes that are not
instantaneous.
Few Terms Related to the Source Coding Process:
1. Code Word Length:
 Let X be a DMS with finite entropy H(X) and an alphabet {𝑥1, …, 𝑥q} with
corresponding probabilities of occurrence P(xi) (i = 1, …, q). Let the binary code word
assigned to symbol xi by the encoder have length li, measured in bits. The length of a
code word is the number of binary digits in it.
2. Average Code Word Length:
 The average code word length L, per source symbol, is given by

 L = ∑ P(xi) li , summed over i = 1, …, q

 The parameter L represents the average number of bits per source symbol used in the
source coding process.
Contd…
3. Code Efficiency:
 The code efficiency η is defined as

 η = Lmin / L

 where Lmin is the minimum possible value of the average code word length L.
4. Code Redundancy:
 The code redundancy γ is defined as

 γ = 1 − η
The Source Coding Theorem
 The source coding theorem states that for a DMS X with entropy H(X), the average
code word length L per symbol is bounded as

 L ≥ H(X)

 and, further, L can be made as close to H(X) as desired for some suitably chosen code.
 Thus, with Lmin = H(X), the code efficiency can be rewritten as

 η = H(X) / L

Classification of Code
1. Fixed – Length Codes
2. Variable – Length Codes
3. Distinct Codes
4. Prefix – Free Codes
5. Uniquely Decodable Codes
6. Instantaneous Codes
7. Optimal Codes
xi   Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
x1   00       00       0        0        0        1
x2   01       01       1        10       01       01
x3   00       10       00       110      011      001
x4   11       11       11       111      0111     0001
Contd…
1. Fixed – Length Codes:
A fixed-length code is one whose code word length is fixed. Code 1 and Code 2 of the
above table are fixed-length codes with length 2.
2. Variable – Length Codes:
A variable-length code is one whose code word length is not fixed. All codes of the
above table except Code 1 and Code 2 are variable-length codes.
3. Distinct Codes:
A code is distinct if each code word is distinguishable from all others. All codes of the
above table except Code 1 are distinct codes.
Contd…
4. Prefix – Free Codes:
A code in which no code word can be formed by adding code symbols to another code word is
called a prefix-free code. In a prefix-free code, no code word is a prefix of another. Codes 2, 4
and 6 of the above table are prefix-free codes.
5. Uniquely Decodable Codes:
A distinct code is uniquely decodable if the original source sequence can be reconstructed
perfectly from the encoded binary sequence. A sufficient condition to ensure that a code is
uniquely decodable is that no code word is a prefix of another. Thus the prefix-free codes 2, 4 and
6 are uniquely decodable codes. The prefix-free condition is not, however, a necessary condition
for unique decodability. Code 5 does not satisfy the prefix-free condition, yet it is uniquely
decodable, since the bit 0 indicates the beginning of each code word.
Contd…
6. Instantaneous Codes:
A uniquely decodable code is called an instantaneous code if the end of any code word is
recognizable without examining subsequent code symbols. Instantaneous codes have the
property, mentioned previously, that no code word is a prefix of another code word. Prefix-free
codes are therefore also known as instantaneous codes.
7. Optimal Codes:
A code is said to be optimal if it is instantaneous and has the minimum average length L for a
given source with a given probability assignment for the source symbols.
Kraft Inequality
 Let X be a DMS with alphabet {𝑥𝑖} (𝑖 = 1, 2, …, q), and assume that the length of the
code word assigned to xi is li.
 A necessary and sufficient condition for the existence of an instantaneous code with
alphabet size 𝑟 and 𝑞 code words with individual code word lengths 𝑙1, 𝑙2, …, 𝑙𝑞 is
that the following inequality be satisfied:

 K = ∑ 𝑟^(−𝑙𝑖) ≤ 1 , summed over i = 1, …, q

 For a binary code, r = 2.
 This is known as the Kraft Inequality.
 It may be noted that the Kraft inequality assures us of the existence of an
instantaneously decodable code with code word lengths that satisfy the inequality.
 But it does not show us how to obtain those code words, nor does it say that any code
satisfying the inequality is automatically uniquely decodable.
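As a quick check, the Kraft sum can be computed for the code word lengths of the codes in the earlier table (a sketch; r = 2 for binary):

```python
def kraft_sum(lengths, r=2):
    """Sum of r^(-li); an instantaneous code with these lengths exists iff the sum is <= 1."""
    return sum(float(r) ** -l for l in lengths)

print(kraft_sum([2, 2, 2, 2]))   # Code 2: 1.0, satisfied with equality
print(kraft_sum([1, 2, 3, 3]))   # Code 4: 1.0
print(kraft_sum([1, 1, 2, 2]))   # Code 3: 1.5 > 1, so no instantaneous code exists
```

Code 3's lengths violate the inequality, consistent with it not being uniquely decodable.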
McMillan’s Theorem
 Since the class of uniquely decodable codes is larger than the class of instantaneous codes, one
would expect greater efficiencies to be achievable by considering the class of all uniquely
decodable codes rather than the more restrictive class of instantaneous codes.
 McMillan’s Theorem assures us that we do not lose out if we consider only the class of
instantaneous codes.
 The code word lengths of any uniquely decodable code must satisfy the Kraft Inequality:

 ∑ 𝑟^(−𝑙𝑖) ≤ 1

 Conversely, given a set of code word lengths that satisfy this inequality, there exists a uniquely
decodable code with these code word lengths.
Consider the following two binary codes for the
same source. Which code is better?
Entropy Coding
The design of a variable-length code such that its average code word length
approaches the entropy of the DMS is often referred to as Entropy Coding.
 There are basically two types of entropy coding:
1) Shannon – Fano Coding
2) Huffman Coding
Shannon – Fano Coding:
An efficient code can be obtained by the following simple procedure, known as the
Shannon–Fano algorithm.
1) List the source symbols in order of decreasing probability.
2) Partition the set into two subsets that are as close to equiprobable as possible, and
assign 0 to the upper set and 1 to the lower set.
3) Continue this process, each time partitioning the sets with as nearly equal
probabilities as possible, until further partitioning is not possible.
4) Form each code word by appending the assigned 0s and 1s from left to right.
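The four steps above can be sketched recursively in Python (a minimal sketch; the split search and tie-breaking are illustrative, so other implementations may partition ties differently):

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs, sorted by decreasing probability."""
    if len(symbols) <= 1:
        return {symbols[0][0]: ""} if symbols else {}
    total = sum(p for _, p in symbols)
    # find the split point that makes the two subsets as close to equiprobable as possible
    acc, best_i, best_diff = 0.0, 1, float("inf")
    for i in range(1, len(symbols)):
        acc += symbols[i - 1][1]
        diff = abs(acc - (total - acc))
        if diff < best_diff:
            best_i, best_diff = i, diff
    # assign 0 to the upper set and 1 to the lower set, then recurse
    code = {s: "0" + w for s, w in shannon_fano(symbols[:best_i]).items()}
    code.update({s: "1" + w for s, w in shannon_fano(symbols[best_i:]).items()})
    return code

print(shannon_fano([("x1", 0.30), ("x2", 0.25), ("x3", 0.20),
                    ("x4", 0.12), ("x5", 0.08), ("x6", 0.05)]))
```

For the six-symbol source of the worked example below, this reproduces the code words 00, 01, 10, 110, 1110, 1111.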
Example (Shannon–Fano Encoding Algorithm)
 For source symbols with given probabilities of occurrence, find the code words, and find
the coding efficiency and redundancy of the code.


Shannon – Fano Coding - Example
Let there be six (6) source symbols having probabilities x1 = 0.30, x2 = 0.25, x3 = 0.20, x4
= 0.12, x5 = 0.08, x6 = 0.05. Obtain the Shannon–Fano coding for the given source
symbols.

𝒙𝒊    P(𝒙𝒊)   Step 1   Step 2   Step 3   Step 4   Code
𝒙𝟏    0.30    0        0                          00
𝒙𝟐    0.25    0        1                          01
𝒙𝟑    0.20    1        0                          10
𝒙𝟒    0.12    1        1        0                 110
𝒙𝟓    0.08    1        1        1        0        1110
𝒙𝟔    0.05    1        1        1        1        1111

 Shannon–Fano code words are as listed in the table.
 H(X) = 2.36 bits/symbol
 L = 2.38 bits/symbol
 η = H(X)/L = 0.99
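The figures above can be verified directly (a sketch using the probabilities and code word lengths of this example):

```python
from math import log2

probs   = [0.30, 0.25, 0.20, 0.12, 0.08, 0.05]
lengths = [2, 2, 2, 3, 4, 4]                       # Shannon-Fano code word lengths above

H    = sum(p * log2(1 / p) for p in probs)         # source entropy H(X)
Lbar = sum(p * l for p, l in zip(probs, lengths))  # average code word length
eta  = H / Lbar                                    # code efficiency

print(round(H, 2), round(Lbar, 2), round(eta, 2))  # 2.36 2.38 0.99
```

The redundancy follows as γ = 1 − η ≈ 0.01.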
Huffman Coding:
 Huffman coding results in an optimal code. It is the code that has the highest
efficiency.
 The Huffman coding procedure is as follows:
1) List the source symbols in order of decreasing probability.
2) Combine the probabilities of the two symbols having the lowest probabilities and reorder
the resultant probabilities; this step is called reduction 1. The same procedure is repeated
until only two ordered probabilities remain.
Contd…
3) Start encoding with the last reduction, which consists of exactly two ordered
probabilities. Assign 0 as the first digit in the code word for all the source
symbols associated with the first probability; assign 1 to the second probability.
4) Now go back and assign 0 and 1 to the second digit for the two probabilities that
were combined in the previous reduction step, retaining the digits already assigned
to the combined node.
5) Keep working back this way until the first column is reached.
6) The code word is obtained by tracing back from right to left.
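The reduction procedure above can be sketched with a min-heap in Python, repeatedly merging the two least probable nodes (a minimal sketch; tie-breaking is arbitrary, so individual code words may differ between implementations while the code word lengths remain optimal):

```python
import heapq

def huffman(probs):
    """Binary Huffman code for probs: {symbol: probability}."""
    # each heap entry: (probability, tie-breaker, {symbol: partial code word})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # the two lowest probabilities (one reduction)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}        # assign 0 to one branch
        merged.update({s: "1" + w for s, w in c2.items()})  # and 1 to the other
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"x1": 0.30, "x2": 0.25, "x3": 0.20, "x4": 0.12, "x5": 0.08, "x6": 0.05}
code = huffman(probs)
avg = sum(p * len(code[s]) for s, p in probs.items())
print(round(avg, 2))  # 2.38 for the six-symbol source of the earlier example
```

For this source the Huffman average length equals that of the Shannon–Fano code, so both achieve η ≈ 0.99; in general Huffman coding is never worse.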
Example: Huffman Coding Algorithm
 For an alphabet with given symbol probabilities, find the Huffman codes, and also find
the efficiency and variance of the code.
Redundancy:
 Redundancy in information theory refers to the reduction in the information content of
a message from its maximum value.
For example, consider English with its 26 letters. Assuming all letters are equally
likely to occur, P(xi) = 1/26, and the information contained is therefore
log2 26 = 4.7 bits/letter
The assumption that each letter occurs with equal probability is not correct; since
some letters are more likely to occur than others, the actual information content of
English is reduced from its maximum value of 4.7 bits/letter.
We define the relative entropy as the ratio of the actual entropy H(X) to its maximum
value Hmax(X), which gives the maximum achievable compression, and redundancy is then
expressed as

Redundancy = 1 − H(X)/Hmax(X)


Run-Length Coding (Lossless Compression)
 It is the simplest data compression technique.
 Run-length encoding (RLE) is a form of lossless data compression in which runs of data (sequences in
which the same data value occurs in many consecutive data elements) are stored as a single data value and
count, rather than as the original run. It is most useful on data that contains many such runs.
 The general idea behind this method is to replace consecutive repeated occurrences of a symbol by one
occurrence of the symbol followed by the number of occurrences.
 Example 1: If the string is AAAAAAA, then the run-length encoding is A7 (A is the character and 7 is the
number of times it appears).
 Example 2: If the input string is “WWWWAAADEXXXXXX”, then the run-length encoding is
W4A3D1E1X6.
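The replace-run-by-symbol-and-count idea can be sketched in a few lines of Python (function name illustrative):

```python
def rle_encode(s):
    """Replace each run of a repeated character by the character followed by its run length."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:   # scan to the end of the current run
            j += 1
        out.append(s[i] + str(j - i))        # emit symbol + count
        i = j
    return "".join(out)

print(rle_encode("AAAAAAA"))           # A7
print(rle_encode("WWWWAAADEXXXXXX"))   # W4A3D1E1X6
```

Both examples above are reproduced exactly; note that for data without long runs this scheme can expand the input rather than compress it.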
Arithmetic Coding
 Arithmetic coding is distinct from other types of entropy encoding, such as Huffman coding, in that it
encodes the entire message into a single number, an arbitrary-precision fraction q, where 0.0 ≤ q < 1.0, as
opposed to breaking the input up into individual symbols and replacing each with a code.
 A range specified by two numbers serves as a representation of the most recent data.
 Because they work directly on a single natural number that represents the most recent information, the
asymmetric numeral systems family of entropy coders, which is relatively new, enables faster
implementations.
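The interval-narrowing idea behind arithmetic coding can be sketched as follows (a toy floating-point version for illustration; practical coders use integer ranges with renormalization to avoid precision loss):

```python
def arithmetic_interval(message, probs):
    """Narrow [low, high) once per symbol; any number in the final interval encodes the message."""
    # cumulative distribution: each symbol owns a sub-interval of [0, 1)
    cum, c = {}, 0.0
    for s in probs:
        cum[s] = c
        c += probs[s]
    low, high = 0.0, 1.0
    for s in message:
        span = high - low
        high = low + span * (cum[s] + probs[s])  # shrink to the symbol's sub-interval
        low = low + span * cum[s]
    return low, high

print(arithmetic_interval("AB", {"A": 0.5, "B": 0.5}))  # (0.25, 0.5)
```

Here the two-symbol message "AB" narrows the interval to [0.25, 0.5), so the single fraction q = 0.25 (binary .01) suffices to represent it.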
