Source Coding & Compression
Most topics are from Digital Communications,
Simon Haykin,
Chapter 9, Sections 9.1~9.4
Fundamental Limits on Performance
Given an information source and a noisy channel:
1) Limit on the minimum number of bits
per symbol
2) Limit on the maximum rate for reliable
communication
Shannon's theorems address both limits.
Information Theory
Let the source alphabet be $S = \{s_0, s_1, \ldots, s_{K-1}\}$
with probabilities of occurrence $P(s = s_k) = p_k$, $k = 0, 1, \ldots, K-1$,
where $\sum_{k=0}^{K-1} p_k = 1$
Assume a discrete memoryless source (DMS)
What is the measure of information?
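For a DMS, the standard measure is the source entropy, the average information per symbol:

$H(S) = \sum_{k=0}^{K-1} p_k \log_2 \frac{1}{p_k}$ bits/symbol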
Relationship to Entropy
Theorem (lower bound): For any probability
distribution $p(S)$ with associated uniquely decodable
code $C$, $H(S) \le l_a(C)$
Theorem (upper bound): For any probability
distribution $p(S)$ with associated optimal prefix code
$C$, $l_a(C) \le H(S) + 1$
Coding Efficiency
$\eta = L_{\min} / L_a$
where $L_a$ is the average codeword length
From Shannon's theorem,
$L_a \ge H(S)$
Thus $L_{\min} = H(S)$, and so
$\eta = H(S) / L_a$
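As a numerical check, a small Python sketch computing $H(S)$, $L_a$, and the efficiency for the four-symbol distribution used in the Huffman example later in these notes (the distribution and code lengths are taken from that example):

```python
import math

# Example distribution and codeword lengths (from the Huffman example
# later in these notes: a=000, b=001, c=01, d=1).
probs = [0.1, 0.2, 0.2, 0.5]
lengths = [3, 3, 2, 1]

# Entropy H(S) = sum p_k * log2(1/p_k), in bits/symbol.
H = sum(p * math.log2(1.0 / p) for p in probs)

# Average codeword length L_a = sum p_k * l_k.
La = sum(p * l for p, l in zip(probs, lengths))

eta = H / La  # coding efficiency

print(f"H(S) = {H:.3f} bits/symbol")   # ~1.761
print(f"L_a  = {La:.3f} bits/symbol")  # 1.800
print(f"eta  = {eta:.3f}")             # ~0.978
```

Note that $H(S) \le L_a \le H(S) + 1$ holds, as the two theorems above require.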
Kraft-McMillan Inequality
Theorem (Kraft-McMillan): For any uniquely decodable code
$C$, $\sum_{c \in C} 2^{-l(c)} \le 1$
Also, for any set of lengths $L$ such that $\sum_{l \in L} 2^{-l} \le 1$,
there is a prefix code $C$ such that $l(c_i) = l_i$, $i = 1, \ldots, |L|$
NOTE: the Kraft-McMillan inequality does not tell us
whether the code is prefix-free or not
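A quick Python sketch of checking the inequality for a proposed set of codeword lengths (helper name is illustrative):

```python
def kraft_sum(lengths):
    """Sum of 2^(-l) over the proposed codeword lengths."""
    return sum(2.0 ** -l for l in lengths)

# Lengths of the prefix code a=0, b=110, c=111, d=10 (two slides below):
print(kraft_sum([1, 3, 3, 2]))  # 1.0  -> a prefix code with these lengths exists

# Lengths 1, 2, 2, 2 violate the inequality:
print(kraft_sum([1, 2, 2, 2]))  # 1.25 -> no uniquely decodable code exists
```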
Uniquely Decodable Codes
A variable-length code assigns a bit string (codeword)
of variable length to every message value
e.g. a = 1, b = 01, c = 101, d = 011
What if you get the sequence of bits 1011?
Is it aba, ca, or ad?
A uniquely decodable code is a variable-length code in
which every bit string can be decomposed into
codewords in at most one way.
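To see the ambiguity concretely, a brute-force sketch that enumerates every decomposition of a bit string into codewords (helper name is illustrative):

```python
def decompositions(bits, code):
    """Return all ways to split `bits` into codewords of `code`
    (a dict mapping symbol -> codeword)."""
    if not bits:
        return [[]]
    results = []
    for sym, cw in code.items():
        if bits.startswith(cw):
            for rest in decompositions(bits[len(cw):], code):
                results.append([sym] + rest)
    return results

code = {"a": "1", "b": "01", "c": "101", "d": "011"}
print(decompositions("1011", code))
# [['a', 'b', 'a'], ['a', 'd'], ['c', 'a']] -> not uniquely decodable
```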
Prefix Codes
A prefix code is a variable-length code in which no
codeword is a prefix of another codeword
e.g. a = 0, b = 110, c = 111, d = 10
Can be viewed as a binary tree with message values at the
leaves and 0s and 1s on the edges.
[Figure: binary code tree with 0/1 edge labels; leaves a = 0, d = 10, b = 110, c = 111]
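Because no codeword is a prefix of another, a prefix code can be decoded greedily in a single left-to-right pass. A minimal sketch using the example code above (helper name is illustrative):

```python
def prefix_decode(bits, code):
    """Greedy one-pass decoder; `code` maps symbol -> codeword."""
    inverse = {cw: sym for sym, cw in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:        # a complete codeword -- emit immediately
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

code = {"a": "0", "b": "110", "c": "111", "d": "10"}
print(prefix_decode("011010", code))  # -> "abd"
```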
Some Prefix Codes for Integers
n   Binary   Unary     Split
1   ..001    0         1|
2   ..010    10        10|0
3   ..011    110       10|1
4   ..100    1110      110|00
5   ..101    11110     110|01
6   ..110    111110    110|10
Many other fixed prefix codes:
Golomb, phased-binary, subexponential, ...
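As a concrete instance, a sketch of the unary code from the table, where n is coded as n - 1 ones followed by a zero (helper names are illustrative):

```python
def unary_encode(n):
    """Unary code from the table: (n - 1) ones followed by a zero."""
    return "1" * (n - 1) + "0"

def unary_decode(bits):
    """Count ones up to the terminating zero."""
    n = bits.index("0") + 1
    return n, bits[n:]  # decoded value and the remaining bits

for n in range(1, 7):
    print(n, unary_encode(n))
# 1 0, 2 10, 3 110, 4 1110, 5 11110, 6 111110
```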
Data compression implies sending or storing a
smaller number of bits. Although many methods are
used for this purpose, in general these methods can
be divided into two broad categories: lossless and
lossy methods.
[Figure: data compression methods, divided into lossless and lossy categories]
Run Length Coding
Introduction: What is RLE?
A compression technique
Represents data as (value, run length) pairs
Run length is defined as the number of consecutive equal values
e.g. 1110011111 → RLE → values 1, 0, 1 with run lengths 3, 2, 5
Introduction
Compression effectiveness depends on the input
Must have consecutive runs of equal values to maximize
compression
Best case: all values the same
Can represent an input of any length using just two values
Worst case: no repeating values
Compressed data is twice the length of the original!!
Should only be used in situations where we know for sure
that the data has repeating values
Run-length encoding example
Run-length encoding for two symbols
Encoder Results
Input: 4,5,5,2,7,3,6,9,9,10,10,10,10,10,10,0,0
Output: 4,1,5,2,2,1,7,1,3,1,6,1,9,2,10,6,0,2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
Best Case:
Input: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Output: 0,16,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
Worst Case:
Input: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
Output: 0,1,1,1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,1,11,1,12,1,13,1,14,1,15,1
(The -1 entries pad the fixed-size output buffer; valid output ends at the first -1.)
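A minimal Python sketch of the encoder behavior shown above, without the fixed-size -1 padding (function name is illustrative):

```python
def rle_encode(values):
    """Run-length encode: emit (value, run length) pairs, flattened."""
    out = []
    i = 0
    while i < len(values):
        run = 1
        while i + run < len(values) and values[i + run] == values[i]:
            run += 1
        out.extend([values[i], run])
        i += run
    return out

data = [4, 5, 5, 2, 7, 3, 6, 9, 9, 10, 10, 10, 10, 10, 10, 0, 0]
print(rle_encode(data))
# [4, 1, 5, 2, 2, 1, 7, 1, 3, 1, 6, 1, 9, 2, 10, 6, 0, 2]
```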
Huffman Coding
Huffman Codes
Invented by Huffman as a class assignment in 1951.
Used in many, if not most, compression algorithms, such as
gzip, bzip, jpeg (as an option), and fax compression.
Properties:
Generates optimal prefix codes
Cheap to generate codes
Cheap to encode and decode
$l_a = H(S)$ if the probabilities are (negative) powers of 2
Huffman Codes
Huffman Algorithm
Start with a forest of trees each consisting of a single
vertex corresponding to a message s and with weight
p(s)
Repeat:
Select the two trees with minimum-weight roots $p_1$ and $p_2$
Join them into a single tree by adding a root with weight $p_1 + p_2$
Example
p(a) = .1, p(b) = .2, p(c) = .2, p(d) = .5
Step 1: merge a(.1) and b(.2) into a subtree of weight (.3); forest: (.3), c(.2), d(.5)
Step 2: merge (.3) and c(.2) into a subtree of weight (.5); forest: (.5), d(.5)
Step 3: merge (.5) and d(.5) into the final tree of weight (1.0)
Labeling each edge 0 or 1 and reading the root-to-leaf paths gives:
a = 000, b = 001, c = 01, d = 1
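A compact heap-based Python sketch of the algorithm, run on this example (names are illustrative; ties may be broken differently than in the figure, so the 0/1 labels can differ while the code lengths match):

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a Huffman code. `probs` maps symbol -> probability.
    Returns a dict mapping symbol -> codeword string."""
    counter = itertools.count()  # tie-breaker so heap tuples always compare
    # Forest of single-vertex trees, one per message, weighted by p(s).
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Select the two trees with minimum-weight roots...
        p1, _, code1 = heapq.heappop(heap)
        p2, _, code2 = heapq.heappop(heap)
        # ...and join them under a new root of weight p1 + p2,
        # prefixing 0 to one subtree's codewords and 1 to the other's.
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

print(huffman_code({"a": 0.1, "b": 0.2, "c": 0.2, "d": 0.5}))
# {'d': '0', 'c': '10', 'a': '110', 'b': '111'}
# Same lengths (1, 2, 3, 3) as the slide's code; only the bit labels differ.
```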
Encoding and Decoding
Encoding: Start at the leaf of the Huffman tree and follow the path
to the root. Reverse the order of the bits and send.
Decoding: Start at the root of the Huffman tree and take a branch
for each bit received. When a leaf is reached, output its message
and return to the root.
[Figure: the final Huffman tree from the example, with 0/1 edge labels and leaves a(.1), b(.2), c(.2), d(.5)]
There are even faster methods that
can process 8 or 32 bits at a time
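A sketch of the decoding procedure just described, using a nested-tuple tree for the example code (the representation is an illustrative choice, not from the slides):

```python
# Tree nodes: a leaf is a symbol string; an internal node is a
# (zero_child, one_child) tuple. This encodes the example tree:
# a=000, b=001, c=01, d=1.
tree = ((("a", "b"), "c"), "d")

def huffman_decode(bits, tree):
    """Start at the root; take a branch per bit; emit at each leaf."""
    out = []
    node = tree
    for b in bits:
        node = node[int(b)]          # follow the 0 or 1 branch
        if isinstance(node, str):    # reached a leaf
            out.append(node)
            node = tree              # return to the root
    return "".join(out)

print(huffman_decode("000001011", tree))  # -> "abcd"
```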
Huffman Codes: Pros & Cons
Pros:
The Huffman algorithm generates an optimal prefix code.
Cons:
If the ensemble changes, the frequencies and probabilities change,
and the optimal coding changes
e.g. in text compression, symbol frequencies vary with context
Re-computing the Huffman code by running through the entire file in
advance?!
Saving/transmitting the code too?!
Lempel-Ziv (LZ77)
Lempel-Ziv Algorithms
LZ77 (Sliding Window)
Variants: LZSS (Lempel-Ziv-Storer-Szymanski)
Applications: gzip, Squeeze, LHA, PKZIP, ZOO
LZ78 (Dictionary Based)
Variants: LZW (Lempel-Ziv-Welch),
LZC (Lempel-Ziv-Compress)
Applications:
compress, GIF, CCITT (modems), ARC, PAK
Traditionally LZ77 was better but slower, but the gzip version is
almost as fast as any LZ78.
Lempel-Ziv encoding
Lempel-Ziv (LZ) encoding is an example of a
category of algorithms called dictionary-based
encoding. The idea is to create a dictionary (a table)
of strings used during the communication session. If
both the sender and the receiver have a copy of the
dictionary, then previously encountered strings can
be replaced with their index in the dictionary to
reduce the amount of information transmitted.
Compression
In this phase there are two concurrent events:
building an indexed dictionary and compressing a
string of symbols. The algorithm extracts the smallest
substring that cannot be found in the dictionary from
the remaining uncompressed string. It then stores a
copy of this substring in the dictionary as a new entry
and assigns it an index value. Compression occurs
when the substring, except for the last character, is
replaced with the index found in the dictionary. The
process then inserts the index and the last character
of the substring into the compressed string.
An example of Lempel Ziv encoding
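A minimal LZ78-style Python sketch of the compression process described above (the input string and function name are illustrative; symbols are single characters and the dictionary starts empty):

```python
def lz_encode(text):
    """LZ78-style compression: emit (index, char) pairs, where index
    refers to the longest previously seen dictionary entry (0 = none)."""
    dictionary = {}   # phrase -> index
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch              # keep extending the match
        else:
            # Smallest substring not yet in the dictionary: phrase + ch.
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1  # new entry
            phrase = ""
    if phrase:                        # flush a trailing match
        output.append((dictionary[phrase], ""))
    return output

print(lz_encode("BAABABBBAABBBBAA"))
# [(0, 'B'), (0, 'A'), (2, 'B'), (3, 'B'), (1, 'A'), (4, 'B'), (5, 'A')]
```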
Decompression
Decompression is the inverse of the compression
process. The process extracts the substrings from the
compressed string and tries to replace each index
with the corresponding entry in the dictionary, which
is empty at first and built up gradually. The idea is
that when an index is received, there is already an
entry in the dictionary corresponding to that index.
An example of Lempel Ziv decoding
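A matching decoder sketch that rebuilds the same dictionary on the receiving side (consuming the pairs produced by the illustrative lz_encode above):

```python
def lz_decode(pairs):
    """Invert lz_encode: rebuild the dictionary from (index, char) pairs."""
    dictionary = {}   # index -> phrase
    output = []
    for index, ch in pairs:
        # Index 0 means "no prefix"; otherwise look up the earlier entry.
        phrase = (dictionary[index] if index else "") + ch
        output.append(phrase)
        dictionary[len(dictionary) + 1] = phrase  # same entry the encoder made
    return "".join(output)

pairs = [(0, "B"), (0, "A"), (2, "B"), (3, "B"), (1, "A"), (4, "B"), (5, "A")]
print(lz_decode(pairs))  # -> "BAABABBBAABBBBAA"
```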