Data Structures Algorithms Part IIIb

Data structures

Uploaded by

Lesbert Bayanay
Copyright © All Rights Reserved

Data Structures & Algorithms Part III


B. CAABAY
Huffman coding for lossless data compression

❑ Data compression (e.g., zip) transforms data files by exploiting redundancies in the data. It is used in archiving data to save on storage, and when moving data to save on transmission time.

• In lossy compression, some information may be lost. This is acceptable for some applications involving images, audio, or video, where some loss of information can be tolerated.

• Lossless data compression is used when we want to recover the original file completely after unzipping the zipped file; it is used for text files, databases (e.g., biological sequence data), etc.
❑ Fixed-width vs. variable-width encoding

• Fixed-width, e.g., extended ASCII codes (8 bits per character)

• Variable-width, e.g., Morse code, where frequently used letters are assigned shorter codes
❑ Huffman coding is variable-width, and it is prefix-free (no code is a prefix of another code), which avoids ambiguity and speeds up decompression.

• Note that Morse code is not prefix-free: the code for 'E' is a prefix of the code for 'A', and likewise the code for 'D' is a prefix of the code for 'B'.

• Without a good pause in between codes, "ET" might be confused with "A", or "ER" with "F".
Fixed-width Encoding

Message: ALL I SEE IS WHITE

▪ Message = 18 chars
▪ Symbols = A, L, I, S, E, W, H, T, - (space)
▪ Cost = 18 chars • 8 bits = 144 bits

Map 4 bits to each character:

CHARACTERS   ASCII   ENCODING
A            65      0000
L            76      0001
I            73      0010
S            83      0011
E            69      0100
W            87      0101
H            72      0110
T            84      0111
- (space)    32      1000
TOTAL                36 bits

✓ LEN(ALISEWHT-) is 9 in decimal, or 1001 in binary
✓ LEN(1001) is 4, so 4 bits per character suffice
Fixed-width Encoding

Message: ALL I SEE IS WHITE

A    L    L    -    I    -    S    E    E    -    I    S    -    W    H    I    T    E
0000 0001 0001 1000 0010 1000 0011 0100 0100 1000 0010 0011 1000 0101 0110 0010 0111 0100

▪ Cost = 18 chars • 4 bits = 72 bits
Fixed-width Encoding

Message: ALL I SEE IS WHITE

Encoded Message:
000000010001100000101000001101000100100000100011100001010110001001110100

How to decode the message?
✓ Use the table from the fixed-width encoding, reading 4 bits at a time:

CHARACTERS   ASCII   ENCODING
A            65      0000
L            76      0001
I            73      0010
S            83      0011
E            69      0100
W            87      0101
H            72      0110
T            84      0111
- (space)    32      1000
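The fixed-width scheme above can be checked with a short sketch; the code table is the one from the slides, with " " standing in for the '-' (space) symbol.

```python
# 4-bit fixed-width code table from the slides (space shown as '-').
CODES = {"A": "0000", "L": "0001", "I": "0010", "S": "0011",
         "E": "0100", "W": "0101", "H": "0110", "T": "0111", " ": "1000"}
DECODE = {code: ch for ch, code in CODES.items()}

msg = "ALL I SEE IS WHITE"
encoded = "".join(CODES[ch] for ch in msg)

# Fixed-width decoding: slice the bit string into 4-bit chunks.
decoded = "".join(DECODE[encoded[i:i + 4]] for i in range(0, len(encoded), 4))
print(len(encoded), decoded)  # 72 ALL I SEE IS WHITE
```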
Variable-width Encoding

Message: ALL I SEE IS WHITE

1. Count the frequency of each character:

CHARACTERS   f
A            1
L            2
I            3
S            2
E            3
W            1
H            1
T            1
- (space)    4
TOTAL        18
Variable-width Encoding

Message: ALL I SEE IS WHITE

2. Arrange the characters from least to greatest frequency:
   A:1  W:1  H:1  T:1  L:2  S:2  I:3  E:3  -:4

3. Repeatedly merge the two nodes with the least frequencies until there is only one node:
   • A:1 + W:1 → AW:2, and H:1 + T:1 → HT:2
   • AW:2 + HT:2 → AWHT:4, and L:2 + S:2 → LS:4
   • I:3 + E:3 → IE:6
   • AWHT:4 + LS:4 → AWHTLS:8
   • IE:6 + -:4 → IE-:10
   • AWHTLS:8 + IE-:10 → AWHTLSIE-:18 (the root)

4. Label the edges 0 and 1 from left to right (0 on each left branch, 1 on each right branch).
Variable-width Encoding

Message: ALL I SEE IS WHITE

5. Encode each character by reading the binary digits starting from the root node down to the appropriate character (leaf node):

CHARS       HUFFMAN CODE
A           0000
L           010
I           100
S           011
E           101
W           0001
H           0010
T           0011
- (space)   11
Variable-width Encoding

Message: ALL I SEE IS WHITE

CHARS       HUFFMAN CODE   f   TOTAL = f • LEN(Huffman Code)
A           0000           1   4
L           010            2   6
I           100            3   9
S           011            2   6
E           101            3   9
W           0001           1   4
H           0010           1   4
T           0011           1   4
- (space)   11             4   8
TOTAL                          54 bits
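The merge-and-label procedure in steps 1–5 can be sketched in Python with a min-heap. Tie-breaking among equal frequencies is a choice of this sketch, so individual codes may differ from the slides' table, but every Huffman tree for these frequencies is optimal and yields the same total encoded length.

```python
import heapq
from collections import Counter

def huffman_codes(message):
    """Build a Huffman code table by repeatedly merging the two
    lowest-frequency nodes, as in the steps above."""
    freq = Counter(message)
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Left branch gets a leading 0, right branch a leading 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]

msg = "ALL I SEE IS WHITE"
codes = huffman_codes(msg)
encoded = "".join(codes[ch] for ch in msg)
print(len(encoded))  # 54 bits for this message
```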
Variable-width Encoding

Message: ALL I SEE IS WHITE

CHARS       HUFFMAN CODE   LEN(HUFFMAN CODE)
A           0000           4
L           010            3
I           100            3
S           011            3
E           101            3
W           0001           4
H           0010           4
T           0011           4
- (space)   11             2
TOTAL                      30 bits
Cost of Table/Tree: No Compression vs. Fixed-width vs. Variable-width (Huffman Coding)

Message: ALL I SEE IS WHITE

No Compression: no table/tree
Fixed-width: 9 chars • 4 bits = 36 bits
Variable-width (Huffman Coding): SUM(LEN(Huffman codes)) = 30 bits
Cost of Message: No Compression vs. Fixed-width vs. Variable-width (Huffman Coding)

Message: ALL I SEE IS WHITE

No Compression: 18 chars • 8 bits = 144 bits
Fixed-width: 18 chars • 4 bits = 72 bits
Variable-width (Huffman Coding): SUM(f • LEN(Huffman code)) = 54 bits
Cost of Message and Table/Tree: No Compression vs. Fixed-width vs. Variable-width (Huffman Coding)

Message: ALL I SEE IS WHITE

                                  Cost of Table/Tree   Cost of Message   TOTAL COST
No Compression                    N/A                  144 bits          144 bits
Fixed-width                       36 bits              72 bits           108 bits
Variable-width (Huffman Coding)   30 bits              54 bits           84 bits
Sieve of Eratosthenes

A sieve, or a series of sieves, can be a great tool for a cook or an algorithm designer to screen out undesirables. The sieve of Eratosthenes is one of the most ancient algorithms. It is used to find all prime numbers less than some bound n.

Suppose n = 50:
• i is 2: remove all multiples of 2.
• i is 3: remove all multiples of 3.
• i is 4: its multiples were already removed (4 is a multiple of 2).
• i is 5: remove all multiples of 5.
• i is 6: its multiples were already removed.
• i is 7: remove all multiples of 7.
• Stop at sqrt(n), output what remains. Floor(sqrt(50)) = 7.
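The steps above can be sketched as follows; the function name is my own, and composites are skipped as the i=4 and i=6 steps suggest.

```python
def sieve(n):
    """Return all primes up to n using the Sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= n:              # stop at sqrt(n)
        if is_prime[i]:            # composites' multiples are already removed
            for m in range(i * i, n + 1, i):
                is_prime[m] = False
        i += 1
    return [p for p, prime in enumerate(is_prime) if prime]

print(sieve(50))  # 2, 3, 5, ..., 47
```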
Backtracking: Exhaustive Search

A greedy algorithm builds a solution by choosing what seems to be the best branch at each fork in the road. If you had exponential time to waste, or an exponential number of copies of yourself (processors), you could instead try every branch in a systematic way.
Despite its worst-case exponential time, backtracking may still be useful for reasonably small instances.

In combinatorial enumeration problems, one wishes to find all possible solutions, e.g., all possible combinations, all possible permutations, all possible partitions, all spanning trees of a graph with cost < k, etc.
Exhaustive Search: Backtrack search trees

Given coin denominations of {10p, 5p, 1p}, find all possible ways of forming 16p.
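A minimal backtracking sketch for this problem (the function name is my own); to count each multiset of coins once, the search never returns to a larger denomination after moving past it.

```python
def coin_ways(target, denoms, start=0, current=None, out=None):
    """Backtracking: enumerate every way to form `target` from `denoms`,
    treating different orderings of the same coins as one way."""
    if current is None:
        current, out = [], []
    if target == 0:
        out.append(list(current))      # a complete solution: record it
        return out
    for i in range(start, len(denoms)):
        if denoms[i] <= target:
            current.append(denoms[i])  # choose this branch
            coin_ways(target - denoms[i], denoms, i, current, out)
            current.pop()              # backtrack
    return out

ways = coin_ways(16, [10, 5, 1])
print(len(ways))  # 6 ways, e.g. [10, 5, 1] and [10, 1, 1, 1, 1, 1, 1]
```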
▪ Generating all n! permutations (arrangements)
▪ {123, 132, 213, 231, 312, 321}

3! = 3•2•1 = 6 leaves
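The search tree for permutations can be sketched by backtracking: at each level, branch on every element not yet chosen, so the tree has n! leaves.

```python
def permutations(items):
    """Generate all arrangements of items by backtracking."""
    result = []
    def backtrack(chosen, remaining):
        if not remaining:
            result.append("".join(chosen))  # a leaf: one full arrangement
            return
        for i, x in enumerate(remaining):   # branch on each unused element
            backtrack(chosen + [x], remaining[:i] + remaining[i + 1:])
    backtrack([], list(items))
    return result

print(permutations("123"))  # ['123', '132', '213', '231', '312', '321']
```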
▪ Two different search trees for generating all combinations: C(3,0), C(3,1), C(3,2), C(3,3)
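One of the two search-tree shapes can be sketched as an include/exclude tree: at each item we branch on leaving it out or putting it in, giving 2^n leaves, which together cover C(n,0) through C(n,n).

```python
def subsets(items):
    """Search tree for combinations: at each item, branch on
    excluding it or including it (2^n leaves)."""
    result = []
    def backtrack(i, chosen):
        if i == len(items):
            result.append(list(chosen))  # a leaf: one combination
            return
        backtrack(i + 1, chosen)         # exclude items[i]
        chosen.append(items[i])          # include items[i]
        backtrack(i + 1, chosen)
        chosen.pop()                     # backtrack
    backtrack(0, [])
    return result

all_combos = subsets([1, 2, 3])
print(len(all_combos))  # 8 = C(3,0) + C(3,1) + C(3,2) + C(3,3)
```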
▪ n non-attacking queens on an n×n chessboard.
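A standard backtracking sketch for n-queens: place one queen per row, prune any column or diagonal already under attack, and backtrack when a row has no safe square.

```python
def n_queens(n):
    """Count placements of n non-attacking queens by backtracking."""
    count = 0
    cols, diag1, diag2 = set(), set(), set()
    def place(row):
        nonlocal count
        if row == n:
            count += 1                   # all n queens placed safely
            return
        for col in range(n):
            if col in cols or row + col in diag1 or row - col in diag2:
                continue                 # square is attacked: prune
            cols.add(col); diag1.add(row + col); diag2.add(row - col)
            place(row + 1)
            cols.discard(col); diag1.discard(row + col); diag2.discard(row - col)
    place(0)
    return count

print(n_queens(8))  # 92 solutions on a standard chessboard
```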
▪ Mazes, knight's tour on an m×n chessboard, Sudoku, and many more recreational puzzles, combinatorial optimization/enumeration, and constraint-satisfaction problems.
Depth-First Search (DFS): Graph Traversal
▪ Depth-First Search (DFS) is a graph traversal algorithm that explores as far as possible
along each branch before backtracking.
▪ It uses a stack (either explicitly or via recursion) to keep track of nodes to visit. DFS is
useful for solving problems like maze exploration, topological sorting, and connected
components detection.
▪ Example (vertices {0, 1, 2, 3, 4, 5}; the graph figures are not reproduced here):

1. Choose any node; suppose we choose 2. Visited = {2}
2. Choose any vertex adjacent to 2 (e.g., 3, 4, 0); suppose we choose 3. Visited = {2, 3}
3. No more adjacent vertices from 3, so we backtrack to 2. From 2, the choices are 4 and 0; suppose we choose 4. Visited = {2, 3, 4}
4. Adjacent to 4 are 0, 5, 1, and 2. We already visited 2, so ignore it; suppose we choose 5. Visited = {2, 3, 4, 5}
5. Choose any unvisited adjacent vertex; suppose we choose 0. Visited = {2, 3, 4, 5, 0}
6. No more unexplored adjacent vertices from 0, so we backtrack to 5 and choose the unexplored vertex 1. Visited = {2, 3, 4, 5, 0, 1}
7. All vertices were visited, so we stop.
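The walk above can be reproduced with a recursive DFS. The adjacency lists below are an assumption reconstructed from the choices made in the example, with neighbors ordered to mirror those choices, since the original figures are not shown.

```python
def dfs(graph, start):
    """Recursive depth-first search; the call stack plays the
    role of the explicit stack."""
    visited = []
    def explore(v):
        visited.append(v)
        for w in graph[v]:
            if w not in visited:   # go as deep as possible first
                explore(w)
    explore(start)
    return visited

# Assumed graph; neighbor order mirrors the example's choices.
graph = {0: [2, 4, 5], 1: [4, 5], 2: [3, 4, 0], 3: [2],
         4: [2, 5, 0, 1], 5: [4, 0, 1]}
print(dfs(graph, 2))  # [2, 3, 4, 5, 0, 1]
```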
Breadth-First Search (BFS): Graph Traversal

▪ BFS is a graph traversal algorithm that explores all nodes level by level, starting from a source node.
▪ It uses a queue to keep track of nodes to visit next.
▪ BFS is commonly used to find the shortest path in an unweighted graph.
▪ Example (same graph as the DFS example):

1. Choose any node; suppose we choose 2. Visited = [2]
2. Visit all unexplored vertices adjacent to 2. Visited = [2, 3, 4, 0]
3. From 3 there are no unexplored adjacent vertices. From 4 the unexplored adjacent vertices are 5 and 1, so visit them. Visited = [2, 3, 4, 0, 5, 1]
4. All vertices were visited, so we stop.
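The level-by-level order can be sketched with a queue. As in the DFS sketch, the adjacency lists are an assumption reconstructed from the example's choices, since the figures are not reproduced.

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first search: a FIFO queue yields level-by-level order."""
    visited = [start]
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in visited:   # enqueue each vertex exactly once
                visited.append(w)
                queue.append(w)
    return visited

# Assumed graph; neighbor order mirrors the example's choices.
graph = {0: [2, 4, 5], 1: [4, 5], 2: [3, 4, 0], 3: [2],
         4: [2, 5, 0, 1], 5: [4, 0, 1]}
print(bfs(graph, 2))  # [2, 3, 4, 0, 5, 1]
```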
Graph Traversal: DFS vs. BFS

                  DFS                            BFS
Data Structure    Stack                          Queue
Traversal Order   Explores as deep as possible   Explores level by level

You might also like