Implementation Details and Examples: Variable-Length Entropy Encoding Lossless Data Compression
Normally, a string of characters such as the words "hello there" is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic encoding, frequently used characters are stored with fewer bits and less frequently occurring characters are stored with more bits, resulting in fewer bits used in total. Arithmetic coding differs from other forms of entropy encoding such as Huffman coding in that, rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, a fraction n where 0.0 <= n < 1.0.
Implementation

Equal probabilities
In the simplest case, the probability of each symbol occurring is equal. For example, consider a sequence taken from a set of three symbols, A, B, and C, each equally likely to occur. Simple block encoding would use 2 bits per symbol, which is wasteful: one of the bit variations is never used. A more efficient solution is to represent the sequence as a rational number between 0 and 1 in base 3, where each digit represents a symbol. For example, the sequence "ABBCAB" could become 0.011201 in base 3. The next step is to encode this ternary number using a fixed-point binary number of sufficient precision to recover it, such as 0.001011001 in binary; this is only 9 bits, 25% smaller than the naive block encoding. This is feasible for long sequences because there are efficient, in-place algorithms for converting the base of arbitrarily precise numbers.
To decode the value, knowing the original string had length 6, one can simply convert back to base 3, round to 6 digits, and recover the string.
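To make the base-change idea concrete, here is a minimal Python sketch (illustrative, not from the original article; the SYMBOLS string and function names are our own) that encodes and decodes a sequence over the three-symbol alphabet using exact rational arithmetic:

```python
from fractions import Fraction

SYMBOLS = "ABC"  # three equally likely symbols

def encode(message):
    """Interpret the message as the digits of a base-3 fraction in [0, 1)."""
    value = Fraction(0)
    for i, ch in enumerate(message, start=1):
        value += Fraction(SYMBOLS.index(ch), 3 ** i)
    return value

def decode(value, length):
    """Recover `length` base-3 digits from the fraction."""
    out = []
    for _ in range(length):
        value *= 3
        digit = int(value)         # the integer part is the next digit
        out.append(SYMBOLS[digit])
        value -= digit
    return "".join(out)

v = encode("ABBCAB")               # Fraction(127, 729), i.e. 0.011201 in base 3
print(decode(v, 6))                # -> ABBCAB
```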
Defining a model
In general, arithmetic coders can produce near-optimal output for any given set of symbols and probabilities (the optimal value is -log2(P) bits for each symbol of probability P; see the source coding theorem). Compression algorithms that use arithmetic coding start by determining a model of the data: basically a prediction of what patterns will be found in the symbols of the message. The more accurate this prediction is, the closer to optimal the output will be. Example: a simple, static model for describing the output of a particular monitoring instrument over time might be:
- 60% chance of symbol NEUTRAL
- 20% chance of symbol POSITIVE
- 10% chance of symbol NEGATIVE
- 10% chance of symbol END-OF-DATA. (The presence of this symbol means that the stream will be 'internally terminated', as is fairly common in data compression; when this symbol appears in the data stream, the decoder will know that the entire stream has been decoded.)
Models can also handle alphabets other than the simple four-symbol set chosen for this example. More sophisticated models are also possible: higher-order modelling changes its estimation of the current probability of a symbol based on the symbols that precede it (the context), so that in a model for English text, for example, the percentage chance of "u" would be much higher when it follows a "Q" or a "q". Models can even be adaptive, so that they continuously change their prediction of the data based on what the stream actually contains. The decoder must have the same model as the encoder.

Encoding and decoding: overview
In general, each step of the encoding process, except for the very last, is the same; the encoder has basically just three pieces of data to consider:
- The next symbol that needs to be encoded
- The current interval (at the very start of the encoding process, the interval is set to [0,1), but that will change)
- The probabilities the model assigns to each of the various symbols that are possible at this stage (as mentioned earlier, higher-order or adaptive models mean that these probabilities are not necessarily the same in each step)
The encoder divides the current interval into sub-intervals, each representing a fraction of the current interval proportional to the probability of that symbol in the current context. Whichever interval corresponds to the actual symbol that is next to be encoded becomes the interval used in the next step. Example: for the four-symbol model above:
- the interval for NEUTRAL would be [0, 0.6)
- the interval for POSITIVE would be [0.6, 0.8)
- the interval for NEGATIVE would be [0.8, 0.9)
- the interval for END-OF-DATA would be [0.9, 1).
When all symbols have been encoded, the resulting interval unambiguously identifies the sequence of symbols that produced it. Anyone who has the same final interval and model that is being used can reconstruct the symbol sequence that must have entered the encoder to result in that final interval. It is not necessary to transmit the final interval, however; it is only necessary to transmit one fraction that lies within that interval. In particular, it is only necessary to transmit enough digits (in whatever base) of the fraction so that all fractions that begin with those digits fall into the final interval.
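The interval-narrowing step described above is easy to state in code. The following is a speculative sketch (names and structure are ours, not the article's) of an exact encoder for the four-symbol model, using Python Fractions so no precision is lost:

```python
from fractions import Fraction

# the static model from the text: symbol -> probability
MODEL = {"NEUTRAL": Fraction(6, 10), "POSITIVE": Fraction(2, 10),
         "NEGATIVE": Fraction(1, 10), "END-OF-DATA": Fraction(1, 10)}

def encode(symbols):
    low, width = Fraction(0), Fraction(1)   # current interval [low, low+width)
    for s in symbols:
        # divide the current interval proportionally to the model...
        for sym, p in MODEL.items():
            if sym == s:                    # ...and keep the sub-interval
                width *= p                  # matching the actual symbol
                break
            low += width * p
    return low, low + width

lo, hi = encode(["NEUTRAL", "NEGATIVE", "END-OF-DATA"])
print(float(lo), float(hi))   # 0.534 0.54 -- the fraction 0.538 lies inside
```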
Encoding and decoding: example

[Figure: decoding of 0.538 (the circular point) in the example model. The region is divided into subregions proportional to symbol frequencies, then the subregion containing the point is successively subdivided in the same way.]
Consider the process for decoding a message encoded with the given four-symbol model. The message is encoded in the fraction 0.538 (using decimal for clarity, instead of binary, and assuming that there are only as many digits as needed to decode the message). The process starts with the same interval used by the encoder, [0,1), and, using the same model, divides it into the same four sub-intervals that the encoder must have. The fraction 0.538 falls into the sub-interval for NEUTRAL, [0, 0.6); this indicates that the first symbol the encoder read must have been NEUTRAL, so this is the first symbol of the message. Next divide the interval [0, 0.6) into sub-intervals:
- the interval for NEUTRAL would be [0, 0.36) -- 60% of [0, 0.6)
- the interval for POSITIVE would be [0.36, 0.48) -- 20% of [0, 0.6)
- the interval for NEGATIVE would be [0.48, 0.54) -- 10% of [0, 0.6)
- the interval for END-OF-DATA would be [0.54, 0.6) -- 10% of [0, 0.6)
Since .538 is within the interval [0.48, 0.54), the second symbol of the message must have been NEGATIVE. Again divide our current interval into sub-intervals:
- the interval for NEUTRAL would be [0.48, 0.516)
- the interval for POSITIVE would be [0.516, 0.528)
- the interval for NEGATIVE would be [0.528, 0.534)
- the interval for END-OF-DATA would be [0.534, 0.540).
Now .538 falls within the interval of the END-OF-DATA symbol; therefore, this must be the next symbol. Since it is also the internal termination symbol, it means the decoding is complete. If the stream is not internally terminated, there needs to be some other way to indicate where the stream stops. Otherwise, the decoding process could continue forever, mistakenly reading more symbols from the fraction than were in fact encoded into it.
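The decoder mirrors the encoder: at each step it locates the sub-interval containing the received fraction. A companion sketch to the encoder above, using the same hypothetical MODEL table:

```python
from fractions import Fraction

MODEL = {"NEUTRAL": Fraction(6, 10), "POSITIVE": Fraction(2, 10),
         "NEGATIVE": Fraction(1, 10), "END-OF-DATA": Fraction(1, 10)}

def decode(fraction):
    """Repeatedly find the sub-interval containing `fraction`, emit the
    corresponding symbol, and rescale, until END-OF-DATA appears."""
    out = []
    low, width = Fraction(0), Fraction(1)
    while True:
        for sym, p in MODEL.items():
            if low <= fraction < low + width * p:   # point falls in this band
                out.append(sym)
                width *= p
                break
            low += width * p
        if out[-1] == "END-OF-DATA":
            return out

print(decode(Fraction(538, 1000)))
# ['NEUTRAL', 'NEGATIVE', 'END-OF-DATA']
```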
Sources of inefficiency
The message 0.538 in the previous example could have been encoded by the equally short fractions 0.534, 0.535, 0.536, 0.537 or 0.539. This suggests that the use of decimal instead of binary introduced some inefficiency. This is correct; the information content of a three-digit decimal is approximately 9.966 bits; the same message could have been encoded in the binary fraction 0.10001010 (equivalent to 0.5390625 decimal) at a cost of only 8 bits. (The final zero must be specified in the binary fraction, or else the message would be ambiguous without external information such as compressed stream size.)
This 8-bit output is larger than the information content, or entropy, of the message, which is 1.57 x 3 or 4.71 bits. The large difference between the example's 8 (or 7 with external compressed data size information) bits of output and the entropy of 4.71 bits is caused by the short example message not being able to exercise the coder effectively. The claimed symbol probabilities were [0.6, 0.2, 0.1, 0.1], but the actual frequencies in this example are [0.33, 0, 0.33, 0.33]. If the intervals are readjusted for these frequencies, the entropy of the message would be 1.58 bits per symbol and the same NEUTRAL NEGATIVE END-OF-DATA message could be encoded as intervals [0, 1/3); [1/9, 2/9); [5/27, 6/27); and a binary interval of [0.00101111011, 0.00111000111). This could yield an output message of 111, or just 3 bits. This is also an example of how statistical coding methods like arithmetic encoding can produce an output message that is larger than the input message, especially if the probability model is off.
Adaptive arithmetic coding
One advantage of arithmetic coding over other similar methods of data compression is the convenience of adaptation. Adaptation is the changing of the frequency (or probability) tables while processing the data. The decoded data matches the original data as long as the frequency table in decoding is updated in the same way and in the same step as in encoding. The synchronization is usually based on a combination of symbols occurring during the encoding and decoding process. Adaptive arithmetic coding significantly improves the compression ratio compared to static methods; it may be as much as two to three times as effective.
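A minimal illustration of the idea: both encoder and decoder can derive the current probability table from the symbols processed so far, for example with add-one (Laplace) smoothed counts, so they stay synchronized without transmitting the table. This helper is hypothetical, not from the article:

```python
from collections import Counter
from fractions import Fraction

def adaptive_probabilities(history, alphabet):
    """Probability table from the counts seen so far (add-one smoothing
    keeps every symbol's probability nonzero). Called with the same
    history by the encoder and the decoder, it returns the same table,
    which keeps the two sides in sync."""
    counts = Counter(history)
    total = len(history) + len(alphabet)
    return {s: Fraction(counts[s] + 1, total) for s in alphabet}

print(adaptive_probabilities("AABA", "AB"))
# {'A': Fraction(2, 3), 'B': Fraction(1, 3)}
```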
Precision and renormalization
The above explanations of arithmetic coding contain some simplification. In particular, they are written as if the encoder first calculated the fractions representing the endpoints of the interval in full, using infinite precision, and only converted the fraction to its final form at the end of encoding. Rather than try to simulate infinite precision, most arithmetic coders instead operate at a fixed limit of precision which they know the decoder will be able to match, and round the calculated fractions to their nearest equivalents at that precision. The following example shows how this would work if the model called for the interval [0,1) to be divided into thirds, approximated with 8-bit precision. Since the precision is now known, so are the binary ranges we will be able to use.
Symbol  Probability  Interval reduced to 8-bit    Interval reduced to 8-bit     Range in binary
                     precision (as fractions)     precision (in binary)
A       1/3          [0, 85/256)                  [0.00000000, 0.01010101)      00000000 - 01010100
B       1/3          [85/256, 171/256)            [0.01010101, 0.10101011)      01010101 - 10101010
C       1/3          [171/256, 1)                 [0.10101011, 1.00000000)      10101011 - 11111111
A process called renormalization keeps the finite precision from becoming a limit on the total number of symbols that can be encoded. Whenever the range is reduced to the point where all values in the range share certain beginning digits, those digits are sent to the output. The computer is then handling fewer digits than its precision allows, so the existing digits are shifted left, and at the right, new digits are added to expand the range as widely as possible. Note that this result occurs in two of the three cases from our previous example.
Symbol  Probability  Range                Digits that can be sent  Range after renormalization
A       1/3          00000000 - 01010100  0                        00000000 - 10101001
B       1/3          01010101 - 10101010  None                     01010101 - 10101010
C       1/3          10101011 - 11111111  1                        01010110 - 11111111
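The digit-shipping logic can be sketched as follows (an illustrative fragment under the 8-bit assumption of the table above; integer endpoints stand in for the binary ranges, and the function name is our own):

```python
def renormalize(low, high, bits=8):
    """Shift out the leading bits that low and high already share: a
    minimal sketch of the renormalization step, where low and high are
    integer endpoints of the current range at `bits` precision."""
    out = []
    top = 1 << (bits - 1)                 # mask for the leading bit
    mask = (1 << bits) - 1
    while (low & top) == (high & top):    # leading digits agree: emit them
        out.append(1 if low & top else 0)
        low = (low << 1) & mask           # shift left, new 0 at the right
        high = ((high << 1) & mask) | 1   # shift left, new 1 at the right
    return out, low, high

# Symbol A from the table: range 00000000-01010100 shares a leading 0
print(renormalize(0b00000000, 0b01010100))
# ([0], 0, 169): emitted '0'; new range 00000000-10101001, as in the table
```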
Arithmetic coding as a generalized change of radix
Recall that in the case where the symbols had equal probabilities, arithmetic coding could be implemented by a simple change of base, or radix. In general, arithmetic (and range) coding may be interpreted as a generalized change of radix. For example, we may look at any sequence of symbols:
DABDDB
as a number in a certain base, presuming that the involved symbols form an ordered set and each symbol in the ordered set denotes a sequential integer: A = 0, B = 1, C = 2, D = 3, and so on. This results in the following frequencies and cumulative frequencies:

Symbol  Frequency of occurrence  Cumulative frequency
A       1                        0
B       2                        1
D       3                        3
The cumulative frequency of a symbol is the total of the frequencies of all symbols below it in the frequency distribution (a running total of frequencies). In a positional numeral system the radix, or base, is numerically equal to the number of different symbols used to express the number. For example, in the decimal system the number of symbols is 10, namely 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. The radix is used to express any finite integer in polynomial form with presumed multipliers. For example, the number 457 is actually 4x10^2 + 5x10^1 + 7x10^0, where base 10 is presumed but not shown explicitly. Initially, we will convert DABDDB into a base-6 numeral, because 6 is the length of the string. The string is first mapped into the digit string 301331, which then maps to an integer by the polynomial:

    3x6^5 + 0x6^4 + 1x6^3 + 3x6^2 + 3x6^1 + 1x6^0 = 23671
The result 23671 has a length of 15 bits, which is not very close to the theoretical limit (the entropy of the message), which is approximately 9 bits. To encode a message with a length closer to the theoretical limit imposed by information theory we need to slightly generalize the classic formula for changing the radix. We will compute lower and upper bounds L and U and choose a number between them. For the computation of L we multiply each term in the above expression by the product of the frequencies of all previously occurring symbols:

    L = 6^5 x 3 + 6^4 x (3x0) + 6^3 x (3x1x1) + 6^2 x (3x1x2x3) + 6^1 x (3x1x2x3x3) + 6^0 x (3x1x2x3x3x1) = 25002

The difference between this polynomial and the polynomial above is that each term is multiplied by the product of the frequencies of all previously occurring symbols. More generally, L may be computed as:

    L = sum over i = 1..n of 6^(n-i) x C_i x (f_1 x f_2 x ... x f_(i-1))

where the C_i are the cumulative frequencies and the f_k are the frequencies of occurrences. Indexes denote the position of the symbol in the message. In the special case where all frequencies f_k are 1, this is the change-of-base formula. The upper bound U will be L plus the product of all frequencies; in this case U = L + (3 x 1 x 2 x 3 x 3 x 2) = 25002 + 108 = 25110. In general, U is given by:

    U = L + (f_1 x f_2 x ... x f_n)
Now we can choose any number from the interval [L, U) to represent the message; one convenient choice is the value with the longest possible trail of zeroes, 25100, since it allows us to achieve compression by representing the result as 251x10^2. The zeroes can also be truncated, giving 251, if the length of the message is stored separately. Longer messages will tend to have longer trails of zeroes. To decode the integer 25100, the polynomial computation can be reversed as shown in the table below. At each stage the current symbol is identified, then the corresponding term is subtracted from the result.
Remainder  Identification    Identified symbol  Corrected remainder
25100      25100 / 6^5 = 3   D                  (25100 - 6^5 x 3) / 3 = 590
590        590 / 6^4 = 0     A                  (590 - 6^4 x 0) / 1 = 590
590        590 / 6^3 = 2     B                  (590 - 6^3 x 1) / 2 = 187
187        187 / 6^2 = 5     D                  (187 - 6^2 x 3) / 3 = 26
26         26 / 6^1 = 4      D                  (26 - 6^1 x 3) / 3 = 2
2          2 / 6^0 = 2       B
During decoding we take the floor after dividing by the corresponding power of 6. The result is then matched against the cumulative intervals and the appropriate symbol is selected from a lookup table. When the symbol is identified the result is corrected. The process is continued for the known length of the message or while the remaining result is positive. The only difference compared to the classical change of base is that there may be a range of values associated with each symbol. In this example, A is always 0, B is either 1 or 2, and D is any of 3, 4, 5. This is in exact accordance with our intervals, which are determined by the frequencies. When all intervals are equal to 1 we have a special case of the classic base change.
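Here is a sketch of both directions of the generalized change of radix, using the DABDDB example's frequencies. The function names and table layout are our own; the arithmetic follows the formulas above:

```python
def radix_encode(message, freq, cumfreq):
    """Compute the interval [L, U) for `message`, with base = len(message)."""
    n = len(message)
    low, fprod = 0, 1
    for i, sym in enumerate(message):
        low += fprod * cumfreq[sym] * n ** (n - 1 - i)  # term scaled by the
        fprod *= freq[sym]                              # earlier frequencies
    return low, low + fprod

def radix_decode(value, n, freq, cumfreq):
    """Reverse the polynomial: identify each symbol, subtract its term,
    then divide out its frequency."""
    out = []
    for i in range(n - 1, -1, -1):
        digit = value // n ** i
        # the symbol whose cumulative-frequency band contains `digit`
        sym = max((s for s in freq if cumfreq[s] <= digit),
                  key=lambda s: cumfreq[s])
        out.append(sym)
        value = (value - cumfreq[sym] * n ** i) // freq[sym]
    return "".join(out)

freq    = {"A": 1, "B": 2, "D": 3}
cumfreq = {"A": 0, "B": 1, "D": 3}
print(radix_encode("DABDDB", freq, cumfreq))   # (25002, 25110)
print(radix_decode(25100, 6, freq, cumfreq))   # DABDDB
```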
Theoretical limit of compressed message

The lower bound L never exceeds n^n, where n is the size of the message, and so can be represented in log2(n^n) = n log2(n) bits. After the computation of the upper bound U and the reduction of the message by selecting a number from the interval [L, U) with the longest trail of zeros, we can presume that this length can be reduced by log2(f_1 x f_2 x ... x f_n) bits. Since each frequency in the product occurs exactly as many times as the value of that frequency, we can use the size of the alphabet A for the computation of the product:

    product over k = 1..n of f_k = product over i = 1..A of f_i^(f_i)

Applying log2 to estimate the number of bits in the message, the final message (not counting a logarithmic overhead for the message length and frequency tables) will match the number of bits given by entropy, which for long messages is very close to optimal:

    n log2(n) - sum over i = 1..A of f_i log2(f_i) = -n x sum over i = 1..A of (f_i/n) log2(f_i/n) = n x H
Connections with other compression methods

Huffman coding
Main article: Huffman coding

There is great similarity between arithmetic coding and Huffman coding; in fact, it has been shown that Huffman is just a specialized case of arithmetic coding. But because arithmetic coding translates the entire message into one number represented in base b, rather than translating each symbol of the message into a series of digits in base b, it will sometimes approach optimal entropy encoding much more closely than Huffman can. In fact, a Huffman code corresponds closely to an arithmetic code where each of the frequencies is rounded to a nearby power of 1/2; for this reason Huffman deals relatively poorly with distributions where symbols have frequencies far from a power of 1/2, such as 0.75 or 0.375. This includes most distributions where there is either a small number of symbols (such as just the bits 0 and 1) or where one or two symbols dominate the rest. For an alphabet {a, b, c} with equal probabilities of 1/3, Huffman coding may produce the following code:
- a -> 0: implied probability 1/2
- b -> 10: implied probability 1/4
- c -> 11: implied probability 1/4
This code has an expected length of (2 + 2 + 1)/3, approximately 1.667 bits per symbol, for Huffman coding, an inefficiency of 5 percent compared to log2(3), approximately 1.585 bits per symbol, for arithmetic coding. For an alphabet {0, 1} with probabilities 0.625 and 0.375, Huffman encoding treats them as though they had 0.5 probability each, assigning 1 bit to each value, which does not achieve any compression over naive block encoding. Arithmetic coding, by contrast, approaches the optimal rate of

    -0.625 log2(0.625) - 0.375 log2(0.375), approximately 0.954 bits per symbol.

When the symbol 0 has a high probability of 0.95, the difference is much greater: the entropy is

    -0.95 log2(0.95) - 0.05 log2(0.05), approximately 0.286 bits per symbol,

yet Huffman coding still spends a full bit on each symbol, roughly 3.5 times the optimum.
One simple way to address this weakness is to concatenate symbols to form a new alphabet in which each symbol represents a sequence of symbols in the original alphabet. In the above example, grouping sequences of three symbols before encoding would produce new "super-symbols" with the following frequencies:
- 000: 85.7%
- 001, 010, 100: 4.5% each
- 011, 101, 110: 0.24% each
- 111: 0.0125%
With this grouping, Huffman coding averages 1.3 bits for every three symbols, or 0.433 bits per symbol, compared with one bit per symbol in the original encoding.

Range encoding
Main article: Range encoding

Range encoding is regarded by some engineers as a different technique and by others as merely a different name for arithmetic coding. There is no single accepted distinction. One view is that when processing is applied as one step per symbol it is range coding, and when one step is required per bit it is arithmetic coding. In another view, arithmetic coding is the computation of two boundaries on the interval [0,1) and the choice of the shortest fraction from it, while range encoding is the computation of boundaries on the interval [0, n^n) and the choice of the number with the longest trail of zeros from within it. Some researchers believe that this slight difference in approach makes range encoding patent-free. To support this idea they cite the article by G. Nigel N. Martin, which is terse and subject to interpretation. It is cited in Glen Langdon's article "An Introduction to Arithmetic Coding", IBM J. Res. Develop., Vol. 28, No. 2, March 1984, which makes the method suggested by Martin prior art recognized by an industry expert. It is close to the technique described at the start of this article, with the difference that both the LOW and HIGH limits are computed on every step and that probabilities, rather than frequencies, are used for narrowing down the interval. Martin's article escaped the attention of many researchers who were filing patents on arithmetic coding and describing their algorithms as building a long proper fraction, which put their patents at risk of being circumvented by those who do it differently, since a patent is a very formal document whose language must be precise. It does not follow that all patents on arithmetic coding are void in light of Martin's article, but it opens the ground for debate that could have been avoided had those authors at least mentioned the approach.
Huffman coding

In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a "prefix-free code"; that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. A method was later found to design a Huffman code in linear time if the input probabilities (also known as weights) are sorted.

For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm.

Although Huffman's original algorithm is optimal for a symbol-by-symbol coding (i.e. a stream of unrelated symbols) with a known input probability distribution, it is not optimal when the symbol-by-symbol restriction is dropped, or when the probability mass functions are unknown, not identically distributed, or not independent (e.g., "cat" is more common than "cta"). Other methods such as arithmetic coding and LZW coding often have better compression capability: both of these methods can combine an arbitrary number of symbols for more efficient coding, and generally adapt to the actual input statistics, the latter of which is useful when input probabilities are not precisely known or vary significantly within the stream. However, the limitations of Huffman coding should not be overstated; it can be used adaptively, accommodating unknown, changing, or context-dependent probabilities. In the case of known independent and identically distributed random variables, combining symbols together reduces inefficiency in a way that approaches optimality as the number of symbols combined increases.
History

In 1951, David A. Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient.[1] In doing so, the student outdid his professor, who had worked with information theory inventor Claude Shannon to develop a similar code. Huffman avoided the major flaw of the suboptimal Shannon-Fano coding by building the tree from the bottom up instead of from the top down.

Problem definition

Informal description

Given: A set of symbols and their weights (usually proportional to probabilities).
Find: A prefix-free binary code (a set of codewords) with minimum expected codeword length (equivalently, a tree with minimum weighted path length from the root).

Formalized description

Input: An alphabet A = (a_1, a_2, ..., a_n), which is the symbol alphabet of size n, and a set W = (w_1, w_2, ..., w_n) of (positive) symbol weights (usually proportional to probabilities), i.e. w_i = weight(a_i), 1 <= i <= n.
Output: A code C(A, W) = (c_1, c_2, ..., c_n), the set of (binary) codewords, where c_i is the codeword for a_i, 1 <= i <= n.
Goal: Let L(C) = sum over i of w_i x length(c_i) be the weighted path length of code C. The condition is L(C) <= L(T) for any code T(A, W).

For example, for an alphabet of five symbols with the weights below, a Huffman code might assign the following codewords:
Input (A, W)   Symbol (a_i)                                      a      b      c      d      e      Sum
               Weights (w_i)                                     0.10   0.15   0.30   0.16   0.29   = 1
Output C       Codewords (c_i)                                   010    011    11     00     10
               Codeword length (l_i, in bits)                    3      3      2      2      2
               Contribution to weighted path length (w_i x l_i)  0.30   0.45   0.60   0.32   0.58   L(C) = 2.25
Optimality     Probability budget (2^-l_i)                       1/8    1/8    1/4    1/4    1/4    = 1.00
               Information content (-log2 w_i, in bits)          3.32   2.74   1.74   2.64   1.79
               Contribution to entropy (-w_i log2 w_i)           0.332  0.411  0.521  0.423  0.518  H(A) = 2.205
For any code that is biunique, meaning that the code is uniquely decodeable, the sum of the probability budgets across all symbols is always less than or equal to one. In this example, the sum is strictly equal to one; as a result, the code is termed a complete code. If this is not the case, one can always derive an equivalent code by adding extra symbols (with associated null probabilities) to make the code complete while keeping it biunique. As defined by Shannon (1948), the information content h (in bits) of each symbol a_i with non-null probability is

    h(a_i) = log2(1 / w_i) = -log2(w_i)
The entropy H (in bits) is the weighted sum, across all symbols a_i with non-zero probability w_i, of the information content of each symbol:

    H(A) = sum over i with w_i > 0 of w_i x h(a_i) = -sum over i with w_i > 0 of w_i log2(w_i)

(Note: a symbol with zero probability has zero contribution to the entropy, since the limit of w log2(w) as w approaches 0 from above is 0. So for simplicity, symbols with zero probability can be left out of the formula above.)
As a consequence of Shannon's source coding theorem, the entropy is a measure of the smallest codeword length that is theoretically possible for the given alphabet with associated weights. In this example, the weighted average codeword length is 2.25 bits per symbol, only slightly larger than the calculated entropy of 2.205 bits per symbol. So not only is this code optimal in the sense that no other feasible code performs better, but it is very close to the theoretical limit established by Shannon. Note that, in general, a Huffman code need not be unique, but it is always one of the codes minimizing L(C).

Basic technique

Compression
A source generates 4 different symbols {a1, a2, a3, a4} with probabilities {0.4, 0.35, 0.2, 0.05}. A binary tree is generated from left to right, taking the two least probable symbols and putting them together to form another equivalent symbol having a probability that equals the sum of the two symbols. The process is repeated until there is just one symbol. The tree can then be read backwards, from right to left, assigning different bits to different branches. The final Huffman code is:
Symbol  Code
a1      0
a2      10
a3      110
a4      111
The standard way to represent a signal made of 4 symbols is by using 2 bits/symbol, but the entropy of the source is 1.74 bits/symbol. If this Huffman code is used to represent the signal, then the average length is lowered to 1.85 bits/symbol; it is still far from the theoretical limit because the probabilities of the symbols are different from negative powers of two.
The technique works by creating a binary tree of nodes. These can be stored in a regular array, the size of which depends on the number of symbols, n. A node can be either a leaf node or an internal node. Initially, all nodes are leaf nodes, which contain the symbol itself, the weight (frequency of appearance) of the symbol and, optionally, a link to a parent node which makes it easy to read the code (in reverse) starting from a leaf node. Internal nodes contain a symbol weight, links to two child nodes and the optional link to a parent node. As a common convention, bit '0' represents following the left child and bit '1' represents following the right child. A finished tree has up to n leaf nodes and n - 1 internal nodes. A Huffman tree that omits unused symbols produces the optimal code lengths.

The process essentially begins with the leaf nodes containing the probabilities of the symbol they represent; then a new node whose children are the 2 nodes with smallest probability is created, such that the new node's probability is equal to the sum of the children's probabilities. With the previous 2 nodes merged into one node (and thus no longer considered), and with the new node now under consideration, the procedure is repeated until only one node remains: the Huffman tree.

The simplest construction algorithm uses a priority queue where the node with lowest probability is given highest priority:

1. Create a leaf node for each symbol and add it to the priority queue.
2. While there is more than one node in the queue:
   1. Remove the two nodes of highest priority (lowest probability) from the queue.
   2. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
   3. Add the new node to the queue.
3. The remaining node is the root node and the tree is complete.
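As an illustration of the priority-queue construction, here is a compact Python sketch using the standard heapq module (our own code, not a reference implementation; tie-breaking, and therefore the exact bit patterns, may differ from the table above while the code lengths remain optimal):

```python
import heapq

def huffman_code(weights):
    """Build a Huffman code from a {symbol: weight} dict using a priority
    queue. A tree is either a symbol (leaf) or a (left, right) tuple."""
    heap = [(w, i, sym) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)                  # keeps comparisons away from trees
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # the two lowest-weight nodes
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tiebreak, (t1, t2)))
        tiebreak += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: 0 left, 1 right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # single-symbol alphabet edge case
    walk(heap[0][2], "")
    return codes

print(huffman_code({"a1": 0.4, "a2": 0.35, "a3": 0.2, "a4": 0.05}))
# {'a1': '0', 'a4': '100', 'a3': '101', 'a2': '11'}: code lengths
# 1, 2, 3, 3 give an average of 1.85 bits/symbol, as in the example
```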
Since efficient priority queue data structures require O(log n) time per insertion, and a tree with n leaves has 2n - 1 nodes, this algorithm operates in O(n log n) time, where n is the number of symbols.

If the symbols are sorted by probability, there is a linear-time (O(n)) method to create a Huffman tree using two queues, the first one containing the initial weights (along with pointers to the associated leaves), with combined weights (along with pointers to the trees) being put in the back of the second queue. This assures that the lowest weight is always kept at the front of one of the two queues:

1. Start with as many leaves as there are symbols.
2. Enqueue all leaf nodes into the first queue (by probability in increasing order, so that the least likely item is at the head of the queue).
3. While there is more than one node in the queues:
   1. Dequeue the two nodes with the lowest weight by examining the fronts of both queues.
   2. Create a new internal node, with the two just-removed nodes as children (either node can be either child) and the sum of their weights as the new weight.
   3. Enqueue the new node into the rear of the second queue.
4. The remaining node is the root node; the tree has now been generated.
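The two-queue method can likewise be sketched in a few lines; this hypothetical version assumes the (weight, symbol) pairs arrive already sorted by increasing weight, and it breaks ties in favor of the first queue, which also minimizes variance as noted below:

```python
from collections import deque

def huffman_tree(sorted_weights):
    """Linear-time Huffman tree for (weight, symbol) pairs sorted by
    increasing weight; returns a nested-tuple tree."""
    leaves = deque(sorted_weights)
    internal = deque()                        # combined weights, in order
    def pop_min():
        # the smallest remaining weight is at the front of one of the two
        # queues; ties go to the first (leaf) queue to minimize variance
        if not internal or (leaves and leaves[0][0] <= internal[0][0]):
            return leaves.popleft()
        return internal.popleft()
    while len(leaves) + len(internal) > 1:
        w1, t1 = pop_min()
        w2, t2 = pop_min()
        internal.append((w1 + w2, (t1, t2)))  # appended weights never decrease
    return pop_min()[1]

print(huffman_tree([(0.05, "a4"), (0.2, "a3"), (0.35, "a2"), (0.4, "a1")]))
# ('a1', (('a4', 'a3'), 'a2'))
```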
Although this algorithm may appear "faster" complexity-wise than the previous algorithm using a priority queue, this is not actually the case, because the symbols need to be sorted by probability beforehand, a process that takes O(n log n) time in itself. In many cases, time complexity is not very important in the choice of algorithm here, since n is the number of symbols in the alphabet, which is typically a very small number (compared to the length of the message to be encoded), whereas complexity analysis concerns the behavior when n grows to be very large.

It is generally beneficial to minimize the variance of codeword length. For example, a communication buffer receiving Huffman-encoded data may need to be larger to deal with especially long symbols if the tree is especially unbalanced. To minimize variance, simply break ties between queues by choosing the item in the first queue. This modification will retain the mathematical optimality of the Huffman coding while both minimizing variance and minimizing the length of the longest character code.

[Figure: Huffman tree built from the French string "j'aime aller sur le bord de l'eau les jeudis ou les jours impairs"]
Decompression

Generally speaking, the process of decompression is simply a matter of translating the stream of prefix codes to individual byte values, usually by traversing the Huffman tree node by node as each bit is read from the input stream (reaching a leaf node necessarily terminates the search for that particular byte value). Before this can take place, however, the Huffman tree must be somehow reconstructed. In the simplest case, where character frequencies are fairly predictable, the tree can be preconstructed (and even statistically adjusted on each compression cycle) and thus reused every time, at the expense of at least some measure of compression efficiency. Otherwise, the information to reconstruct the tree must be sent a priori.

A naive approach might be to prepend the frequency count of each character to the compression stream. Unfortunately, the overhead in such a case could amount to several kilobytes, so this method has little practical use. If the data is compressed using canonical encoding, the compression model can be precisely reconstructed with just B x 2^B bits of information (where B is the number of bits per symbol). Another method is to simply prepend the Huffman tree, bit by bit, to the output stream. For example, assuming that the value of 0 represents a parent node and 1 a leaf node, whenever the latter is encountered the tree-building routine simply reads the next 8 bits to determine the character value of that particular leaf. The process continues recursively until the last leaf node is reached; at that point, the Huffman tree will thus be faithfully reconstructed. The overhead using such a method ranges from roughly 2 to 320 bytes (assuming an 8-bit alphabet). Many other techniques are possible as well. In any case, since the compressed data can include unused "trailing bits", the decompressor must be able to determine when to stop producing output. This can be accomplished by either transmitting the length of the decompressed data along with the compression model or by defining a special code symbol to signify the end of input (the latter method can adversely affect code length optimality, however).
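The bit-by-bit tree walk described above is straightforward; here is a small sketch (our own, using nested tuples for internal nodes and the 0-left/1-right convention):

```python
def huffman_decode(bits, root):
    """Translate a prefix-code bit string by walking the tree; reaching
    a leaf emits its symbol and restarts the walk at the root."""
    out, node = [], root
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if not isinstance(node, tuple):   # leaf reached
            out.append(node)
            node = root
    return out

# tree for the earlier four-symbol code: a1=0, a2=10, a3=110, a4=111
tree = ("a1", ("a2", ("a3", "a4")))
print(huffman_decode("010110111", tree))  # ['a1', 'a2', 'a3', 'a4']
```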
Main properties
The probabilities used can be generic ones for the application domain that are based on average experience, or they can be the actual frequencies found in the text being compressed. (This variation requires that a frequency table or other hint as to the encoding must be stored with the compressed text; implementations employ various tricks to store tables efficiently.)

Huffman coding is optimal when the probability of each input symbol is a negative power of two. Prefix codes tend to have inefficiency on small alphabets, where probabilities often fall between these optimal points. "Blocking", or expanding the alphabet size by grouping multiple symbols into "words" of fixed or variable length before Huffman coding, helps both to reduce that inefficiency and to take advantage of statistical dependencies between input symbols within the group (as in the case of natural language text). The worst case for Huffman coding can happen when the probability of a symbol exceeds 2^-1 = 0.5, making the upper limit of inefficiency unbounded. These situations often respond well to a form of blocking called run-length encoding; for the simple case of Bernoulli processes, Golomb coding is a provably optimal run-length code.

Arithmetic coding produces some gains over Huffman coding, although arithmetic coding has higher computational complexity. Also, arithmetic coding was historically a subject of some concern over patent issues. However, as of mid-2010, various well-known effective techniques for arithmetic coding have passed into the public domain as the early patents have expired.

Variations

Many variations of Huffman coding exist, some of which use a Huffman-like algorithm, and others of which find optimal prefix codes (while, for example, putting different restrictions on the output). Note that, in the latter case, the method need not be Huffman-like, and, indeed, need not even be polynomial time. An exhaustive list of papers on Huffman coding and its variations is given by "Code and Parse Trees for Lossless Source Encoding"[1].

n-ary Huffman coding
The n-ary Huffman algorithm uses the {0, 1, ..., n - 1} alphabet to encode messages and build an n-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (n = 2) codes, except that the n least probable symbols are taken together, instead of just the 2 least probable. Note that for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding. In this case, additional 0-probability placeholders must be added. This is because the tree must form an n-to-1 contractor; for binary coding, this is a 2-to-1 contractor, and any sized set can form such a contractor. If the number of source words is congruent to 1 modulo n - 1, then the set of source words will form a proper Huffman tree.

Adaptive Huffman coding
A variation called adaptive Huffman coding involves calculating the probabilities dynamically based on recent actual frequencies in the sequence of source symbols, and changing the coding tree structure to match the updated probability estimates.

Huffman template algorithm
Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but the algorithm given above does not require this; it requires only that the weights form a totally ordered commutative monoid, meaning a way to order weights and to add them. The Huffman template algorithm enables one to use any kind of weights (costs, frequencies, pairs of weights, non-numerical weights) and one of many combining methods (not just addition). Such algorithms can solve other minimization problems, such as minimizing max over i of [w_i + length(c_i)], a problem first applied to circuit design [2].

Length-limited Huffman coding
Length-limited Huffman coding is a variant where the goal is still to achieve a minimum weighted path length, but there is an additional restriction that the length of each codeword must be less than a given constant. The package-merge algorithm solves this problem with a simple greedy approach very similar to that used by Huffman's algorithm. Its time complexity is O(nL), where L is the maximum length of a codeword. No algorithm is known to solve this problem in linear or linearithmic time, unlike the presorted and unsorted conventional Huffman problems, respectively.

Huffman coding with unequal letter costs
In the standard Huffman coding problem, it is assumed that each symbol in the set that the code words are constructed from has an equal cost to transmit: a code word whose length is N digits will always have a cost of N, no matter how many of those digits are 0s, how many are 1s, etc. When working under this assumption, minimizing the total cost of the message and minimizing the total number of digits are the same thing. Huffman coding with unequal letter costs is the generalization in which this assumption no longer holds: the letters of the encoding alphabet may have non-uniform lengths, due to characteristics of the transmission medium. An example is the encoding alphabet of Morse code, where a 'dash' takes longer to send than a 'dot', and therefore the cost of a dash in transmission time is higher. The goal is still to minimize the weighted average codeword length, but it is no longer sufficient just to minimize the number of symbols used by the message. No algorithm is known to solve this in the same manner or with the same efficiency as conventional Huffman coding.

Optimal alphabetic binary trees (Hu-Tucker coding)
In the standard Huffman coding problem, it is assumed that any codeword can correspond to any input symbol. In the alphabetic version, the alphabetic order of inputs and outputs must be identical: the codewords must be in the same lexicographic order as the symbols they encode, so an assignment that swaps that order is not allowed. This is also known as the Hu-Tucker problem, after the authors of the paper presenting the first linearithmic solution to this optimal binary alphabetic problem, which has some similarities to the Huffman algorithm but is not a variation of it. These optimal alphabetic binary trees are often used as binary search trees.

The canonical Huffman code

If weights corresponding to the alphabetically ordered inputs are in numerical order, the Huffman code has the same lengths as the optimal alphabetic code, which can be found by calculating these lengths, rendering Hu-Tucker coding unnecessary. The code resulting from numerically (re-)ordered input is sometimes called the canonical Huffman code and is often the code used in practice, due to ease of encoding/decoding. The technique for finding this code is sometimes called Huffman-Shannon-Fano coding, since it is optimal like Huffman coding, but alphabetic in weight probability, like Shannon-Fano coding. The Huffman-Shannon-Fano code corresponding to the example is {000, 001, 01, 10, 11}, which, having the same codeword lengths as the original solution, is also optimal.

Applications

Arithmetic coding can be viewed as a generalization of Huffman coding, in the sense that they produce the same output when every symbol has a probability of the form 1/2^k; in particular, arithmetic coding tends to offer significantly better compression for small alphabet sizes. Huffman coding nevertheless remains in wide use because of its simplicity and high speed. Intuitively, arithmetic coding can offer better compression than Huffman coding because its "code words" can have effectively non-integer bit lengths, whereas code words in Huffman coding can only have an integer number of bits. Therefore, there is an inefficiency in Huffman coding where a code word of length k only optimally matches a symbol of probability 1/2^k, and other probabilities are not represented as optimally; whereas the code word length in arithmetic coding can be made to exactly match the true probability of the symbol. Huffman coding today is often used as a "back-end" to some other compression methods. DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3 have a front-end model and quantization followed by Huffman coding (or variable-length prefix-free codes with a similar structure, although perhaps not necessarily designed by using Huffman's algorithm).

Lempel–Ziv–Welch

Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is simple to implement, and has the potential for very high throughput in hardware implementations.[1]
Algorithm
The scenario described in Welch's 1984 paper encodes sequences of 8-bit data as fixed-length 12-bit codes. The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character, and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each stage in compression, input bytes are gathered into a sequence until the next character would make a sequence for which there is no code yet in the dictionary. The code for the sequence (without that character) is emitted, and a new code (for the sequence with that character) is added to the dictionary.

The idea was quickly adapted to other situations. In an image based on a color table, for example, the natural character alphabet is the set of color table indexes, and in the 1980s, many images had small color tables (on the order of 16 colors). For such a reduced alphabet, the full 12-bit codes yielded poor compression unless the image was large, so the idea of a variable-width code was introduced: codes typically start one bit wider than the symbols being encoded, and as each code size is used up, the code width increases by 1 bit, up to some prescribed maximum (typically 12 bits).

Further refinements include reserving a code to indicate that the code table should be cleared (a "clear code", typically the first value immediately after the values for the individual alphabet characters), and a code to indicate the end of data (a "stop code", typically one greater than the clear code). The clear code allows the table to be reinitialized after it fills up, which lets the encoding adapt to changing patterns in the input data. Smart encoders can monitor the compression efficiency and clear the table whenever the existing table no longer matches the input well.

Since the codes are added in a manner determined by the data, the decoder mimics building the table as it sees the resulting codes. It is critical that the encoder and decoder agree on which variety of LZW is being used: the size of the alphabet, the maximum code width, whether variable-width encoding is being used, the initial code size, and whether to use the clear and stop codes (and what values they have). Most formats that employ LZW build this information into the format specification or provide explicit fields for them in a compression header for the data.

Encoding

A high-level view of the encoding algorithm is shown here:

1. Initialize the dictionary to contain all strings of length one.
2. Find the longest string W in the dictionary that matches the current input.
3. Emit the dictionary index for W to output and remove W from the input.
4. Add W followed by the next symbol in the input to the dictionary.
5. Go to Step 2.

A dictionary is initialized to contain the single-character strings corresponding to all the possible input characters (and nothing else except the clear and stop codes if they're being used). The algorithm works by scanning through the input string for successively longer substrings until it finds one that is not in the dictionary. When such a string is found, the index for the string less the last character (i.e., the longest substring that is in the dictionary) is retrieved from the dictionary and sent to output, and the new string (including the last character) is added to the dictionary with the next available code. The last input character is then used as the next starting point to scan for substrings.
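These steps translate almost directly into code. Here is a hypothetical encoder for the 27-symbol alphabet used in the example later in this article ('#' = 0, 'A'-'Z' = 1-26), emitting integer codes and leaving bit-width handling and packing aside:

```python
def lzw_encode(data):
    """A minimal sketch of the LZW encoder described above."""
    dictionary = {chr(ord("A") + i): i + 1 for i in range(26)}
    dictionary["#"] = 0
    next_code = 27
    out, w = [], ""
    for ch in data:
        if w + ch in dictionary:
            w += ch                      # keep growing the matched sequence
        else:
            out.append(dictionary[w])    # emit the code for the longest match
            dictionary[w + ch] = next_code
            next_code += 1
            w = ch                       # restart from the current character
    out.append(dictionary[w])            # flush the final sequence
    return out

print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT#"))
# [20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34, 0]
```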
In this way, successively longer strings are registered in the dictionary and made available for subsequent encoding as single output values. The algorithm works best on data with repeated patterns, so the initial parts of a message will see little compression. As the message grows, however, the compression ratio tends asymptotically to the maximum.[2]

Decoding

The decoding algorithm works by reading a value from the encoded input and outputting the corresponding string from the initialized dictionary. At the same time it obtains the next value from the input, and adds to the dictionary the concatenation of the string just output and the first character of the string obtained by decoding the next input value. The decoder then proceeds to the next input value (which was already read in as the "next value" in the previous pass) and repeats the process until there is no more input, at which point the final input value is decoded without any more additions to the dictionary. In this way the decoder builds up a dictionary which is identical to that used by the encoder, and uses it to decode subsequent input values. Thus the full dictionary does not need to be sent with the encoded data; just the initial dictionary containing the single-character strings is sufficient (and is typically defined beforehand within the encoder and decoder rather than being explicitly sent with the encoded data).
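A matching decoder sketch (again hypothetical, and without bit unpacking); note the branch handling a code that is not yet in the dictionary, the cScSc case analyzed later in this article:

```python
def lzw_decode(codes):
    """Companion to the encoder sketch: rebuilds the dictionary as it
    reads codes, staying exactly one entry behind the encoder."""
    dictionary = {i + 1: chr(ord("A") + i) for i in range(26)}
    dictionary[0] = "#"
    next_code = 27
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                            # code not yet in the table:
            entry = w + w[0]             # it must encode w + w[0] (cScSc case)
        out.append(entry)
        dictionary[next_code] = w + entry[0]   # complete the conjectured entry
        next_code += 1
        w = entry
    return "".join(out)

print(lzw_decode([20, 15, 2, 5, 15, 18, 14, 15, 20, 27,
                  29, 31, 36, 30, 32, 34, 0]))
# TOBEORNOTTOBEORTOBEORNOT#
```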
Variable-width codes
If variable-width codes are being used, the encoder and decoder must be careful to change the width at the same points in the encoded data, or they will disagree about where the boundaries between individual codes fall in the stream. In the standard version, the encoder increases the width from p to p + 1 when a sequence ω + s is encountered that is not in the table (so that a code must be added for it) but the next available code in the table is 2^p (the first code requiring p + 1 bits). The encoder emits the code for ω at width p (since that code does not require p + 1 bits), and then increases the code width so that the next code emitted will be p + 1 bits wide.

The decoder is always one code behind the encoder in building the table, so when it sees the code for ω, it will generate an entry for code 2^p - 1. Since this is the point where the encoder will increase the code width, the decoder must increase the width here as well: at the point where it generates the largest code that will fit in p bits.

Unfortunately, some early implementations of the encoding algorithm increase the code width and then emit ω at the new width instead of the old width, so that to the decoder it looks like the width changes one code too early. This is called "Early Change"; it caused so much confusion that Adobe now allows both versions in PDF files, but includes an explicit flag in the header of each LZW-compressed stream to indicate whether Early Change is being used. Most graphic file formats do not use Early Change. When the table is cleared in response to a clear code, both encoder and decoder change the code width after the clear code back to the initial code width, starting with the code immediately following the clear code.

Packing order
Since the codes emitted typically do not fall on byte boundaries, the encoder and decoder must agree on how codes are packed into bytes. The two common methods are LSB-First ("Least Significant Bit First") and MSB-First ("Most Significant Bit First"). In LSB-First packing, the first code is aligned so that the least significant bit of the code falls in the least significant bit of the first stream byte, and if the code has more than 8 bits, the high-order bits left over are aligned with the least significant bits of the next byte; further codes are packed with LSB going into the least significant bit not yet used in the current stream byte, proceeding into further bytes as necessary. MSB-First packing aligns the first code so that its most significant bit falls in the MSB of the first stream byte, with overflow aligned with the MSB of the next byte; further codes are written with MSB going into the most significant bit not yet used in the current stream byte. GIF files use LSB-First packing order. TIFF files and PDF files use MSB-First packing order.

Example

The following example illustrates the LZW algorithm in action, showing the status of the output and the dictionary at every stage, both in encoding and decoding the data. This example has been constructed to give reasonable compression on a very short message. In real text data, repetition is generally less pronounced, so longer input streams are typically necessary before the compression builds up efficiency. The plaintext to be encoded (from an alphabet using only the capital letters) is:
TOBEORNOTTOBEORTOBEORNOT#
The # is a marker used to show that the end of the message has been reached. There are thus 26 symbols in the plaintext alphabet (the 26 capital letters A through Z), plus the stop code #. We arbitrarily assign these the values 1 through 26 for the letters, and 0 for '#'. (Most flavors of LZW would put the stop code after the data alphabet, but nothing in the basic algorithm requires that. The encoder and decoder only have to agree what value it has.)

A computer will render these as strings of bits. Five-bit codes are needed to give sufficient combinations to encompass this set of 27 values. The dictionary is initialized with these 27 values. As the dictionary grows, the codes will need to grow in width to accommodate the additional entries. A 5-bit code gives 2^5 = 32 possible combinations of bits, so when the 33rd dictionary word is created, the algorithm will have to switch at that point from 5-bit strings to 6-bit strings (for all code values, including those which were previously output with only five bits). Note that since the all-zero code 00000 is used, and is labeled "0", the 33rd dictionary entry will be labeled 32. (Previously generated output is not affected by the code-width change, but once a 6-bit value is generated in the dictionary, it could conceivably be the next code emitted, so the width for subsequent output shifts to 6 bits to accommodate that.)

The initial dictionary, then, will consist of the following entries:
Symbol  Binary  Decimal
#       00000   0
A       00001   1
B       00010   2
C       00011   3
D       00100   4
E       00101   5
F       00110   6
G       00111   7
H       01000   8
I       01001   9
J       01010   10
K       01011   11
L       01100   12
M       01101   13
N       01110   14
O       01111   15
P       10000   16
Q       10001   17
R       10010   18
S       10011   19
T       10100   20
U       10101   21
V       10110   22
W       10111   23
X       11000   24
Y       11001   25
Z       11010   26

Encoding
Buffer input characters in a sequence ω until ω + next character is not in the dictionary. Emit the code for ω, and add ω + next character to the dictionary. Start buffering again with the next character.
Current Sequence  Next Char  Output Code  Output Bits  Extended Dictionary  Comments
NULL              T
T                 O          20           10100        27: TO
O                 B          15           01111        28: OB
B                 E          2            00010        29: BE
E                 O          5            00101        30: EO
O                 R          15           01111        31: OR
R                 N          18           10010        32: RN               32 requires 6 bits, so use 6 bits for the next output
N                 O          14           001110       33: NO
O                 T          15           001111       34: OT
T                 T          20           010100       35: TT
TO                B          27           011011       36: TOB
BE                O          29           011101       37: BEO
OR                T          31           011111       38: ORT
TOB               E          36           100100       39: TOBE
EO                R          30           011110       40: EOR
RN                O          32           100000       41: RNO
OT                #          34           100010                            # stops the algorithm: send the current sequence,
                             0            000000                            then the stop code
Unencoded length = 25 symbols x 5 bits/symbol = 125 bits
Encoded length = (6 codes x 5 bits/code) + (11 codes x 6 bits/code) = 96 bits.

Using LZW has saved 29 bits out of 125, reducing the message by more than 23%. If the message were longer, then the dictionary words would begin to represent longer and longer sections of text, allowing repeated words to be sent very compactly.

Decoding

To decode an LZW-compressed archive, one needs to know in advance the initial dictionary used, but additional entries can be reconstructed as they are always simply concatenations of previous entries.
Input Bits  Code  Output Sequence  New Dictionary Entry (Full)  Conjecture  Comments
10100       20    T                                             27: T?
01111       15    O                27: TO                       28: O?
00010       2     B                28: OB                       29: B?
00101       5     E                29: BE                       30: E?
01111       15    O                30: EO                       31: O?
10010       18    R                31: OR                       32: R?      32 requires 6 bits, so read the next code with 6 bits
001110      14    N                32: RN                       33: N?
001111      15    O                33: NO                       34: O?
010100      20    T                34: OT                       35: T?
011011      27    TO               35: TT                       36: TO?
011101      29    BE               36: TOB                      37: BE?
011111      31    OR               37: BEO                      38: OR?
100100      36    TOB              38: ORT                      39: TOB?
011110      30    EO               39: TOBE                     40: EO?
100000      32    RN               40: EOR                      41: RN?
100010      34    OT               41: RNO                      42: OT?
000000      0     #
At each stage, the decoder receives a code X; it looks X up in the table and outputs the sequence χ it codes, and it conjectures χ + ? as the entry the encoder just added, because the encoder emitted X for χ precisely because χ + ? was not in the table, and the encoder goes ahead and adds it. But what is the missing letter? It is the first letter in the sequence coded by the next code Z that the decoder receives. So the decoder looks up Z, decodes it into the sequence ω and takes the first letter z and tacks it onto the end of χ as the next dictionary entry.

This works as long as the codes received are in the decoder's dictionary, so that they can be decoded into sequences. What happens if the decoder receives a code Z that is not yet in its dictionary? Since the decoder is always just one code behind the encoder, Z can be in the encoder's dictionary only if the encoder just generated it, when emitting the previous code X for χ. Thus Z codes some ω that is χ + ?, and the decoder can determine the unknown character as follows:

1. The decoder sees X and then Z.
2. It knows X codes the sequence χ and Z codes some unknown sequence ω.
3. It knows the encoder just added Z to code χ + some unknown character,
4. and it knows that the unknown character is the first letter z of ω.
5. But the first letter of ω (= χ + ?) must then also be the first letter of χ.
6. So ω must be χ + x, where x is the first letter of χ.
7. So the decoder figures out what Z codes even though it's not in the table,
8. and upon receiving Z, the decoder decodes it as χ + x, and adds χ + x to the table as the value of Z.
This situation occurs whenever the encoder encounters input of the form cScSc, where c is a single character, S is a string and cS is already in the dictionary, but cSc is not. The encoder emits the code for cS, putting a new code for cSc into the dictionary. Next it sees cSc in the input (starting at the second c of cScSc) and emits the new code it just inserted. The argument above shows that whenever the decoder receives a code not in its dictionary, the situation must look like this. Although input of form cScSc might seem unlikely, this pattern is fairly common when the input stream is characterized by significant repetition. In particular, long strings of a single character (which are common in the kinds of images LZW is often used to encode) repeatedly generate patterns of this sort.

Further coding
The simple scheme described above focuses on the LZW algorithm itself. Many applications apply further encoding to the sequence of output symbols. Some package the coded stream as printable characters using some form of binary-to-text encoding; this will increase the encoded length and decrease the compression ratio. Conversely, increased compression can often be achieved with an adaptive entropy encoder. Such a coder estimates the probability distribution for the value of the next symbol, based on the observed frequencies of values so far. A standard entropy encoding such as Huffman coding or arithmetic coding then uses shorter codes for values with higher probabilities.

Uses

LZW compression became the first widely used universal data compression method on computers. A large English text file can typically be compressed via LZW to about half its original size. LZW was used in the public-domain program compress, which became a more or less standard utility in Unix systems circa 1986. It has since disappeared from many distributions, both because it infringed the LZW patent and because gzip produced better compression ratios using the LZ77-based DEFLATE algorithm, but as of 2008 at least FreeBSD includes both compress and uncompress as a part of the distribution. Several other popular compression utilities also used LZW, or closely related methods.

LZW became very widely used when it became part of the GIF image format in 1987. It may also (optionally) be used in TIFF and PDF files. (Although LZW is available in Adobe Acrobat software, Acrobat by default uses DEFLATE for most text and color-table-based image data in PDF files.)