An Introduction to Arithmetic Coding: Glen G. Langdon, Jr.
Arithmetic coding is a data compression technique that encodes data (the data string) by creating a code string which represents a fractional value on the number line between 0 and 1. The coding algorithm is symbolwise recursive; i.e., it operates upon and encodes (decodes) one data symbol per iteration or recursion. On each recursion, the algorithm successively partitions an interval of the number line between 0 and 1, and retains one of the partitions as the new interval. Thus, the algorithm successively deals with smaller intervals, and the code string, viewed as a magnitude, lies in each of the nested intervals. The data string is recovered by using magnitude comparisons on the code string to recreate how the encoder must have successively partitioned and retained each nested subinterval. Arithmetic coding differs considerably from the more familiar compression coding techniques, such as prefix (Huffman) codes. Also, it should not be confused with error control coding, whose object is to detect and correct errors in computer operations. This paper presents the key notions of arithmetic compression coding by means of simple examples.
1. Introduction
Arithmetic coding maps a string of data (source) symbols to a code string in such a way that the original data can be recovered from the code string. The encoding and decoding algorithms perform arithmetic operations on the code string. One recursion of the algorithm handles one data symbol.

Arithmetic coding is actually a family of codes which share the property of treating the code string as a magnitude. For a brief history of the development of arithmetic coding, refer to Appendix 1.

Compression systems
The notion of compression systems captures the idea that data may be transformed into something which is encoded, then transmitted to a destination, then transformed back into the original data. Any data compression approach, whether employing arithmetic coding, Huffman codes, or any other coding technique, has a model which makes some assumptions about the data and the events encoded.

The code itself can be independent of the model. Some systems which compress waveforms (e.g., digitized speech) may predict the next value and encode the error. In this model the error and not the actual data is encoded. Typically, at the encoder side of a compression system, the data to be compressed feed a model unit. The model determines 1) the event(s) to be encoded, and 2) the estimate of the relative frequency (probability) of the events. The encoder accepts the event and some indication of its relative frequency and generates the code string.

A simple model is the memoryless model, where the data symbols themselves are encoded according to a single code. Another model is the first-order Markov model, which uses the previous symbol as the context for the current symbol. Consider, for example, compressing English sentences. If the data symbol (in this case, a letter) "q" is the previous letter, we would expect the next letter to be "u." The first-order Markov model is a dependent model; we have a different expectation for each symbol (or in the example, each letter), depending on the context. The context is, in a sense, a state governed by the past sequence of symbols. The purpose of a context is to provide a probability distribution, or statistics, for encoding (decoding) the next symbol.

Corresponding to the symbols are statistics. To simplify the discussion, consider a single-context model, i.e., the memoryless model. Data compression results from encoding the more-frequent symbols with short code-string length increases, and encoding the less-frequent events with long code-length increases. Let c_i denote the occurrences of the ith symbol in a data string. For the memoryless model and a given code, let l_i denote the length (in bits) of the code-string increase associated with symbol i.
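For instance, a model unit might be organized as in the following minimal Python sketch (our own illustration; the names and the first-order probabilities are invented for the example, while the memoryless probabilities are those used later in Table 2):

    # Sketch of a model unit supplying statistics (probability distributions).
    # The memoryless model has a single context; the first-order Markov model
    # selects a distribution using the previous symbol as context.
    memoryless = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # one context

    first_order = {
        "q": {"u": 0.99, "z": 0.01},  # after "q" we strongly expect "u"
        # ... one distribution for each possible previous letter
    }

    def stats(prev=None):
        """Return the distribution used to encode (decode) the next symbol."""
        if prev in first_order:
            return first_order[prev]
        return memoryless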
Encoder
The encoder accepts the events to be encoded and generates the code string.

Table 1 Example Huffman code.

Symbol   Codeword (in binary)   Probability p   Cumulative probability P
a        0                      .100            .000
b        10                     .010            .100
c        110                    .001            .110
d        111                    .001            .111
The width A of the current interval and the cumulative probability P_i for the symbol i being encoded determine the new code point:

New C = Current C + (A x P_i).

For example, after encoding "a a," the current code point C is 0 and the interval width A is .01. For "a a b," the new code point is .001, determined as 0 (current code point C), plus the product (.01) x (.100). The factor on the left is the width A of the current interval, and the factor on the right is the cumulative probability P for symbol "b"; see the "Cumulative probability" column of Table 1.

We retain the important technique of the double recursion. Consider the arrangement of Table 2. The "codeword" corresponds to the cumulative probability P of the preceding symbols in the ordering.

Table 2 Arithmetic code example.

Symbol   Cumulative probability P   Symbol probability p   Length
d        .000                       .001                   3
b        .001                       .010                   2
a        .011                       .100                   1
c        .111                       .001                   3

The subdivision of the unit interval for Table 2, and for the data string "a a b," is shown in Figure 3. In this example, we retain Points 1 and 2 of the previous example, but no longer have the prefix property of Huffman codes. Compare Figs. 2 and 3 to see that the interval widths are the same but the locations of the intervals have been changed in Fig. 3 to conform with the new ordering in Table 2.

[Figure 3 Subdivision of unit interval for arithmetic code of Table 2 and data string "a a b ...".]

Let us again code the string "a a b c." This example reinforces the double recursion operations, where the new values become the current values for the next recursion. It is helpful to understand the arithmetic provided here, using the "picture" of Fig. 3 for motivation.

The first "a" symbol yields the code point .011 and interval [.011,.111), as follows:

First symbol (a)
C: New code point C = 0 + 1 x (.011) = .011. (Current code point plus current width A times P.)
A: New interval width A = 1 x (.1) = .1. (Current width A times probability p.)

In the arithmetic coding literature, we have called the value A x P added to the old code point C the augend.
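As a compact restatement of the double recursion, consider the following Python sketch (our own illustration, not code from the paper; exact fractions stand in for the binary fractions). It carries the Table 2 statistics through all four symbols of "a a b c"; the second and third recursions, which are not traced step by step here, follow the same pattern:

    from fractions import Fraction as F

    # Table 2: symbol -> (probability p, cumulative probability P).
    # In binary: .1 = 1/2, .01 = 1/4, .001 = 1/8, .011 = 3/8, .111 = 7/8.
    STATS = {
        "d": (F(1, 8), F(0, 8)),
        "b": (F(1, 4), F(1, 8)),
        "a": (F(1, 2), F(3, 8)),
        "c": (F(1, 8), F(7, 8)),
    }

    def encode(symbols):
        """Double recursion: C <- C + A * P(symbol); A <- A * p(symbol)."""
        C, A = F(0), F(1)
        for sym in symbols:
            p, P = STATS[sym]
            C = C + A * P   # add the augend (scaled cumulative probability)
            A = A * p       # shrink the interval width
        return C, A         # final interval is [C, C + A)

    C, A = encode("aabc")
    # C == F(83, 128), i.e. binary .1010011; A == F(1, 128), i.e. .0000001.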
To handle the fourth letter, "c," we continue as follows.

Fourth symbol (c)
C: New code point = .10011 + .0001 x (.111) = .1010011.

      .10011    (current code point)
    + .0000111  (current width A times P, or augend)
      .1010011  (new code point)

A: New interval width A = .0001 x (.001) = .0000001. (Current width A times probability p.)

Code-string termination
Following encoding of "a a b c," the current interval is [.1010011,.1010100). If we were to terminate the code string at this point (no more data symbols to handle), any value equal to or greater than .1010011, but less than .1010100, would serve to identify the interval.

Let us overview the example. In our creation of code string .1010011, we in effect added properly scaled cumulative probabilities P, called augends, to the code string. For the width recursion on A, the interval widths are, fortuitously, negative integral powers of two, which can be represented as floating-point numbers with one bit of precision. Multiplication by a negative integral power of two may be performed by a shift right. The code string for "a a b c" is the result of the following sum of augends, which displays the scaling by a right shift:

      .011      (first "a")
      .0011     (second "a")
      .00001    ("b")
      .0000111  ("c")
      --------
      .1010011

Decoding reverses the construction by magnitude comparison. The code string .1010011 lies in [.011,.111), which is a's subinterval. We can summarize this step as follows.

Step 1: Decoder C comparison. Examine the code string and determine the interval in which it lies. Decode the symbol corresponding to that interval.

Since the second subinterval code point was obtained at the encoder by adding something to .011, we can prepare to decode the second symbol by subtracting .011 from the code string: .1010011 - .011 = .0100011. We then have Step 2. Rescaling by the width of "a" (a doubling, since p = .1) then gives the adjusted code string .100011.

Now we can decode the second symbol from the adjusted code string .100011 by dealing directly with the values in Table 2 and repeating Decoder Steps 1, 2, and 3.

Decoder Step 1: Table 2 identifies "a" as the second data symbol, because the adjusted code string is greater than .011 (codeword for "a") but less than .111 (codeword for "c").

Decoder Step 2: Repeating the operation of subtracting .011, we obtain .100011 - .011 = .001011.

Decoder Step 3: Symbol "a" causes multiplication by .1 at the encoder, so the rescaled code string is obtained by doubling the result of Decoder Step 2: .01011.
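The three decoder steps can likewise be sketched in a few lines (again our own illustration, reusing F and the STATS table from the encoder sketch above): compare magnitudes, subtract the codeword, and rescale by the symbol width.

    def decode(code, n):
        """Recover n symbols: Step 1 compare, Step 2 subtract, Step 3 rescale."""
        out = []
        for _ in range(n):
            for sym, (p, P) in STATS.items():
                if P <= code < P + p:      # Step 1: find the subinterval
                    out.append(sym)
                    code = (code - P) / p  # Steps 2 and 3: subtract and rescale
                    break
        return " ".join(out)

    print(decode(F(83, 128), 4))  # -> "a a b c", retracing the steps above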
A key advance of arithmetic coding was to contain the required precision so that the significant digits of the multiplications do not grow with the code string. We can describe the use of a fixed number of significant bits in the setting of a code-string tree. Moreover, constrained channel codes [1] ...

The BAC (binary arithmetic code) algorithm may be used for encoding any set of events, whatever the original form, by breaking the events down for encoding into a succession of binary events. The BAC accepts this succession of events and delivers successive bits of the code string.
Step 1 Given skew SK and E (the leading 0s of A), subdivide the current width as in Eqs. (1) and (2).

Step 2 Given the event value (T or F), C, W(T), and W(F), describe the new interval:

If T: C(s,T) = C(s) + W(F) and A(s,T) = W(T).   (3a)
If F: C(s,F) = C(s) and A(s,F) = W(F).   (3b)

Step 3 Given the new value of A, determine the new value of E:

If T: If A(s,T) < 2^-E(s), then E(s,T) = E(s) + 1; otherwise E(s,T) = E(s).
If F: E(s,F) = E(s) + SK.

We continue the discussion by an example, where we encode the four-event string T, T, F, T under respective skews 3, 1, 1, 1. The encoding is described by Table 3, and the following description accompanies this table.

Table 3 Example encoding: refining the interval.

Event     Value   SK   E   W(F)   C          A
Initial   -       -    0   -      0.000000   1.000000
1         T       3    0   .001   0.001000   0.111000
2         T       1    1   .01    0.011000   0.101000
3         F       1    1   .01    0.011000   0.010000
4         T       1    2   .001   0.100000   0.001000

For Event 1, SK is 3 and E is 0. For Step 1, the width associated with the value F, W(F), is 2^-3 or 0.001. W(T) is what is left over, or 1.000 - 0.001 = 0.111. See Figure 5. Relative to Step 2, Eq. (3), the subdivision point is C + W(F), or 0 + .001 = .001. Since the binary value is T and the relative frequency of the T event is equal to or greater than 1/2, we keep the larger (rightmost) subinterval. Referring to Fig. 5, we see that the new values of C and A which describe the interval are now C(T) = 0.001 and A(T) = W(T) = 0.111. For Step 3, we note that A has developed a leading 0, so E = 1.

[Figure 5 Interval splitting: subdivision for Event 1, Table 3.]

For Event 2 of Table 3, the skew SK is 1 and E is now 1, so W(F) is 2^-(1+1) or 0.01. W(T) is thus 0.111 - 0.010 = 0.101. The subdivision point of the current interval is C + W(F), or 0.011. Again, the event value is T, so we keep the rightmost part. The new value of C is the subdivision point 0.011, and the new value of A is W(T) or 0.101. The leading 1-bit position of A has not changed, so E is still 1.

For Event 3 of Table 3, see Figure 6, which displays current interval [.011,1) of width .101. The smaller width W(F) is 2^-(1+1) or .01. We add this value to C to obtain C + W(F), or subdivision point .101. See Fig. 6(a). Referring to Event 3 of Table 3, the value to encode is F, so we must now keep the left side of the subdivision. By keeping the F subinterval, the value of C remains at .011 and A becomes W(F) or 0.01. Available width A has a new leading 0, so E becomes 2. The resulting interval is shown in Fig. 6(b).

[Figure 6 Interval splitting: subdivision for Event 3, Table 3. (a) Current interval at end of Event 2 and subdivision point. (b) Current interval following encoding of Event 3.]
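Steps 1 through 3, applied to the events and skews of Table 3, fit in a short Python sketch (our own illustration; bac_encode is an invented name, and exact fractions stand in for the table's binary values):

    from fractions import Fraction as F

    def bac_encode(events, skews):
        """Skew coder: one T/F event per iteration, as in Steps 1-3."""
        C, A, E = F(0), F(1), 0        # code point, width, leading 0s of A
        for value, SK in zip(events, skews):
            WF = F(1, 2 ** (SK + E))   # Step 1: W(F) = 2^-(SK+E)
            WT = A - WF                #         W(T) is what is left over
            if value == "T":           # Step 2: T keeps the rightmost part
                C, A = C + WF, WT
            else:                      #         F keeps the left part
                A = WF
            print(value, SK, E, float(WF), float(C), float(A))
            while A < F(1, 2 ** E):    # Step 3: update the leading-0 count
                E += 1
        return C, A

    bac_encode("TTFT", [3, 1, 1, 1])   # reproduces the rows of Table 3
                                       # (as decimals, e.g. .011 prints as 0.375)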
Separation of data handling from the arithmetic
Arithmetic codes generate the code string by adding a summand (called augend in the arithmetic coding literature) to the current code string and possibly shifting the result. The summation operation creates a problem called the carry-over problem. We can, in the course of code generation, shift out a long string of 1s from the coding process. An addition could propagate a carry into the long string of 1s, changing the values of higher-order bits until a 0 is converted to a 1, stopping the carry chain.
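A toy illustration of the problem (our own numbers, not the paper's):

    # Bits already shifted out of the coder end in a long run of 1s.
    code   = 0b0111111          # code string generated so far
    augend = 0b0000001          # the next addition
    print(bin(code + augend))   # -> 0b1000000: the carry ripples through
                                #    every 1 until it converts a 0 to a 1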
In this section we show how the arithmetic using A and C can be separated from the carry-over handling and data- ...
The number of tree leaves which describe the code space goes from eight leaves to 14 leaves. Also at this depth, an SK of 2 corresponds to two leaves and an SK of 1 corresponds to four leaves. The initial code-string tree has eight leaves (A = 1.000) and a depth of three. See Fig. 10(a) for the subdivision point for Event 1 of Table 4. Event 1 is T, so we keep the right subset of seven leaves. For subsequent encoding, the code string will be a continuation of 001, 01, or 1, and we clearly are not dealing with a prefix code.

[Figure 11 Code-string tree for Event 3, Table 4: (a) following Event 2; (b) following Event 3.]

[Flowchart of the decoding process; it begins at START with A = 1.000.]

Encoding
Let the n-symbol alphabet, whatever it may be, have an ordering. Our interest is in the symbol position of the ordering: 1, 2, ..., k, ..., n. Let relative frequencies p_1, p_2, ..., p_n, and interval width A(s) be given. The interval is subdivided according to relative frequencies, so the respective widths are p_1 x A(s), p_2 x A(s), ..., p_n x A(s). Recall the calculation of cumulative probabilities P:

P_k = p_1 + p_2 + ... + p_(k-1) for k >= 2, and P_1 = 0.

For encoding the event whose order is k, following the encoding of prior string s, the new subinterval [C(s,k), C(s,k) + A(s,k)) is described with the following two equations of the familiar arithmetic coding double recursion:

C(s,k) = C(s) + D(s,k), where D(s,k) = A(s) x P_k,   (7)
A(s,k) = A(s) x p_k.   (8)
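Equations (7) and (8) translate directly into code; a minimal sketch (our own, with invented names), given the list p of relative frequencies in alphabet order:

    from fractions import Fraction as F

    def encode_event(C, A, k, p):
        """Apply Eqs. (7) and (8) for the event whose order is k (1-based)."""
        P_k = sum(p[:k - 1], F(0))   # cumulative probability of predecessors
        D = A * P_k                  # augend D(s,k) = A(s) x P_k          (7)
        return C + D, A * p[k - 1]   # new C(s,k) and A(s,k) = A(s) x p_k  (8)

    # Encoding "a a b c" with Table 2's ordering d, b, a, c (a has order k = 3):
    C, A = F(0), F(1)
    p = [F(1, 8), F(1, 4), F(1, 2), F(1, 8)]   # p_d, p_b, p_a, p_c
    for k in (3, 3, 2, 4):                     # a, a, b, c
        C, A = encode_event(C, A, k, p)
    # C, A == F(83, 128), F(1, 128): binary .1010011 and .0000001 once more.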
In Rissanen [8] and Pasco [5], the original (given, or presumed) symbol probabilities were used. (In practice, we use estimates of the relative frequencies. However, the notion of an imaginary "source" emitting symbols according to given probabilities is commonly found in the coding literature.) In [20] and [3], Rissanen and Langdon introduced the notion of coding parameters "based" on the symbol probabilities. The uncoupling of the coding parameters from the symbol probabilities simplifies the implementation of the code at very little compression loss, and gives the code designer some tradeoff possibilities. In [20] it was stated that there were ways to block the carry-over, and in [3] bit-stuffing was presented. In [10] Rubin also improved Pasco's code by preventing carry-overs. The result was called a "stream" code. Jones [7] and Martin [6] have independently discovered P-based FIFO arithmetic codes.

Rissanen and Langdon [20] successfully generalized and characterized the family of arithmetic codes through the notion of the decodability criterion which applies to all such codes, be they LIFO or FIFO, L-based or P-based. The arithmetic coding family is seen to be a practical generalization of many pre-arithmetic coding algorithms, including Elias' code, Schalkwijk [18], and Cover [19]. Gilbert and Moore [21] devised the prefix coding approach used in Table 1. In [22], Rissanen presents an interesting view of an arithmetic code as a number-representation system, and shows that Elias' code and enumerative codes are duals.

Received June 29, 1983; revised October 8, 1983

References
1. G. Nigel N. Martin, Glen G. Langdon, Jr., and Stephen J. P. Todd, "Arithmetic Codes for Constrained Channels," IBM J. Res. Develop. 27, 94-106 (March 1983).
2. G. G. Langdon, Jr. and J. Rissanen, "A Simple General Binary Source Code," IEEE Trans. Info. Theory IT-28, 800-803 (September 1982).
3. Glen G. Langdon, Jr. and Jorma Rissanen, "Compression of Black-White Images with Arithmetic Coding," IEEE Trans. Commun. COM-29, 858-867 (June 1981).
4. S. W. Golomb, "Run-Length Encoding," IEEE Trans. Info. Theory IT-12, 399-401 (July 1966).
5. R. Pasco, "Source Coding Algorithms for Fast Data Compression," Ph.D. Thesis, Department of Electrical Engineering, Stanford University, CA, 1976.
6. G. N. N. Martin, "Range Encoding: an Algorithm for Removing Redundancy from a Digitized Message," presented at the Video and Data Recording Conference, Southampton, England, July 1979.
7. C. B. Jones, "An Efficient Coding System for Long Source Sequences," IEEE Trans. Info. Theory IT-27, 280-291 (May 1981).
8. J. J. Rissanen, "Generalized Kraft Inequality and Arithmetic Coding," IBM J. Res. Develop. 20, 198-203 (1976).
9. Glen G. Langdon, Jr. and Jorma J. Rissanen, "A Double Adaptive File Compression Algorithm," IEEE Trans. Commun. COM-31, 1253-1255 (1983).
10. Frank Rubin, "Arithmetic Stream Coding Using Fixed Precision Registers," IEEE Trans. Info. Theory IT-25, 672-675 (November 1979).
11. C. Shannon, "A Mathematical Theory of Communication," Bell Syst. Tech. J. 27, 379-423 (July 1948).
12. Mauro Guazzo, "A General Minimum-Redundancy Source-Coding Algorithm," IEEE Trans. Info. Theory IT-26, 15-25 (January 1980).
13. Stephen J. P. Todd, Glen G. Langdon, Jr., and G. Nigel N. Martin, "A General Fixed Rate Arithmetic Coding Method for Constrained Channels," IBM J. Res. Develop. 27, 107-115 (1983).
14. J. J. Rissanen and G. Langdon, "Universal Modeling and Coding," IEEE Trans. Info. Theory IT-27, 12-23 (January 1981).
15. D. Anastassiou, M. K. Brown, H. C. Jones, J. L. Mitchell, W. B. Pennebaker, and K. S. Pennington, "Series/1-Based Videoconferencing System," IBM Syst. J. 22, 97-110 (1983).
16. N. Abramson, Information Theory and Coding, McGraw-Hill Book Co., Inc., New York, 1963.
17. F. Jelinek, Probabilistic Information Theory, McGraw-Hill Book Co., Inc., New York, 1968.
18. J. Schalkwijk, "An Algorithm for Source Coding," IEEE Trans. Info. Theory IT-18, 395 (1972).
19. T. M. Cover, "Enumerative Source Coding," IEEE Trans. Info. Theory IT-19, 73 (1973).
20. J. J. Rissanen and G. G. Langdon, Jr., "Arithmetic Coding," IBM J. Res. Develop. 23, 149-162 (1979).
21. E. N. Gilbert and E. F. Moore, "Variable-Length Binary Encodings," Bell Syst. Tech. J. 38, 933 (1959).
22. J. Rissanen, "Arithmetic Coding as Number Representations," Acta Polyt. Scandinavica Math. 34, 44-51 (December 1979).

Glen G. Langdon, Jr. IBM Research Division, 5600 Cottle Road, San Jose, California 95193. Dr. Langdon received the B.S. from Washington State University, Pullman, in 1957, the M.S. from the University of Pittsburgh, Pennsylvania, in 1963, and the Ph.D. from Syracuse University, New York, in 1968, all in electrical engineering. He worked for Westinghouse on instrumentation and data logging from 1961 to 1962 and was an application programmer for the PRODAC computer for process control for most of 1963. In 1963 he joined IBM at the Endicott, New York, development laboratory, where he did logic design on small computers. In 1965 he received an IBM Resident Study Fellowship. On his return from Syracuse University, he was involved in future system architectures and storage subsystem design. During 1971, 1972, and part of 1973, he was a Visiting Professor at the University of Sao Paulo, Brazil, where he developed graduate courses on computer design, design automation, microprogramming, operating systems, and MOS technology. The first Brazilian computer, called Patinho Feio (Ugly Duckling), was developed by the students at the University of Sao Paulo during his stay. He is author of Logic Design: A Review of Theory and Practice, an ACM monograph, and coauthor of the Brazilian text Projeto de Sistemas Digitais; he has recently published Computer Design. He joined the IBM Research laboratory in 1974 to work on distributed systems and later on stand-alone color graphic systems. He has taught graduate courses on logic and computer design at the University of Santa Clara, California. He is currently working in data compression. Dr. Langdon received an IBM Outstanding Innovation Award for his contributions to arithmetic coding compression techniques. He holds eight patents.